Finding the maximum along the nth dimension in C

I'm working on a problem where the execution time is critical. I have another C function that produces 3-D grids of values at a series of timestamps. What I want is to find the max_value in each 3-D grid at each timestamp. Additionally I am tracking the average value (sum / ncell) of each grid, and returning a maximum normalised by the average value.
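Written out, the per-timestamp quantities described above look roughly like this. The symbols are introduced here purely for illustration: $x_{t,c}$ is the value of cell $c$ at timestamp $t$, and $n_\mathrm{cell}$ is the number of cells in one grid.

```latex
\[
\mathrm{max}_t = \max_{c} x_{t,c}, \qquad
\mathrm{mean}_t = \frac{1}{n_\mathrm{cell}} \sum_{c} x_{t,c}, \qquad
\mathrm{normMax}_t = \frac{\mathrm{max}_t}{\mathrm{mean}_t}
                   = \frac{n_\mathrm{cell}\,\mathrm{max}_t}{\sum_{c} x_{t,c}}
\]
```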

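I'm not proficient in C, so I want to check whether I'm missing anything, either in the actual code or in my use of OpenMP. I guess my question is:

What is the most efficient way to find the maximum of an n-dimensional array sliced along the nth dimension?

I know the best you can expect (since the grids are unordered) is O(n). My assessment is that this problem is therefore O(m x n), with m = time dimension and n = grid dimension, and I think my implementation achieves this. Typical values for these dimensions might be m = 5000 to 20000 and n = 200 * 200 * 60.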

Currently, I am timing my Python wrapper function (which includes the initialisation of the numpy.ndarrays that receive the max, normMax, and maxIndex values) with:

  • m = 2400
  • n = 54000
  • threads = 8

For this, it takes me on average 0.33 seconds to find the maxima.

In case it is relevant, this is on my laptop, which has:

  • Intel® Core™ i7-7700HQ CPU @ 2.80GHz (6MB cache)
  • 32GB RAM

Code:

#include <stdint.h>

void find_max(double *mapPt, double *maxPt, double *normMaxPt,
              int64_t *indexPt, int32_t nsamp, int32_t ncell,
              int64_t threads)
{
    int32_t timeSample;

    #pragma omp parallel for num_threads(threads)
    for (timeSample=0; timeSample<nsamp; timeSample++)
    {
        /* Declared inside the loop so that every thread has private copies;
           at function scope they would be shared and cause a data race. */
        double  maxValue = 0.0;  /* assumes at least one positive value per grid */
        double  sum = 0.0;
        double  currentValue;
        int32_t cell, maxIndex = 0;

        for (cell=0; cell<ncell; cell++)
        {
            /* 64-bit index: cell * nsamp overflows int32_t for large grids. */
            currentValue = mapPt[(int64_t)cell * nsamp + timeSample];
            sum += currentValue;
            if (currentValue > maxValue)
            {
                maxValue = currentValue;
                maxIndex = cell;
            }
        }
        maxPt[timeSample] = maxValue;
        normMaxPt[timeSample] = maxValue * ncell / sum;
        indexPt[timeSample] = maxIndex;
    }
}

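I'm compiling with gcc 7.4.0; the important flags are probably -Ofast and -lm.

I'm happy for the answer to be "there's nothing more you can do"; I just want peace of mind.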

Comments
自娱自乐

One suggestion would be to set double *timesame_mapcells = &mapPt[timeSample]; at the start of each outer-loop iteration (that is, once per time sample).

Then you can just index with cell * nsamp, so one addition less per access. But the compiler might have been clever enough to optimize that.
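A minimal sketch of that idea, in case it helps to see it in full. The helper name max_for_sample, the column pointer and the maxIndexOut parameter are made up for illustration and are not part of the original code; the 64-bit cast on the index is an extra precaution I added for large grids.

```c
#include <stdint.h>

/* Hypothetical helper: scans one time sample (one column of the
 * cell-major map) through a base pointer hoisted out of the loop. */
static double max_for_sample(const double *mapPt, int32_t nsamp,
                             int32_t ncell, int32_t timeSample,
                             int32_t *maxIndexOut)
{
    /* Computed once per time sample instead of once per access. */
    const double *column = &mapPt[timeSample];
    double  maxValue = 0.0;   /* same starting value as the question's code */
    int32_t maxIndex = 0;

    for (int32_t cell = 0; cell < ncell; cell++)
    {
        /* 64-bit product so the index cannot overflow for large grids. */
        double currentValue = column[(int64_t)cell * nsamp];
        if (currentValue > maxValue)
        {
            maxValue = currentValue;
            maxIndex = cell;
        }
    }
    *maxIndexOut = maxIndex;
    return maxValue;
}
```

The body of the parallel loop would then call this once per timeSample. Whether it is any faster than what the compiler already generates at -Ofast is something only a measurement can tell.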

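You could also try using two incrementing variables in the for loop:

for (cell = 0, map_idx = timeSample; cell < ncell; cell++, map_idx += nsamp)
{
    currentValue = mapPt[map_idx];
    [...]
}

This avoids the timeSample addition and the nsamp multiplication on every access, which may save a few cycles. Again, this is just a suggestion for you to try; I don't know whether it will have a noticeable effect on performance. (But I'd be curious to hear the result if you do try it.)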
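For completeness, here is how the whole function might look with the running index. This is only a sketch: the name find_max_strided is hypothetical, map_idx is declared as int64_t on the assumption that ncell * nsamp may exceed the 32-bit range, and the per-sample variables are declared inside the parallel loop so that each thread has its own copies.

```c
#include <stdint.h>

/* Hypothetical variant of find_max that walks each column with a
 * running index instead of recomputing cell * nsamp + timeSample. */
void find_max_strided(const double *mapPt, double *maxPt, double *normMaxPt,
                      int64_t *indexPt, int32_t nsamp, int32_t ncell,
                      int64_t threads)
{
    #pragma omp parallel for num_threads(threads)
    for (int32_t timeSample = 0; timeSample < nsamp; timeSample++)
    {
        double  maxValue = 0.0;   /* same starting value as the question's code */
        double  sum = 0.0;
        int32_t maxIndex = 0;
        int64_t map_idx = timeSample;   /* first cell of this time sample */

        for (int32_t cell = 0; cell < ncell; cell++, map_idx += nsamp)
        {
            double currentValue = mapPt[map_idx];
            sum += currentValue;
            if (currentValue > maxValue)
            {
                maxValue = currentValue;
                maxIndex = cell;
            }
        }
        maxPt[timeSample] = maxValue;
        normMaxPt[timeSample] = maxValue * ncell / sum;
        indexPt[timeSample] = maxIndex;
    }
}
```

It takes the same arguments as find_max, so the two can be timed side by side on the same input.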
