I'm working on a problem where the execution time is critical. I have another C function that produces 3-D grids of values at a series of timestamps. What I want is to find the `max_value` in each 3-D grid at each timestamp. Additionally, I am tracking the average value (`sum / ncell`) of each grid, and returning a maximum normalised by the average value.
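Written out per timestamp, the three quantities are as follows. The notation is introduced here for illustration: $x_{c,t}$ is the value of cell $c$ at timestamp $t$, $N$ = `ncell` cells and $m$ timestamps. The last form of $\tilde{M}_t$ is what `maxValue * ncell / sum` computes in the code. The code starts its running maximum at 0.0, so it matches $M_t$ only when each grid contains at least one positive value.

```latex
\begin{aligned}
M_t &= \max_{0 \le c < N} x_{c,t}, \qquad
\bar{x}_t = \frac{1}{N} \sum_{c=0}^{N-1} x_{c,t}, \\
\tilde{M}_t &= \frac{M_t}{\bar{x}_t} = \frac{N \, M_t}{\sum_{c=0}^{N-1} x_{c,t}},
\qquad t = 0, \dots, m-1
\end{aligned}
```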
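I'm not proficient in C, so I want to check whether I'm missing anything, either in the actual code or in my use of OpenMP. I guess my question is:
What is the most efficient way to find the maximum of an n-dimensional array, sliced along the n-th dimension?
I know that the best you can expect (since the grid is unordered) is O(n). My assessment is that this problem is O(m × n), with m = time dimension and n = grid dimension, and I think my implementation achieves that. Typical values of these dimensions might be m = 5000 to 20000 and n = 200 × 200 × 60.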
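For the typical sizes quoted above, the arithmetic works out as follows. The byte count assumes, for illustration only, that the whole m × n array is held at once as 8-byte doubles; the question does not say whether that is the case.

```latex
\begin{aligned}
n &= 200 \times 200 \times 60 = 2.4 \times 10^{6} \text{ cells} \\
m\,n &= 5000 \times 2.4 \times 10^{6} = 1.2 \times 10^{10}
  \quad\text{to}\quad 20000 \times 2.4 \times 10^{6} = 4.8 \times 10^{10} \text{ values} \\
8\,m\,n &\approx 9.6 \times 10^{10}\ \text{B} \approx 96\ \text{GB}
  \quad\text{to}\quad 3.84 \times 10^{11}\ \text{B} \approx 384\ \text{GB}
\end{aligned}
```

Both ends of that range exceed $2^{31} - 1 \approx 2.1 \times 10^{9}$. As a result, a flat index such as `cell * nsamp + timeSample` computed in `int32_t` arithmetic would overflow at these sizes.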
Currently, I am timing my Python wrapper function (which includes the initialisation of the `numpy.ndarray`s that receive the max, normMax, and maxIndex values) with:
- m = 2400
- n = 54000
- threads = 8
With these settings, it takes 0.33 s on average to find the maximum values.
In case it's relevant, this is on my laptop, which has:
- Intel® Core™ i7-7700HQ CPU @ 2.80GHz (6 MB cache)
- 32GB RAM
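Here is a back-of-envelope rate for the timing above, assuming `mapPt` holds 8-byte doubles as in the code. The 0.33 s also covers the Python wrapper and the allocation of the output arrays, so these are conservative figures for the loop alone.

```latex
\begin{aligned}
m\,n &= 2400 \times 54000 = 1.296 \times 10^{8} \text{ values} \\
8\,m\,n &= 1.0368 \times 10^{9}\ \text{B} \approx 1.04\ \text{GB} \\
\frac{1.0368 \times 10^{9}\ \text{B}}{0.33\ \text{s}} &\approx 3.1 \times 10^{9}\ \text{B/s},
\qquad \frac{0.33\ \text{s}}{1.296 \times 10^{8}} \approx 2.5\ \text{ns per value}
\end{aligned}
```

One way to judge how far the loop is from being bandwidth-bound is to compare this rate with the machine's memory bandwidth, which is not measured here.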
Code:
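```c
#include <stdint.h>

void find_max(double *mapPt, double *maxPt, double *normMaxPt,
              int64_t *indexPt, int32_t nsamp, int32_t ncell,
              int64_t threads)
{
    /* Element (cell, timeSample) is stored at mapPt[cell * nsamp + timeSample]. */
#pragma omp parallel for num_threads(threads)
    for (int32_t timeSample = 0; timeSample < nsamp; timeSample++)
    {
        /* Declared inside the parallel loop so every thread has its own
           copies; declared at function scope they would be shared between
           threads (a data race). */
        double maxValue = 0.0;
        double sum = 0.0;
        int32_t maxIndex = 0;
        for (int32_t cell = 0; cell < ncell; cell++)
        {
            /* 64-bit index: cell * nsamp can exceed INT32_MAX for large grids. */
            double currentValue = mapPt[(int64_t)cell * nsamp + timeSample];
            sum += currentValue;
            if (currentValue > maxValue)
            {
                maxValue = currentValue;
                maxIndex = cell;
            }
        }
        maxPt[timeSample] = maxValue;
        normMaxPt[timeSample] = maxValue * ncell / sum;
        indexPt[timeSample] = maxIndex;
    }
}
```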
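The inner loop above reads `mapPt` with a stride of `nsamp` doubles. Each load therefore usually lands on a different cache line and uses only 8 bytes of it. The rest of that line is reused only if it survives in cache until the next timestamp. A common alternative is loop interchange with tiling over the time dimension. This technique is swapped in here and comes from neither the question nor the answer. Each thread takes a tile of consecutive timestamps and walks the cells in storage order, so every cell contributes one contiguous run of up to `TIME_TILE` doubles. This is a minimal sketch that assumes the same layout and argument types as `find_max`. The name `find_max_tiled` and the tile width of 512 are hypothetical, and its performance has not been measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile width in timestamps: 512 doubles = 4 KiB per cell row. */
#define TIME_TILE 512

/* Hypothetical tiled variant of find_max: same inputs and outputs, but the
   inner loop runs over consecutive timestamps, i.e. contiguous memory. */
void find_max_tiled(const double *mapPt, double *maxPt, double *normMaxPt,
                    int64_t *indexPt, int32_t nsamp, int32_t ncell,
                    int64_t threads)
{
    assert(nsamp > 0 && ncell > 0);
    (void)threads; /* only used by the pragma when built with -fopenmp */

#pragma omp parallel for num_threads(threads) schedule(static)
    for (int32_t t0 = 0; t0 < nsamp; t0 += TIME_TILE) {
        /* End of this tile, written so that it cannot overflow int32_t. */
        int32_t t1 = (nsamp - t0 < TIME_TILE) ? nsamp : t0 + TIME_TILE;
        double sum[TIME_TILE]; /* per-thread running sums for this tile */

        for (int32_t t = t0; t < t1; t++) {
            maxPt[t] = 0.0; /* same starting value as the original loop */
            indexPt[t] = 0;
            sum[t - t0] = 0.0;
        }
        /* Walk cells in storage order; each row slice [t0, t1) is contiguous. */
        for (int32_t cell = 0; cell < ncell; cell++) {
            const double *row = mapPt + (size_t)cell * (size_t)nsamp;
            for (int32_t t = t0; t < t1; t++) {
                double v = row[t];
                sum[t - t0] += v;
                if (v > maxPt[t]) { /* strict '>' keeps the first maximum */
                    maxPt[t] = v;
                    indexPt[t] = cell;
                }
            }
        }
        for (int32_t t = t0; t < t1; t++)
            normMaxPt[t] = maxPt[t] * ncell / sum[t - t0];
    }
}
```

Each tile writes only its own range of `maxPt`, `normMaxPt` and `indexPt`, so no reduction or locking is needed between threads. The strict comparison and the in-order summation over cells mirror the original loop. For the same inputs it should therefore return the same maxima and indices. The normalised value can differ in the last bits, because `-Ofast` may let the compiler reorder the original loop's sum.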
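I'm compiling with gcc 7.4.0; the important flags are probably -Ofast and -lm.
I'm happy for the answer to be "there's nothing more you can do"; I just want to put my mind at ease.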
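One suggestion I could see would be to have `double *timesame_mapcells = &mapPt[timeSample];` at the start of each `timeSample` iteration. Then you can just index with `cell * nsamp`, so there is one addition less per access. But the compiler might have been clever enough to optimize that already. You could also try using two variables that are incremented together in the `for` loop: `cell` itself, and an offset that starts at 0 and grows by `nsamp` on each iteration.
Saving the `timeSample` addition and the `nsamp` multiplication on every access could save some cycles. Again, this is just a suggestion for you to try; I don't know whether it will have a noticeable effect on performance. (But I'd be curious to know, if you do try it.)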
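Here is a minimal sketch of both suggestions together. It assumes the same memory layout and argument types as `find_max` in the question, and the name `find_max_offset` is hypothetical. The per-iteration variables are declared inside the parallel loop so each thread gets private copies. The offset is kept as `int64_t`, an assumption made so it cannot overflow for large grids. Whether this beats what gcc already generates at `-Ofast` has not been measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of find_max: per timestamp, take a base pointer to
   that timestamp's column and step through it with an offset that grows by
   nsamp, instead of computing cell * nsamp + timeSample on every access. */
void find_max_offset(const double *mapPt, double *maxPt, double *normMaxPt,
                     int64_t *indexPt, int32_t nsamp, int32_t ncell,
                     int64_t threads)
{
    assert(nsamp > 0 && ncell > 0);
    (void)threads; /* only used by the pragma when built with -fopenmp */

#pragma omp parallel for num_threads(threads)
    for (int32_t timeSample = 0; timeSample < nsamp; timeSample++) {
        /* Locals declared here are private to the executing thread. */
        const double *timesame_mapcells = &mapPt[timeSample];
        double maxValue = 0.0, sum = 0.0;
        int32_t maxIndex = 0;
        int64_t offset = 0; /* always equals cell * nsamp */

        /* Two variables advanced together: the cell counter and the offset. */
        for (int32_t cell = 0; cell < ncell; cell++, offset += nsamp) {
            double currentValue = timesame_mapcells[offset];
            sum += currentValue;
            if (currentValue > maxValue) {
                maxValue = currentValue;
                maxIndex = cell;
            }
        }
        maxPt[timeSample] = maxValue;
        normMaxPt[timeSample] = maxValue * ncell / sum;
        indexPt[timeSample] = maxIndex;
    }
}
```

When built with `-fopenmp`, the timestamps are split across `threads` threads as in the original. Without that flag the pragma is ignored and the function runs serially.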