Expensive C# operations — hoping to improve performance

This is going to be a long question, sorry in advance. I am not looking for a complete code solution; I am hoping to get some advice from people with different perspectives and more experience.

My company is developing software for a product that does some rather expensive calculations using a film from an IR camera where every pixel contains a temperature value. The most costly of those methods is called Thermal Signal Reconstruction (if you are interested, you can read about it here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321698/ ). It basically performs a polynomial fit for each pixel over time (the number of frames). My C# implementation looks something like this:

public static double[,,] ThermalSignalReconstruction(List<Frame<double>> thermalFilm, byte polyOrder)
{
  Resolution filmResolution = thermalFilm[0].Resolution;
  int width = (int)filmResolution.Width;
  int height = (int)filmResolution.Height;
  int frames = thermalFilm.Count;
  double[,,] result = new double[polyOrder + 1, height, width];

  // using frame indexes as x-values for poly fit
  List<double> frameIndexes = new List<double>(frames);
  for (var frame = 0; frame < frames; ++frame)
    frameIndexes.Add(frame);

  Parallel.For(0, frames, frame =>
  {
     // (...) calculate difference images: subtract first frame from all others
  });

  Parallel.For(0, frames, frame =>
  {
    // determine flashpoint: find frame with maximum average pixel value
  });
  // (...) remove frames preceding the flashpoint, including the flashpoint frame
  // itself, from the film, yielding localThermalFilm

  Parallel.For(0, frames, frame =>
  {
    // calculate System.Math.Log10 of all pixels and frame indexes
  });

  // perform polynomial fit for each pixel
  Parallel.For(0, height, row =>
  {
    for (var col = 0; col < width; ++col)
    {
      // extract poly fit input y-values for current pixel
      double[] pixelValues = new double[frames];
      for (var frame = 0; frame < frames; ++frame)
        pixelValues[frame] = localThermalFilm[frame].Data[row, col];

      // (...) do some value validations

      // poly fit for current pixel - this is the longest step
      double[] coefficients = Math.PolynomialRegression(frameIndexesValidated.ToArray(), pixelValuesValidated.ToArray(), polyOrder);

      // insert into coefficient images result array
      for (var coefficient = 0; coefficient < result.GetLength(0); ++coefficient)
        result[coefficient, row, col] = coefficients[coefficient];
    }
  });

  return result;
}
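For illustration, the elided flashpoint step could be sketched roughly like this. This is a hypothetical reconstruction from the comment in the code, not the actual implementation; it assumes `Frame<double>.Data` is a `double[,]` as used elsewhere in the snippet:

```csharp
// Hypothetical sketch of the flashpoint step: find the frame with the
// maximum average pixel value. Each parallel iteration writes to its own
// slot of 'averages', so no synchronization is needed.
double[] averages = new double[frames];
Parallel.For(0, frames, frame =>
{
  double sum = 0.0;
  for (int row = 0; row < height; ++row)
    for (int col = 0; col < width; ++col)
      sum += thermalFilm[frame].Data[row, col];
  averages[frame] = sum / ((double)width * height);
});

// sequential argmax over the per-frame averages
int flashpoint = 0;
for (int frame = 1; frame < frames; ++frame)
  if (averages[frame] > averages[flashpoint])
    flashpoint = frame;
```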

As you can see, several parallelized loops performing operations on the frames are executed in sequence, with the polynomial fit (Math.PolynomialRegression) being the last and by far the most expensive one. Math.PolynomialRegression is a polynomial fit routine I pieced together myself, since no such function exists in the standard System.Math library, and the only alternative I tried, from the Math.NET library, actually runs slower than the one I wrote. My code is based on the examples given on Rosetta Code: https://rosettacode.org/wiki/Polynomial_regression
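For readers unfamiliar with the Rosetta Code approach, a least-squares polynomial fit via the normal equations, solved by Gaussian elimination, can be sketched as follows. This is an assumption about the general technique, not the actual Math.PolynomialRegression implementation:

```csharp
// Sketch of a polynomial least-squares fit in the spirit of the Rosetta Code
// example: build the normal equations and solve them by Gaussian elimination.
// Returns coefficients c[0..order] of c0 + c1*x + ... + c_order*x^order.
public static double[] PolynomialRegression(double[] x, double[] y, int order)
{
  int n = order + 1;
  // Augmented matrix: a[i, j] = sum_k x_k^(i+j), a[i, n] = sum_k y_k * x_k^i
  double[,] a = new double[n, n + 1];
  for (int i = 0; i < n; ++i)
  {
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < x.Length; ++k)
        a[i, j] += System.Math.Pow(x[k], i + j);
    for (int k = 0; k < x.Length; ++k)
      a[i, n] += y[k] * System.Math.Pow(x[k], i);
  }

  // Forward elimination with partial pivoting
  for (int col = 0; col < n; ++col)
  {
    int pivot = col;
    for (int row = col + 1; row < n; ++row)
      if (System.Math.Abs(a[row, col]) > System.Math.Abs(a[pivot, col]))
        pivot = row;
    for (int j = col; j <= n; ++j)
      (a[col, j], a[pivot, j]) = (a[pivot, j], a[col, j]);
    for (int row = col + 1; row < n; ++row)
    {
      double factor = a[row, col] / a[col, col];
      for (int j = col; j <= n; ++j)
        a[row, j] -= factor * a[col, j];
    }
  }

  // Back substitution
  double[] c = new double[n];
  for (int i = n - 1; i >= 0; --i)
  {
    c[i] = a[i, n];
    for (int j = i + 1; j < n; ++j)
      c[i] -= a[i, j] * c[j];
    c[i] /= a[i, i];
  }
  return c;
}
```

Note that recomputing the powers of x with System.Math.Pow on every call is one obvious cost center in this style of implementation; since the x-values here are just frame indexes and identical for every pixel, the power sums could in principle be computed once and reused.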

For context: I originally wrote this entire algorithm in unmanaged C++, which our company decided to abandon in favor of C#/.NET because of licensing issues with the GUI framework used at the time. A recent direct comparison of the old unmanaged C++ code against the managed C# code I posted above showed that the C# version takes 48% (!!!) longer to execute, even though the algorithms are identical. I know C# is a higher-level, managed language and therefore further removed from the machine than C++, so I fully expected it to run slower, but I did not expect it to be this bad. 48% is a lot, which leads me to believe I am doing something wrong. On the other hand, I do not have that much experience yet, so to be honest I also do not really know what to expect in this situation.

What I have tried so far:

  • switching between running the individual loops sequentially and in parallel; the fastest configuration is the one shown above, with all loops parallelized
  • hoisting variables accessed by the individual parallel loop iterations (e.g. instead of accessing the same Resolution object on every iteration, declaring separate width and height variables before starting the loop); this already improved performance considerably, but the 48% remained
  • trying the Parallel.ForEach(Partitioner.Create(0, frames), ...) approach, i.e. partitioning the data into coarser chunks via the Partitioner class; this did not help and actually made the code run slower
  • optimizing the other functions being called, as well as the calling code, as best I can
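For reference, the Partitioner attempt from the list above looked roughly like this (a sketch; the per-pixel body is the same work as in the row-parallel version posted earlier):

```csharp
// Sketch of the coarser-grained partitioning attempt: each task processes a
// contiguous range of rows instead of a single row.
// Requires: using System.Collections.Concurrent; using System.Threading.Tasks;
Parallel.ForEach(Partitioner.Create(0, (int)height), range =>
{
  for (int row = range.Item1; row < range.Item2; ++row)
    for (int col = 0; col < width; ++col)
    {
      // ... same per-pixel extraction, validation, and polynomial fit
      //     as in the Parallel.For version above
    }
});
```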

To come to an actual question: is it possible to get C# code like this to run with performance comparable to the same code in C++? Or is what I am observing completely normal, something I simply have to accept and deal with?
