Julia: nested loops for pairwise distances are really slow

I have some code that loads a CSV file of 2000 2D coordinates. A function called collision_count then counts the number of pairs of coordinates that are closer than a distance d to each other:

using BenchmarkTools
using CSV
using LinearAlgebra

function load_csv()::Array{Float64,2}
    df = CSV.read("pos.csv", header=0)
    return Matrix(df)'
end

function collision_count(pos::Array{Float64,2}, d::Float64)::Int64
    count::Int64 = 0
    N::Int64 = size(pos, 2)
    for i in 1:N
        for j in (i+1):N
            @views dist = norm(pos[:,i] - pos[:,j])
            count += dist < d
        end
    end
    return count
end

The results are as follows:

pos = load_csv()

@benchmark collision_count($pos, 2.0)
BenchmarkTools.Trial: 
  memory estimate:  366.03 MiB
  allocs estimate:  5997000
  --------------
  minimum time:     152.070 ms (18.80% GC)
  median time:      158.915 ms (20.60% GC)
  mean time:        158.751 ms (20.61% GC)
  maximum time:     181.726 ms (21.98% GC)
  --------------
  samples:          32
  evals/sample:     1

This is about 30 times slower than this Python code:

import numpy as np
import scipy.spatial.distance

pos = np.loadtxt('pos.csv',delimiter=',')

def collision_count(pos, d):
    pdist = scipy.spatial.distance.pdist(pos)
    return np.count_nonzero(pdist < d)

%timeit collision_count(pos, 2)

5.41 ms ± 63 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Is there any way to make it faster? And what about all those allocations?
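
On the allocations: 2000 points give 2000 * 1999 / 2 = 1,999,000 pairs, and 5997000 allocations divided by that is exactly 3 per pair. My guess (an assumption, not something I have profiled) is that pos[:,i] - pos[:,j] builds a temporary vector for every pair even under @views, with the two view wrappers possibly accounting for the rest. Below is a minimal sketch of a variant that accumulates the squared distance in a scalar instead. The name collision_count_noalloc is hypothetical, and it assumes pos is a dense matrix with one point per column, as returned by load_csv above.

```julia
# Sketch only: `collision_count_noalloc` is a hypothetical name, not part of the original code.
# Assumes `pos` is a D×N matrix with one point per column.
function collision_count_noalloc(pos::AbstractMatrix{<:Real}, d::Real)
    d <= 0 && return 0            # no distance can be strictly below a non-positive d
    D, N = size(pos)
    d2 = d^2                      # compare squared distances so no sqrt is needed
    count = 0
    @inbounds for i in 1:N
        for j in (i+1):N
            s = zero(eltype(pos))
            for k in 1:D          # accumulate the squared distance in a scalar
                s += (pos[k, i] - pos[k, j])^2
            end
            count += s < d2       # Bool is added as 0 or 1
        end
    end
    return count
end
```

It can be benchmarked the same way as the original, with @benchmark collision_count_noalloc($pos, 2.0). I would expect it to return the same count as collision_count for the same input, apart from pairs whose distance is within floating-point rounding of d.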