How do I use numpy to compute the correlation coefficient between df.Series and df.Series.shift(1)?

I'm working on time series analysis (TSA) and need the correlation coefficient between df.Series and df.Series.shift(1). df.corr() gives it to me, as shown below:

(1) pandas.DataFrame.corr()

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv',
                 index_col=0, parse_dates=True)
values = pd.DataFrame(df.values)
dataframe = pd.concat([values.shift(1), values], axis=1)
dataframe.columns = ['col1', 'col2']

print(dataframe.corr())
"""
         col1     col2
col1  1.00000  0.77487
col2  0.77487  1.00000
"""

The question is that I don't know how to do the same with numpy.corrcoef or scipy.stats.stats.pearsonr. Thanks in advance for any help!

(2) numpy.corrcoef and scipy.stats.stats.pearsonr, applied this way

import numpy as np
import scipy.stats

a = dataframe['col1']
b = dataframe['col2']
print(np.corrcoef(a, b))
"""
[[nan nan]
 [nan  1.]]
"""

print(scipy.stats.stats.pearsonr(a, b))
"""
ValueError: array must not contain infs or NaNs
"""
Comments
  • mut replied:

    The gist of the issue is that DataFrame.corr automatically excludes N/A values for you, while numpy and scipy offer no such luxury. The first value in col1 is N/A because that column was created from a shift.

    Exclude the first value and you can do the following:

    >>> a = dataframe.iloc[1:, 0]  # col1 without the leading NaN row
    >>> b = dataframe.iloc[1:, 1]  # col2, aligned to the same rows
    
    >>> np.corrcoef(a,b)
    array([[1.        , 0.77487022],
           [0.77487022, 1.        ]])
    
    >>> scipy.stats.stats.pearsonr(a,b)
    (0.7748702165384456, 0.0)
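
If NaNs could appear anywhere and not only in the first row, a slightly more general variant of the same idea is to drop every incomplete row before calling numpy. The snippet below is only a sketch: it uses a synthetic random-walk series in place of the temperature data (an assumption, so it runs without a download), and the names `series`, `pairs` and `clean` are made up for illustration. It also compares the result with pandas' own `Series.corr` and `Series.autocorr(lag=1)`, which should agree.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature column (assumption: any 1-D numeric series works).
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=200)))

# Pair each value with its predecessor; shift(1) leaves a NaN in the first row of col1.
pairs = pd.DataFrame({"col1": series.shift(1), "col2": series})

# Drop every row that contains a NaN, wherever it occurs.
clean = pairs.dropna()

# numpy needs NaN-free input; take the off-diagonal entry of the 2x2 matrix.
r_numpy = np.corrcoef(clean["col1"], clean["col2"])[0, 1]

# pandas skips incomplete pairs on its own.
r_pandas = pairs["col1"].corr(pairs["col2"])

# Built-in lag-1 autocorrelation of the original series.
r_autocorr = series.autocorr(lag=1)

print(r_numpy, r_pandas, r_autocorr)
```

With the real data, `series` would be the single column of `df`, and the three numbers should all match the 0.77487 reported above.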