I'm working on time series analysis (TSA) and need the correlation coefficient between a pandas Series and Series.shift(1). DataFrame.corr() does the job, as shown below:
(1) pandas.DataFrame.corr()
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv',
                 index_col=0, parse_dates=True)
values = pd.DataFrame(df.values)
# Pair each value (col2) with the previous one (col1, lag 1)
dataframe = pd.concat([values.shift(1), values], axis=1)
dataframe.columns = ['col1', 'col2']
print(dataframe.corr())
"""
         col1     col2
col1  1.00000  0.77487
col2  0.77487  1.00000
"""
The problem is that I don't know how to do the same with numpy.corrcoef or scipy.stats.stats.pearsonr. Thanks in advance for any help!
(2) numpy.corrcoef and scipy.stats.stats.pearsonr, applied this way:
import numpy as np
import scipy.stats

a = dataframe['col1']  # shifted series, so its first value is NaN
b = dataframe['col2']
print(np.corrcoef(a, b))
"""
[[nan nan]
 [nan  1.]]
"""
print(scipy.stats.stats.pearsonr(a, b))
"""
ValueError: array must not contain infs or NaNs
"""
The gist of the issue is that DataFrame.corr automatically excludes N/A values for you, while numpy and scipy offer no such luxury. The first value in col1 is N/A because it was created by a shift. Exclude the first value and you can do the following:
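A minimal sketch of that approach. To keep it runnable offline it uses a synthetic random-walk series in place of the temperature CSV (that data, and the names clean, r_pandas, r_numpy and r_scipy, are assumptions for illustration), and it calls pearsonr from scipy.stats directly rather than through scipy.stats.stats:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the temperature series (assumption, not the original data)
rng = np.random.default_rng(0)
values = pd.DataFrame(rng.normal(size=200).cumsum())

# Same construction as in the question: col1 is the lag-1 copy of col2
dataframe = pd.concat([values.shift(1), values], axis=1)
dataframe.columns = ['col1', 'col2']

# Drop the first row, where col1 is NaN because of the shift
clean = dataframe.iloc[1:]
a = clean['col1']
b = clean['col2']

r_pandas = dataframe['col1'].corr(dataframe['col2'])  # pandas skips the NaN pair itself
r_numpy = np.corrcoef(a, b)[0, 1]                     # off-diagonal entry of the 2x2 matrix
r_scipy, p_value = pearsonr(a, b)                     # correlation and its p-value

print(r_pandas, r_numpy, r_scipy)
```

All three numbers should agree up to floating-point error. dataframe.dropna() would remove the same row here and also handles N/A values elsewhere in the data.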