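I stumbled on this strange behavior while debugging. Updating a NumPy array derived from a pandas DataFrame column also unexpectedly modified the DataFrame's values, even though the update never referenced the DataFrame, only the NumPy array. How is this possible?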
import numpy as np
import pandas as pd

df1 = pd.DataFrame(columns=["A"], data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
xarray = df1.iloc[:, 0].values  # put df1's column values into a NumPy array

for i in range(len(xarray)):  # change some of the NumPy array values
    if xarray[i] > 5:
        xarray[i] = 0

df1.head(10)  # but why are the DataFrame values also updated?? df1 rows with values > 5 are zeroed too
   A
0  1
1  2
2  3
3  4
4  5
5  0
6  0
7  0
8  0
9  0
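To see that the array and the frame really share storage, you can ask NumPy directly with np.shares_memory. This is a minimal diagnostic sketch, assuming the same single NumPy-backed int64 column as in the question; it only reads the array and never writes to it.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(columns=["A"], data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Same extraction as in the question: no copy is made here
xarray = df1.iloc[:, 0].values

# True means both arrays point into the same buffer that backs column "A",
# so writing to xarray writes into df1
shared = np.shares_memory(xarray, df1["A"].values)
print(shared)
```

Note that in pandas versions running with Copy-on-Write enabled, an array that shares data with a DataFrame is returned read-only, so the question's loop would raise an error instead of silently editing the frame. The memory is still shared; only writing through it is blocked.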
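pandas.DataFrame.values returns a view of the data (rather than a copy) if the columns are all of the same type. Since you only have one column, you actually have a reference to the underlying data, so modifying the array modifies the source DataFrame. The same holds for the single column you selected with df1.iloc[:, 0]: for a NumPy-backed column, Series.values hands back the column's underlying array rather than a copy. To make sure you have a copy, use the copy argument of pd.DataFrame.to_numpy (Series.to_numpy accepts it too), e.g. df.to_numpy(copy=True). In short, you need to copy the data.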
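Applied to the question's code, the fix is a one-line change to how the array is extracted. A minimal sketch, assuming the same data; it also replaces the explicit loop with a boolean mask, which zeroes the same elements (values greater than 5) in a single vectorized step.

```python
import pandas as pd

df1 = pd.DataFrame(columns=["A"], data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# copy=True returns a new array that does not share memory with df1
xarray = df1.iloc[:, 0].to_numpy(copy=True)

# Vectorized equivalent of the loop: zero every value greater than 5
xarray[xarray > 5] = 0

print(xarray.tolist())    # [1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
print(df1["A"].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Calling .copy() on the result of .values achieves the same thing; to_numpy(copy=True) just makes the intent explicit at the point of extraction.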