Updating a NumPy array derived from a pandas DataFrame column can also (unexpectedly) update the DataFrame column

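I stumbled on this odd behavior while debugging. Updating a NumPy array derived from a pandas DataFrame column also modified the DataFrame's values, even though the update never referenced the DataFrame, only the NumPy array. How is this possible?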


    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(columns=["A"], data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    xarray = df1.iloc[:, 0].values  # put df1 values into an np array

    for i in range(0, len(xarray)):  # change some of the np array values
        if xarray[i] > 5:
            xarray[i] = 0

    df1.head(10)  # but why are the DataFrame values also getting updated? df1 rows with values > 5 also get zeroed


       A
    0  1
    1  2
    2  3
    3  4
    4  5
    5  0
    6  0
    7  0
    8  0
    9  0

哎喲喂

pandas.DataFrame.values returns a view of the data (rather than a copy) if the columns are all of the same type. Since you only have one column, you actually have a reference to the data, so modifying it will modify the source dataframe.
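
The view-versus-copy distinction can be illustrated with plain NumPy, independent of any pandas version. This is only a sketch of the mechanism; the names base, view and copy are made up for the illustration and do not come from the question.

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
view = base[:]       # basic slicing returns a view onto the same buffer
copy = base.copy()   # copy() allocates a new, independent buffer

view[view > 5] = 0   # writing through the view also changes base

print(np.shares_memory(base, view))  # the view shares memory with base
print(np.shares_memory(base, copy))  # the copy does not
print(base.tolist())
print(copy.tolist())
```

np.shares_memory is a quick way to check whether two arrays overlap in memory when it is unclear whether an operation returned a view.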

To ensure you have a copy, use the copy argument of pd.DataFrame.to_numpy, e.g. df.to_numpy(copy=True).
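
Applied to the example from the question, that suggestion might look like the sketch below. It assumes the column is selected by name and calls to_numpy(copy=True) on the resulting Series; the boolean-mask assignment is a swapped-in vectorized equivalent of the loop in the question, not the original code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Request an explicit copy so the array owns its own memory.
xarray = df1["A"].to_numpy(copy=True)

# Vectorized equivalent of the loop in the question.
xarray[xarray > 5] = 0

# The copy does not share memory with the column, so df1 is left unchanged.
print(np.shares_memory(xarray, df1["A"].to_numpy()))
print(xarray.tolist())
print(df1["A"].tolist())
```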

波哩瓶

You need to copy the data (note that this gives an independent pandas Series rather than a NumPy array):

    xarray = df1.iloc[:, 0].copy()