Replace multiple words in a column with NaN in Python

I want to replace the direction words (south, north, east, west) in the strings column with NaN when a cell contains only these words, either as a single word on its own or as several words joined by a comma and a space (", ").

    id                         strings
0    1                           south
1    2                           north
2    3                            east
3    4                            west
4    5               west, east, south
5    6                      west, west
6    7                    north, north
7    8                    north, south
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest
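To reproduce the example, the sample DataFrame can be built as follows. This is a sketch; the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'id': range(1, 13),
    'strings': [
        'south', 'north', 'east', 'west',
        'west, east, south', 'west, west', 'north, north', 'north, south',
        'West Corporation global office', 'West-Riding',
        'University of West Florida', 'Southwest',
    ],
})
print(df)
```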

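My expected result would look like this. Note that I don't want the words replaced if they are part of a phrase or part of another word.

Is it possible to do this? Thanks.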

    id                         strings
0    1                             NaN
1    2                             NaN
2    3                             NaN
3    4                             NaN
4    5                             NaN
5    6                             NaN
6    7                             NaN
7    8                             NaN
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest

The following code works, but is there a more concise way to do it?

    import numpy as np

    # Chained exact-match replacements: every full cell value must be listed
    df['strings'] = df['strings'].astype(str).replace('south', np.nan).replace('north', np.nan)\
        .replace('west', np.nan).replace('east', np.nan).replace('west, east, south', np.nan)\
        .replace('west, west', np.nan).replace('north, north', np.nan)\
        .replace('north, south', np.nan)
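Since Series.replace accepts a list for to_replace, the chained calls can be collapsed into one. This is a sketch of the same exact-match idea, not a general solution: it still only catches the combinations written out explicitly (here the ones in the sample data), so a new combination such as 'east, west' would be missed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': range(1, 13),
    'strings': [
        'south', 'north', 'east', 'west',
        'west, east, south', 'west, west', 'north, north', 'north, south',
        'West Corporation global office', 'West-Riding',
        'University of West Florida', 'Southwest',
    ],
})

# Every exact cell value that should become NaN (whole-cell matches only)
to_nan = ['south', 'north', 'east', 'west',
          'west, east, south', 'west, west', 'north, north', 'north, south']

# One replace call with a list instead of one call per value
df['strings'] = df['strings'].replace(to_nan, np.nan)
print(df)
```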
Comments
  • Aries replied:

    First split the values with Series.str.split, forward-fill along each row to replace the missing values that pad shorter rows, test whether all values match with DataFrame.isin and DataFrame.all to build a mask, and finally set the matched rows to missing values with Series.mask:

    L = ['south','north','east','west']
    m = df['strings'].str.split(', ', expand=True).ffill(axis=1).isin(L).all(axis=1)
    
    df['strings'] = df['strings'].mask(m)
    print (df)
        id                         strings
    0    1                             NaN
    1    2                             NaN
    2    3                             NaN
    3    4                             NaN
    4    5                             NaN
    5    6                             NaN
    6    7                             NaN
    7    8                             NaN
    8    9  West Corporation global office
    9   10                     West-Riding
    10  11      University of West Florida
    11  12                       Southwest
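The ffill(axis=1) step matters because, with expand=True, rows with fewer items are padded with missing values, which isin never matches, so single-word rows would fail the .all() test. A small sketch on three of the sample values (the Series s is only for illustration):

```python
import pandas as pd

L = ['south', 'north', 'east', 'west']
s = pd.Series(['south', 'west, east, south', 'West Corporation global office'])

# expand=True gives one column per item; shorter rows are padded with missing values
parts = s.str.split(', ', expand=True)

# Without filling, the padding never matches L, so 'south' alone fails .all()
unfilled = parts.isin(L).all(axis=1)

# ffill(axis=1) copies the last real item of each row into its padded cells
filled = parts.ffill(axis=1).isin(L).all(axis=1)

print(unfilled.tolist())
print(filled.tolist())
```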
    

    Another idea uses sets, set.isdisjoint and Series.where, keeping only the rows where none of the comma-separated items is a direction word:

    m = [set(x.split(', ')).isdisjoint(L) for x in df['strings']]
    df['strings'] = df['strings'].where(m)
    print (df)
        id                         strings
    0    1                             NaN
    1    2                             NaN
    2    3                             NaN
    3    4                             NaN
    4    5                             NaN
    5    6                             NaN
    6    7                             NaN
    7    8                             NaN
    8    9  West Corporation global office
    9   10                     West-Riding
    10  11      University of West Florida
    11  12                       Southwest
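Both solutions give the same result on the question's data, but they test different conditions: the first masks a row only if all of its items are direction words, while the second keeps a row only if none of them are. A sketch with a hypothetical mixed value 'west, Paris' (not part of the question's data) shows where they diverge:

```python
import pandas as pd

L = ['south', 'north', 'east', 'west']
# 'west, Paris' is a hypothetical mixed value used only to show the difference
s = pd.Series(['west, east', 'west, Paris', 'Southwest'])

# Solution 1: mask rows where *all* comma-separated items are in L
m_all = s.str.split(', ', expand=True).ffill(axis=1).isin(L).all(axis=1)
res_all = s.mask(m_all)

# Solution 2: keep rows where *no* item is in L
m_disjoint = [set(x.split(', ')).isdisjoint(L) for x in s]
res_disjoint = s.where(m_disjoint)

print(res_all.tolist())       # 'west, Paris' is kept
print(res_disjoint.tolist())  # 'west, Paris' becomes NaN
```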