How do I find the unique values in an object-dtype DataFrame column?

import numpy as np
import pandas as pd

input_data = [
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
    np.array(['stickscelery', 'Chicken', 'CreamofChickensoup', 'babycarrots', 'pepper'], dtype=object),
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
]

I want to find the unique values in the column 'input_data'. When I call pandas.unique() on it, I get TypeError: unhashable type: 'numpy.ndarray'

When I use pd.Series(input_data).value_counts(), the result is:

>>> pd.Series(input_data).value_counts()
[chicken, creamofchickensoup]                                       1
[stickscelery, Chicken, CreamofChickensoup, babycarrots, pepper]    1
[chicken, creamofchickensoup]                                       1
dtype: int64

The expected count for [chicken, creamofchickensoup] is 2, because that array appears twice.

How can I find the unique values of a column whose dtype is object? Thanks.

Comments
  • 菊花怒放
    菊花怒放 replied:

    You can use Series.explode to turn each element of a list-like into its own row, then call Series.unique on the result:

    >>> pd.Series(input_data).explode().unique()
    array(['chicken', 'creamofchickensoup', 'stickscelery', 'Chicken',
           'CreamofChickensoup', 'babycarrots', 'pepper'], dtype=object)
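
When the arrays sit in a DataFrame column rather than a standalone Series, the same approach can be sketched as follows. The frame name `df` and the variable `unique_values` are assumptions for illustration; only `input_data` comes from the question.

```python
import numpy as np
import pandas as pd

input_data = [
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
    np.array(['stickscelery', 'Chicken', 'CreamofChickensoup', 'babycarrots', 'pepper'], dtype=object),
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
]

# Hypothetical DataFrame holding the arrays in a column named 'input_data'.
df = pd.Series(input_data, name='input_data').to_frame()

# explode() gives each array element its own row; unique() then works
# because the exploded values are plain, hashable strings.
unique_values = df['input_data'].explode().unique()
print(unique_values)
```

Note that the comparison is case-sensitive, so 'chicken' and 'Chicken' are reported as separate values.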
    
  • 浅浅夏
    浅浅夏 replied:

    You can chain the inner arrays with itertools.chain and take a set:

    from itertools import chain
    
    set(chain.from_iterable(input_data))
    
    {'Chicken',
     'CreamofChickensoup',
     'babycarrots',
     'chicken',
     'creamofchickensoup',
     'pepper',
     'stickscelery'}
    

    To obtain the value_counts, you need to explode the column:

    pd.Series(input_data).explode().value_counts()
    
    chicken               2
    creamofchickensoup    2
    stickscelery          1
    Chicken               1
    CreamofChickensoup    1
    babycarrots           1
    pepper                1
    dtype: int64
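
If the goal is to count identical arrays as a whole (the count of 2 that the question expects for [chicken, creamofchickensoup]), one option is to convert each array to a tuple, which is hashable, and count the tuples. This is only a sketch using `collections.Counter` from the standard library instead of pandas; the names `row_counts` and `unique_rows` are illustrative and not from the thread.

```python
from collections import Counter

import numpy as np

input_data = [
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
    np.array(['stickscelery', 'Chicken', 'CreamofChickensoup', 'babycarrots', 'pepper'], dtype=object),
    np.array(['chicken', 'creamofchickensoup'], dtype=object),
]

# ndarrays are unhashable, which is what triggers the TypeError in the question.
# Tuples of strings are hashable, so identical rows can be counted directly.
row_counts = Counter(tuple(arr) for arr in input_data)

# The distinct rows, in order of first appearance.
unique_rows = list(row_counts)

for row, count in row_counts.items():
    print(list(row), count)
```

Two rows compare equal only if they contain the same strings in the same order, so sorting each array first would be needed if element order should not matter.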