Python Pandas Groupby: percentage of total by category

I have the following table:

+-----+----------+---+
| Grp | Category | X |
+-----+----------+---+
|   1 | A        | 1 |
|   1 | B        | 3 |
|   1 | B        | 2 |
|   1 | C        | 2 |
|   2 | A        | 2 |
|   2 | A        | 4 |
|   2 | B        | 4 |
|   3 | A        | 3 |
|   3 | C        | 7 |
+-----+----------+---+

and I am trying to get the following:

+-----+----------+---------+
| Grp | Category | X_ratio |
+-----+----------+---------+
|   1 | A        | 1/8     |
|   1 | B        | 5/8     |
|   1 | C        | 2/8     |
|   2 | A        | 6/10    |
|   2 | B        | 4/10    |
|   3 | A        | 3/10    |
|   3 | C        | 7/10    |
+-----+----------+---------+

I'm still a bit stuck. Could someone suggest an efficient solution?

Comments
  • det
    det replied:

    Because performance matters, first aggregate with sum into a MultiIndex Series, then use Series.div to divide it by the summed values per first level (Grp):

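    import pandas as pd

    df = pd.DataFrame({'Grp': [1, 1, 1, 1, 2, 2, 2, 3, 3],
                       'Category': ['A', 'B', 'B', 'C', 'A', 'A', 'B', 'A', 'C'],
                       'X': [1, 3, 2, 2, 2, 4, 4, 3, 7]})

    # total X per (Grp, Category): a Series with a (Grp, Category) MultiIndex
    s = df.groupby(['Grp','Category'])['X'].sum()
    # divide by the total per Grp, broadcasting across the first index level
    # (s.sum(level=0) no longer works in pandas 2.x; groupby(level=0).sum() does)
    res = s.div(s.groupby(level=0).sum(), level=0).reset_index(name='X_ratio')
    print(res)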
       Grp Category  X_ratio
    0    1        A    0.125
    1    1        B    0.625
    2    1        C    0.250
    3    2        A    0.600
    4    2        B    0.400
    5    3        A    0.300
    6    3        C    0.700
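    A variant of the same idea, offered as a sketch rather than part of the original answer: instead of passing level=0 to Series.div, groupby(level=0).transform('sum') repeats each Grp total (8, 10 and 10 for this data) on every row of s, so a plain / lines up on the identical MultiIndex. The names totals and out are illustrative; the X_ratio values should match the output above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Grp': [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   'Category': ['A', 'B', 'B', 'C', 'A', 'A', 'B', 'A', 'C'],
                   'X': [1, 3, 2, 2, 2, 4, 4, 3, 7]})

# Sum of X per (Grp, Category); the index is a (Grp, Category) MultiIndex
s = df.groupby(['Grp', 'Category'])['X'].sum()

# Per-Grp total repeated on every row of that group, with the same index as s
totals = s.groupby(level=0).transform('sum')

# Indexes already match, so ordinary division needs no level argument
out = (s / totals).reset_index(name='X_ratio')
print(totals.tolist())  # [8, 8, 8, 10, 10, 10, 10]
print(out)
```

    transform('sum') uses pandas' built-in sum rather than calling a Python function per group, so it avoids the per-group overhead of a lambda-based apply.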
    

    A slower alternative:

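    # slower: runs the Python lambda once per Grp;
    # group_keys=False stops pandas 2.x from adding Grp to the index a second time
    res = (df.groupby(['Grp','Category'])['X'].sum()
             .groupby(level=0, group_keys=False)
             .apply(lambda x: x / x.sum())
             .reset_index(name='X_ratio'))
    print(res)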
       Grp Category  X_ratio
    0    1        A    0.125
    1    1        B    0.625
    2    1        C    0.250
    3    2        A    0.600
    4    2        B    0.400
    5    3        A    0.300
    6    3        C    0.700
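    The desired table in the question writes the ratios as literal fractions such as 5/8. If that exact text is wanted instead of floats, a minimal sketch (assuming string output is acceptable; X_ratio then holds strings, not numbers, and fractions are not reduced, so 2/8 stays 2/8) is to join each group sum with its Grp total. The names totals and out are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Grp': [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   'Category': ['A', 'B', 'B', 'C', 'A', 'A', 'B', 'A', 'C'],
                   'X': [1, 3, 2, 2, 2, 4, 4, 3, 7]})

# Sum of X per (Grp, Category) and the matching per-Grp total on every row
s = df.groupby(['Grp', 'Category'])['X'].sum()
totals = s.groupby(level=0).transform('sum')

# Concatenate as "numerator/total" text, e.g. "5/8", aligned on the shared index
out = (s.astype(str) + '/' + totals.astype(str)).reset_index(name='X_ratio')
print(out)
```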