Pandas: filter groups based on a condition on the last row of each group

I have a dataframe df as follows

Date          Group   Value   Duration
2018-01-01      A      20       30
2018-02-01      A      10       60
2018-03-01      A      25       88    <-----Last row for Group A
2018-01-01      B      15      180
2018-02-01      B      30      210
2018-03-01      B      25      238    <-----Last row of Group B

I want to drop Group A because its maximum Duration is less than 90. In other words, looking at the last row of each group, if the Duration value is less than 90, that group is omitted. So my resulting dataframe should look like this:

    Date       Group   Value   Duration
 2018-01-01      B      15      180
 2018-02-01      B      30      210
 2018-03-01      B      25      238

My approach is as follows:

df_f = []
for k,v in df.groupby(['Group']):
    v_f = v[max(v['Duration'])>=90]
    df_f.append(v_f)

The snippet above raises KeyError: False.

Am I missing something here?

Comments
  • ket
    ket replied

    You can also use filter.

    df.groupby('Group').filter(lambda x: x.Duration.max()>=90)
    
        Date        Group   Value   Duration
    3   2018-01-01  B       15      180
    4   2018-02-01  B       30      210
    5   2018-03-01  B       25      238
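
Note that filtering on `max()` matches the question only because Duration increases within each group. A minimal sketch that tests the last row instead, assuming rows are already sorted by Date within each group; the helper names `build_example` and `keep_groups_by_last_duration` are hypothetical, and the frame is rebuilt from the question's data:

```python
import pandas as pd


def build_example():
    """Rebuild the DataFrame shown in the question."""
    return pd.DataFrame({
        "Date": ["2018-01-01", "2018-02-01", "2018-03-01"] * 2,
        "Group": ["A"] * 3 + ["B"] * 3,
        "Value": [20, 10, 25, 15, 30, 25],
        "Duration": [30, 60, 88, 180, 210, 238],
    })


def keep_groups_by_last_duration(df, threshold=90):
    # Keep a group only if the Duration in its last row meets the threshold.
    # Assumes each group's rows are already in chronological order.
    return df.groupby("Group").filter(
        lambda g: g["Duration"].iloc[-1] >= threshold
    )


df = build_example()
result = keep_groups_by_last_duration(df)
print(result)
```

If the rows might not be in date order, sorting first with `df.sort_values(["Group", "Date"])` would make the "last row" well defined.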
    

  • Out
    Out replied

    You can test whether the maximum value per group is greater than or equal to 90 with GroupBy.transform, and then filter by boolean indexing:

    df = df[df.groupby('Group')['Duration'].transform('max') >= 90]
    #alternative
    #df = df[df.groupby('Group')['Duration'].transform('max').ge(90)]
    print (df)
             Date Group  Value  Duration
    3  2018-01-01     B     15       180
    4  2018-02-01     B     30       210
    5  2018-03-01     B     25       238
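
As for the original error: `max(v['Duration']) >= 90` evaluates to a single boolean rather than a row mask, so `v[False]` is treated as a lookup for a column named `False`, which raises `KeyError: False`. Below is a minimal sketch of the loop with that fixed, next to a `transform('last')` variant that checks the last row of each group rather than the maximum. It assumes rows are sorted by Date within each group, and the sample frame is rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Date": ["2018-01-01", "2018-02-01", "2018-03-01"] * 2,
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# Fixed loop: make a scalar decision per group, then concatenate the kept groups.
# Note: pd.concat raises ValueError if no group passes and the list is empty.
kept = []
for name, group in df.groupby("Group"):
    if group["Duration"].iloc[-1] >= 90:
        kept.append(group)
looped = pd.concat(kept)

# Vectorised equivalent: broadcast each group's last Duration to all of its rows,
# then use the resulting boolean Series as a row mask.
mask = df.groupby("Group")["Duration"].transform("last") >= 90
vectorised = df[mask]

print(vectorised)
```

Both results keep the original row labels (3, 4, 5), so they can be compared directly with `looped.equals(vectorised)`.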