Pandas: filter or groupby, then transform to select the last row

This post refers to one of my earlier posts on SO.

Just to reiterate, I have a dataframe df as

   Date        Group   Value   Duration
 2018-01-01      A      20       30
 2018-02-01      A      10       60
 2018-03-01      A      25       88    <-----Last row for Group A
 2018-01-01      B      15      180
 2018-02-01      B      30      210
 2018-03-01      B      25      238    <-----Last row of Group B

Considering the last row of each Group, if the Duration value is less than 90, we omit that group. So my resultant data frame df_final should look like

    Date       Group   Value   Duration
 2018-01-01      B      15      180
 2018-02-01      B      30      210
 2018-03-01      B      25      238

There are two ways to solve this problem.

The first is the filter method:

df.groupby('Group').filter(lambda x: x.Duration.max()>=90) 

The second is the groupby.transform method:

df = df[df.groupby('Group')['Duration'].transform('last') >= 90] 
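For reference, here is a minimal self-contained sketch of both approaches on the sample data above. The way df is built (Date parsed as datetime, rows already sorted by date within each group) is an assumption, and the filter variant tests the last row with .iloc[-1] instead of .max(); the two agree on this data only because Duration increases within each group.

```python
import pandas as pd

# Sample frame rebuilt from the table above (dtypes are an assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# Approach 1: keep whole groups whose last row has Duration >= 90.
by_filter = df.groupby("Group").filter(lambda g: g["Duration"].iloc[-1] >= 90)

# Approach 2: broadcast each group's last Duration to all of its rows,
# then use the result as a boolean mask.
by_transform = df[df.groupby("Group")["Duration"].transform("last") >= 90]

print(by_filter)
print(by_transform)
```

Both variants rely on the row order inside each group, so sorting by Date first is advisable if the frame is not already ordered.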

But I want to filter this by the Date column and NOT by Duration. I am getting the correct result by the following code:

df_interim = df.loc[(df['Date']=='2018-03-01')&(df['Duration'] >=90)]
df_final = df.merge(df_interim[['Group','Date']],on='Group',how='right').reset_index()

In the above code, I have hard coded the Date.

My question is: how can I dynamically select the last date in the data frame, and then perform the filter or groupby.transform on Group?

Any clues?

Comments
  • zvelit replied

    We can select the last date by using transform as well:

    lastd=df.groupby('Group')['Date'].transform('max')
    df_interim = df.loc[(df['Date']==lastd)&(df['Duration'] >=90)]
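A fuller sketch of this idea, carried through to the final frame, might look like the following. It assumes that Date is parsed as datetime and that "last date" means the maximum Date within each group; the sample frame is rebuilt from the question so the snippet runs on its own, and the names is_qualifying and groups_to_keep are illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the question (dtypes are an assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# Latest date of each group, broadcast back onto every row of that group.
# For a single frame-wide last date, df["Date"].max() could be used instead.
last_date = df.groupby("Group")["Date"].transform("max")

# Rows that sit on their group's latest date and meet the threshold.
is_qualifying = (df["Date"] == last_date) & (df["Duration"] >= 90)

# Keep every row of the groups that have such a qualifying last row.
groups_to_keep = df.loc[is_qualifying, "Group"]
df_final = df[df["Group"].isin(groups_to_keep)]

print(df_final)
```

Using isin on the group labels avoids the merge step and keeps the original columns and index untouched.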