Pandas: filter or groupby, then transform to select the last row

This post refers to one of my earlier posts on SO.

To reiterate, I have a dataframe df:

   Date        Group   Value   Duration
 2018-01-01      A      20       30
 2018-02-01      A      10       60
 2018-03-01      A      25       88    <-----Last row for Group A
 2018-01-01      B      15      180
 2018-02-01      B      30      210
 2018-03-01      B      25      238    <-----Last row of Group B

Looking at the last row of each Group: if its Duration value is less than 90, we omit the whole group. So my resulting dataframe df_final should look like this:

    Date       Group   Value   Duration
 2018-01-01      B      15      180
 2018-02-01      B      30      210
 2018-03-01      B      25      238
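
To make the rule concrete, here is a minimal sketch that rebuilds the sample table and applies the "last row decides" rule step by step. It assumes the rows are already in date order within each group and that Date is parsed as datetime; the names last_rows and keep are only illustrative.

```python
import pandas as pd

# Sample data copied from the input table in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# Last row of each group, in the frame's existing row order
# (assumption: rows are already sorted by Date within each group).
last_rows = df.groupby("Group").tail(1)

# Groups whose last-row Duration is at least 90.
keep = last_rows.loc[last_rows["Duration"] >= 90, "Group"]

# Keep every row that belongs to one of those groups.
df_final = df[df["Group"].isin(keep)]
print(df_final)
```

If the rows were not sorted, calling df.sort_values(["Group", "Date"]) first would make "last row" mean "latest date".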

There are two ways to solve this problem.

The first is the filter method:

df.groupby('Group').filter(lambda x: x.Duration.max()>=90) 

The second is the groupby.transform method:

df = df[df.groupby('Group')['Duration'].transform('last') >= 90] 
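
For comparison, the two approaches can be run side by side on the sample data. This is only a sketch: the filter version here uses iloc[-1] rather than max(), on the assumption that "last row" is what is meant, since max() only matches the last row when Duration increases within each group. The names by_filter, mask and by_transform are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# filter: the callable receives each group's sub-frame and returns one
# boolean that keeps or drops the whole group.
by_filter = df.groupby("Group").filter(lambda g: g["Duration"].iloc[-1] >= 90)

# transform('last'): broadcasts each group's last Duration to every row
# of that group, giving a row-aligned boolean mask.
mask = df.groupby("Group")["Duration"].transform("last") >= 90
by_transform = df[mask]

print(by_filter)
print(by_transform)
```

Both keep the original index, so the two results can be compared directly.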

But I want to filter this by the Date column and NOT by Duration. I am getting the correct result by the following code:

df_interim = df.loc[(df['Date']=='2018-03-01')&(df['Duration'] >=90)]
df_final = df.merge(df_interim[['Group','Date']],on='Group',how='right').reset_index()

In the above code, I have hard-coded the Date.

My question is: how can I dynamically select the last date in the dataframe, and then perform the filter or groupby.transform on Group?

Any clues?

Comments
zvelit

We can select the last date by using transform as well:

lastd = df.groupby('Group')['Date'].transform('max')
df_interim = df.loc[(df['Date']==lastd)&(df['Duration'] >=90)]
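
Put together, the idea might look like the sketch below. It assumes the "last date" should be found per Group, that Date is a datetime column, and it swaps the merge from the question for an isin lookup to bring back all rows of the qualifying groups; that swap is my substitution, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"] * 2),
    "Group": ["A"] * 3 + ["B"] * 3,
    "Value": [20, 10, 25, 15, 30, 25],
    "Duration": [30, 60, 88, 180, 210, 238],
})

# Latest date within each group, broadcast to every row of that group.
last_date = df.groupby("Group")["Date"].transform("max")

# Rows that sit on their group's latest date and meet the threshold.
df_interim = df.loc[(df["Date"] == last_date) & (df["Duration"] >= 90)]

# Keep every row of the groups that qualified (isin instead of merge).
df_final = df[df["Group"].isin(df_interim["Group"])].reset_index(drop=True)
print(df_final)
```

If a single last date for the whole dataframe is wanted instead, comparing against df["Date"].max() in place of last_date would do that.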