I need to compute the mean invoice in this dataset, grouped by customer and account, but the mean should only be computed for groups that have all 3 months. For account A1200, which doesn't have 3 months, the result needs to be NaN.
customer  account  month       invoice
C1000     A1100    2019-10-01  34000
                   2019-11-01  55000
                   2019-12-01  80000
          A1200    2019-10-01  90000
                   2019-11-01  55000
          A1300    2019-10-01  10000
                   2019-11-01  10000
                   2019-12-01  20000
C2000     A2100    2019-10-01  78000
                   2019-11-01  55000
                   2019-12-01  80000
I tried this command, but the mean doesn't look right.
df_3m.groupby(['customer','account']).mean()
Are there any ideas on how to do this in pandas or PySpark?
Data
Your query
Query to filter accounts with fewer than 3 months
Final result
Result
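The steps outlined above could be sketched in pandas roughly as follows. This is only a sketch, not necessarily the original answer's code: it assumes `customer` and `account` are ordinary columns (not index levels), that `invoice` is numeric, and that "3 months" means three distinct `month` values per group. The names `stats`, `mean_invoice`, `n_months` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (assumed to be plain columns).
df_3m = pd.DataFrame({
    "customer": ["C1000"] * 8 + ["C2000"] * 3,
    "account": ["A1100"] * 3 + ["A1200"] * 2 + ["A1300"] * 3 + ["A2100"] * 3,
    "month": pd.to_datetime([
        "2019-10-01", "2019-11-01", "2019-12-01",
        "2019-10-01", "2019-11-01",
        "2019-10-01", "2019-11-01", "2019-12-01",
        "2019-10-01", "2019-11-01", "2019-12-01",
    ]),
    "invoice": [34000, 55000, 80000, 90000, 55000,
                10000, 10000, 20000, 78000, 55000, 80000],
})

# Mean invoice and number of distinct months per (customer, account).
stats = df_3m.groupby(["customer", "account"]).agg(
    mean_invoice=("invoice", "mean"),
    n_months=("month", "nunique"),
)

# Keep the mean only where all 3 months are present; otherwise NaN.
result = stats["mean_invoice"].where(stats["n_months"] >= 3)
print(result)
```

With the sample data, A1200 has only two months, so its entry becomes NaN while the other three accounts keep their plain group mean. If the real `invoice` column is stored as strings, it would need converting first (for example with `pd.to_numeric`), which may be one reason a plain `groupby(...).mean()` looks wrong.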