均值分为两列,窗口时间为3个月,NaN时间少于3个月

I have to apply the mean calculation in this dataset by customer, account but this mean needs to be applied to each 3 months in these groups. For the customer A1200 that don't has 3 months, the result need to be NaN.

customer    account    month        invoice
C1000       A1100      2019-10-01   34000
                       2019-11-01   55000
                       2019-12-01   80000
            A1200      2019-10-01   90000
                       2019-11-01   55000
            A1300      2019-10-01   10000
                       2019-11-01   10000
                       2019-12-01   20000
C2000       A2100      2019-10-01   78000
                       2019-11-01   55000
                       2019-12-01   80000

我尝试使用此命令,但平均值看起来不正确。

df_3m.groupby(['customer','account']).mean()

Are there some ideias in pandas or pyspark?

评论
  • kautem
    kautem 回复

    数据

    +----------+---------+----------+----------+
    | customer | account |  month   |  invoice |
    +----------+---------+----------+----------+
    | C1000    | A1100   | 01-10-19 |    34000 |
    | C1000    | A1100   | 01-11-19 |    55000 |
    | C1000    | A1100   | 01-12-19 |    80000 |
    | C1000    | A1200   | 01-10-19 |    90000 |
    | C1000    | A1200   | 01-11-19 |    55000 |
    | C1000    | A1300   | 01-10-19 |    10000 |
    | C1000    | A1300   | 01-11-19 |    10000 |
    | C1000    | A1300   | 01-12-19 |    20000 |
    | C2000    | A2100   | 01-10-19 |    78000 |
    | C2000    | A2100   | 01-11-19 |    55000 |
    | C2000    | A2100   | 01-12-19 |    80000 |
    +----------+---------+----------+----------+
    

    您的查询

    res = df_3m.groupby(['customer','account']).mean()
    

    Query to filter accounts with less than 3 months

    lt_3 = df.groupby(['account']).count() >2
    

    最后结果

    res [lt_3]
    

    结果

    +----------+---------+--------------+
    | customer | account |   invoice    |
    +----------+---------+--------------+
    | C1000    | A1100   | 56333.333333 |
    |          | A1200   | NaN          |
    |          | A1300   | 13333.333333 |
    | C2000    | A2100   | 71000.000000 |
    +----------+---------+--------------+