Insert missing dates into a pandas DataFrame from a flat file

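I have a DataFrame with missing dates. Is it possible to update this DataFrame so that it contains every date in February and March? I want the Price for the missing dates to be 0 and the Product to be Pencils or Pens, so that the list is complete. The DataFrame is not always in the right order, so fill-based methods such as forward fill should not be used.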

               Price   Product
1/Feb/2020      4400     Pencils
2/Feb/2020      0        Pencils
3/Feb/2020      0        Pencils
27/Feb/2020     0        Pencils
28/Feb/2020     0        Pencils
1/Mar/2020      55000    Pencils
2/Mar/2020       0       Pencils
3/Mar/2020       0       Pencils
30/Mar/2020      0       Pencils
31/Mar/2020      0       Pencils
1/Feb/2020      4400     Pens
2/Feb/2020      4454     Pens
3/Feb/2020      0        Pens
27/Feb/2020     34534    Pens
28/Feb/2020     345345   Pens
1/Mar/2020      550345   Pens
2/Mar/2020       354     Pens
3/Mar/2020       454     Pens
30/Mar/2020      454     Pens
31/Mar/2020     4545     Pens
Comments
  • Rabbit_19C replied:

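    One idea, which works if the first day of the first month and the last day of the last month are always present in the data (asfreq only fills gaps between the earliest and latest existing dates):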

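    import pandas as pd

    # Parse index labels such as '1/Feb/2020' into real timestamps
    df.index = pd.to_datetime(df.index)
    
    df = (df.set_index('Product', append=True)   # index becomes (date, Product)
            .unstack()                            # one Price column per product
            .asfreq('D')                          # add a row for every missing calendar day
            .stack(dropna=False)                  # back to long form, keeping the NaN rows
            .sort_index(level=[1,0])              # order by Product, then by date
            .reset_index(level=1))                # Product becomes a regular column again
    print(df)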
                Product   Price
    2020-02-01  Pencils  4400.0
    2020-02-02  Pencils     0.0
    2020-02-03  Pencils     0.0
    2020-02-04  Pencils     NaN
    2020-02-05  Pencils     NaN
                ...     ...
    2020-03-27     Pens     NaN
    2020-03-28     Pens     NaN
    2020-03-29     Pens     NaN
    2020-03-30     Pens   454.0
    2020-03-31     Pens  4545.0
    
    [120 rows x 2 columns]
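
The output above still shows NaN for the inserted dates, whereas the question asks for a Price of 0. One possible way to get zeros is sketched below. It is not the answer's exact code: it passes `fill_value=0` to `unstack` and `asfreq`, and it swaps `stack` for `melt`, because the `dropna` argument of `stack` has been reworked in recent pandas releases. The small sample frame and the names `wide` and `filled` are made up for illustration, and the date format string assumes English month abbreviations.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data (not the full table).
df = pd.DataFrame(
    {
        "Price": [4400, 0, 55000, 4400, 4454, 4545],
        "Product": ["Pencils", "Pencils", "Pencils", "Pens", "Pens", "Pens"],
    },
    index=["1/Feb/2020", "3/Feb/2020", "31/Mar/2020",
           "1/Feb/2020", "2/Feb/2020", "31/Mar/2020"],
)
df.index = pd.to_datetime(df.index, format="%d/%b/%Y")

# Wide form: one column per product, one row per calendar day.
# fill_value=0 covers both the holes created by unstack and the
# new rows created by asfreq.
wide = (
    df.set_index("Product", append=True)["Price"]
      .unstack(fill_value=0)
      .asfreq("D", fill_value=0)
)

# Back to long form: the date index is kept, Product and Price are columns.
filled = wide.melt(ignore_index=False, var_name="Product", value_name="Price")
print(filled)
```

As with the original snippet, this only fills gaps between the earliest and latest dates that are already present.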
    

    General solution, which also works when the first or last day of a month is missing from the data:

    df.index = pd.to_datetime(df.index)
    
    # First day of the earliest month and last day of the latest month
    s = df.index.min().to_period('M').to_timestamp()
    e = df.index.max().to_period('M').to_timestamp(how='e').floor('D')
    
    df = (df.set_index('Product', append=True)
            .unstack()
            .reindex(pd.date_range(s, e))   # every day from s to e, NaN where missing
            .stack(dropna=False)
            .sort_index(level=[1,0])
            .reset_index(level=1))
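
A variant of the general solution that avoids the unstack/stack round trip can be sketched as follows: build every (date, product) pair with `pd.MultiIndex.from_product` and reindex against it with `fill_value=0`, so the missing rows get a Price of 0 directly. This is an alternative technique rather than the code from the reply, and it assumes each (date, product) pair appears at most once. The sample rows and the names `start`, `end`, `full_index`, `out` and the `Date` index label are illustrative only.

```python
import pandas as pd

# Hypothetical, deliberately unordered sample with no month-start or month-end rows.
df = pd.DataFrame(
    {
        "Price": [0, 34534, 454, 0],
        "Product": ["Pencils", "Pens", "Pens", "Pencils"],
    },
    index=["3/Feb/2020", "27/Feb/2020", "3/Mar/2020", "2/Mar/2020"],
)
df.index = pd.to_datetime(df.index, format="%d/%b/%Y")

# First day of the earliest month and last day of the latest month.
start = df.index.min().to_period("M").to_timestamp()
end = df.index.max().to_period("M").to_timestamp(how="end").floor("D")

# Every calendar day paired with every product seen in the data.
full_index = pd.MultiIndex.from_product(
    [pd.date_range(start, end, freq="D"), df["Product"].unique()],
    names=["Date", "Product"],
)

out = (
    df.rename_axis("Date")
      .set_index("Product", append=True)        # index becomes (Date, Product)
      .reindex(full_index, fill_value=0)        # missing pairs get Price 0
      .sort_index(level=["Product", "Date"])    # all Pencils rows, then all Pens rows
      .reset_index(level="Product")             # Product back to a regular column
)
print(out)
```

Because the target index is built explicitly, the order of the input rows does not matter and no fill method is involved.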