Pandas DataFrame: resample to include all dates up to today

I am really struggling with something that sounded trivial when I first looked into it. I asked a similar question here ("problem with pandas efficiency when working with dates"), where I was mainly wondering about the efficiency of one piece of code. I figured I would make a new post, since a way to reshape a DataFrame like this could be helpful to more people than just me.

I need to modify a DataFrame so that it contains every date from the earliest date in the data up to the current date.

Example:

       Id     Supplier  Avg_NetAmountSpent        Date  Quantity  NetAmount
0  206433     BESNOWED              6593.0  2020-05-08        91      10181
1  206433     BESNOWED              6593.0  2020-05-06      1076       6069
2  206434  LENTIVECTOR              7335.0  2020-05-08        91      10181
3  206434  LENTIVECTOR              7335.0  2020-05-06      1076       6069

I would like to change the DataFrame so that it looks like this:

       Id     Supplier  Avg_NetAmountSpent        Date  Quantity  NetAmount
0  206433     BESNOWED              6593.0  2020-05-06      1076       6069
1  206433     BESNOWED                   0  2020-05-07         0          0
2  206433     BESNOWED              6593.0  2020-05-08        91      10181
3  206433     BESNOWED                   0  2020-05-09         0          0
4  206433     BESNOWED                   0  2020-05-10         0          0
5  206433     BESNOWED                   0  2020-05-11         0          0
6  206434  LENTIVECTOR              7335.0  2020-05-06      1076       6069
7  206434  LENTIVECTOR                   0  2020-05-07         0          0
8  206434  LENTIVECTOR              7335.0  2020-05-08        91      10181
9  206434  LENTIVECTOR                   0  2020-05-09         0          0
10  206434  LENTIVECTOR                  0  2020-05-10         0          0
11  206434  LENTIVECTOR                  0  2020-05-11         0          0

I have tried the following, but it is quite inefficient, and it would be easier to have the format above:

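import pandas as pd

# data_initial['Date'] is assumed to already be a datetime64 column
today1 = pd.to_datetime('today').normalize()
frequency1 = '1D'
# Number of daily bins between the earliest date and today
Nbin1 = (today1 - data_initial['Date'].min()) // pd.Timedelta(frequency1) + 1
# Daily bin edges, ending at today
bins1 = [today1 - n * pd.Timedelta(frequency1) for n in range(Nbin1, -1, -1)]

# Sum the values per Id and per daily bin
data11 = data_initial.groupby(['Id', pd.cut(data_initial['Date'], bins=bins1)]).sum().reset_index()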

Is there an efficient way to achieve this?
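
For reference, here is a minimal sketch of one direction that might work, not a definitive solution. It builds the full grid of (Id, Supplier, Date) combinations and reindexes the data against it, filling the missing days with 0. The helper name `fill_missing_days` and its `end` parameter are hypothetical names introduced only for this illustration. The sketch assumes that `Date` is already a datetime64 column and that there is at most one row per (Id, Supplier, Date), because `reindex` raises an error on duplicate index entries.

```python
import numpy as np
import pandas as pd

# Sample data copied from the example above.
data_initial = pd.DataFrame({
    "Id": [206433, 206433, 206434, 206434],
    "Supplier": ["BESNOWED", "BESNOWED", "LENTIVECTOR", "LENTIVECTOR"],
    "Avg_NetAmountSpent": [6593.0, 6593.0, 7335.0, 7335.0],
    "Date": pd.to_datetime(["2020-05-08", "2020-05-06", "2020-05-08", "2020-05-06"]),
    "Quantity": [91, 1076, 91, 1076],
    "NetAmount": [10181, 6069, 10181, 6069],
})


def fill_missing_days(df, end=None):
    """Hypothetical helper: return one row per (Id, Supplier) per calendar day.

    Assumes df has at most one row per (Id, Supplier, Date).
    """
    keys = ["Id", "Supplier", "Date"]
    # Default to today's date at midnight when no end date is given.
    end = pd.Timestamp("today").normalize() if end is None else pd.Timestamp(end)
    days = pd.date_range(df["Date"].min(), end, freq="D")

    # Repeat each (Id, Supplier) pair once per day, then tile the days
    # across the pairs to get every combination without a Python loop.
    groups = df[["Id", "Supplier"]].drop_duplicates()
    full = groups.loc[groups.index.repeat(len(days))].reset_index(drop=True)
    full["Date"] = np.tile(days.to_numpy(), len(groups))

    # Reindex the observed rows against the full grid; days with no data get 0.
    target = pd.MultiIndex.from_frame(full)
    out = df.set_index(keys).reindex(target, fill_value=0).reset_index()
    return out[df.columns.tolist()]  # restore the original column order


# A fixed end date keeps the example reproducible; omit it to use today.
data_filled = fill_missing_days(data_initial, end="2020-05-11")
print(data_filled)
```

Calling `fill_missing_days(data_initial)` without `end` would extend each group up to the current date instead. If several rows can share the same Id, Supplier and Date, they would need to be aggregated (for example with `groupby`) before the reindex step.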
