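I'm getting started with pandas through a small project. I've now run into a challenge that I'm not sure how to approach, or whether pandas even has functionality for it.
What I have are thefts, each with a start and an end. Start and end are datetimes, and I can compute the duration in hours. Now I want to compute the summed average for each hour.
Does anyone have suggestions on which methods to use?
An example: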
- These are 4 thefts, each with a duration of x hours
- for each hour in the timespan of the theft the 'weight' is 1/(duration in hours)
- A theft of 5 hours adds a weight of 1/5 or 0.2 to the average for the hours that it spans
- Thefts can overlap, so for hour h2:
T1 adds 0.2 and T2 adds 0.33, so the summed average for h2 is 0.53
T1: theft 1, start= h0, end = h4, duration = 5 hours
T2: theft 2, start= h1, end = h3, duration = 3 hours
T3: theft 3, start= h4, end = h7, duration = 4 hours
T4: theft 4, start= h7, end = h9, duration = 3 hours
      T1    T2/T4   T3    summed average
h0    .2                  .2
h1    .2    .33           .53
h2    .2    .33           .53
h3    .2    .33           .53
h4    .2            .25   .45
h5                  .25   .25
h6                  .25   .25
h7          .33     .25   .58
h8          .33           .33
h9          .33           .33
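The weighting described above can be sketched in plain Python to check the table by hand. This is only an illustration under assumptions: hours are written as integers (h0 is 0), the end hour counts as part of the theft (so h0 to h4 is 5 hours, as in the example), and the names `thefts` and `summed` are made up for this sketch.

```python
from collections import defaultdict

# (start_hour, end_hour) per theft; the end hour is treated as inclusive,
# which is how the example arrives at "h0 to h4 = 5 hours".
thefts = {"T1": (0, 4), "T2": (1, 3), "T3": (4, 7), "T4": (7, 9)}

summed = defaultdict(float)
for start, end in thefts.values():
    duration = end - start + 1           # inclusive duration in hours
    for hour in range(start, end + 1):
        summed[hour] += 1 / duration     # each spanned hour gets 1/duration

for hour in sorted(summed):
    print(f"h{hour}: {summed[hour]:.2f}")
```

Rounded to two decimals, the printed values agree with the summed average column of the table.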
My thefts data looks like this:
                 start                  end  duration
0  2011-01-01 00:00:00  2011-01-01 02:00:00       2.0
1  2011-01-01 16:00:00  2011-01-01 19:00:00       3.0
2  2011-01-01 12:00:00  2011-01-03 13:00:00      49.0
3  2011-01-01 01:00:00  2011-01-01 04:00:00       3.0
My goal is to build a new DataFrame containing the summed average for each hour (the values below are made up):
              datetime  summed_average
0  2011-01-01 00:00:00             .23
1  2011-01-01 01:00:00             .78
2  2011-01-01 02:00:00             .78
3  2011-01-01 03:00:00             1.2
...
0  2011-01-01 20:00:00             2.2
1  2011-01-01 21:00:00             1.6
2  2011-01-01 22:00:00             .77
3  2011-01-01 23:00:00             .55
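For reference, here is a rough sketch of one way the target frame might be built in pandas. It is not necessarily the best approach. It assumes durations are whole hours and that each theft covers the half-open interval from start up to (but not including) end, which is what the `duration` column in the data implies (00:00 to 02:00 is 2.0 hours). Note that this differs from the inclusive hours in the h0..h9 example. The variable names `pieces` and `summed_average` are made up for the sketch.

```python
import pandas as pd

thefts = pd.DataFrame({
    "start": pd.to_datetime([
        "2011-01-01 00:00:00", "2011-01-01 16:00:00",
        "2011-01-01 12:00:00", "2011-01-01 01:00:00",
    ]),
    "end": pd.to_datetime([
        "2011-01-01 02:00:00", "2011-01-01 19:00:00",
        "2011-01-03 13:00:00", "2011-01-01 04:00:00",
    ]),
})
# Duration in hours as a float, matching the duration column shown above.
thefts["duration"] = (thefts["end"] - thefts["start"]) / pd.Timedelta(hours=1)

one_hour = pd.Timedelta(hours=1)
pieces = []
for row in thefts.itertuples(index=False):
    # Assumption: whole-hour durations, end hour excluded.
    hours = pd.date_range(row.start, periods=int(row.duration), freq=one_hour)
    # Every hour spanned by this theft gets the weight 1/duration.
    pieces.append(pd.Series(1 / row.duration, index=hours))

# Add up the weights of all thefts that touch the same hour.
summed_average = (
    pd.concat(pieces)
    .groupby(level=0)
    .sum()
    .rename_axis("datetime")
    .reset_index(name="summed_average")
)
print(summed_average.head())
```

Hours that no theft touches are simply absent from the result. If they should appear with a value of 0, the summed Series could be reindexed against a full hourly range before the final `reset_index`. For a large number of thefts the Python-level loop may be slow, so treat this as a starting point only.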