重新采样数据集

I am working on the Metro Interstate Traffic Volume data set (available here: http://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume) but I can't resample the dataset to show the average traffic volume per day instead of showing it per hour.

metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
metro.set_index('date_time', inplace=True, drop=True)
metro.resample('1Y').mean()

这是我得到的:

                    holiday    temp  ...     weather_description  traffic_volume
date_time                            ...                                        
2012-10-02 09:00:00    None  288.28  ...        scattered clouds            5545
2012-10-02 10:00:00    None  289.36  ...           broken clouds            4516
2012-10-02 11:00:00    None  289.58  ...         overcast clouds            4767
2012-10-02 12:00:00    None  290.13  ...         overcast clouds            5026
2012-10-02 13:00:00    None  291.14  ...           broken clouds            4918
...                     ...     ...  ...                     ...             ...
2018-09-30 19:00:00    None  283.45  ...           broken clouds            3543
2018-09-30 20:00:00    None  282.76  ...         overcast clouds            2781
2018-09-30 21:00:00    None  282.73  ...  proximity thunderstorm            2159
2018-09-30 22:00:00    None  282.09  ...         overcast clouds            1450
2018-09-30 23:00:00    None  282.12  ...         overcast clouds             954

[48204 rows x 8 columns]

您对解决方法有任何想法吗?

评论
  • UFO
    UFO 回复

    您可以尝试通过创建年份列:

    metro['year'] =metro['date_time'].dt.year
    metro.groupby['year'].mean()
    
  • 糖长老
    糖长老 回复

    One year is not a fix time period: some years have 365 days, some have 366. You can use groupby:

    metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
    metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
    
    # extract the yer
    metro.groupby(metro['date_time'].dt.year).mean()
    

    输出:

                     temp   rain_1h   snow_1h  clouds_all  traffic_volume
    date_time                                                            
    2012       274.991782  0.000000  0.000000   65.295819     3207.802657
    2013       278.976352  0.161284  0.000000   52.560947     3286.762160
    2014       276.786438  0.243251  0.000000   49.070469     3250.938004
    2015       287.689574  0.339218  0.001795   40.988338     3242.900983
    2016       282.520790  1.192969  0.000308   48.628842     3169.441328
    2017       281.463309  0.000000  0.000000   50.005281     3340.703065
    2018       282.851502  0.121765  0.000000   45.567996     3260.112341