熊猫数据框结合了唯一的行值

我有一个类似以下的数据框,具有超过90000行。

origin      destination people
101011001   101011001   7378
101011001   101011002   120
101011001   101011002   8
101011001   101011002   285
101011001   101011003   7
101011001   101011004   0
101011001   101011004   1
101011001   101011004   2
101011001   101011004   9
101011002   101011001   5

As you can see, some origin and destination values repeat for example there are multiple rows where origin=101011001, destination=101011002. My goal is to group the repeating origin and destination values and sum the the people column, so the dataframe looks like this:

origin      destination people
101011001   101011001   7378
101011001   101011002   413
101011001   101011003   7
101011001   101011004   12
101011002   101011001   5

I've tried jsondf.groupby(['origin', 'destination']).sum() which gives me the correct sum and destination values but it's not quite what I want as I want the origin values to also be shown in the row for each destination.

Note My end goal is to get this dataframe into a SQL database as a table, and with the .groupby() code above, the origin and destination values are actually interpreted as NULL which is not what I want.

谢谢!

评论
  • BOX07
    BOX07 回复

    一种快速简单的方法来显示每个原始值将是在使用groupby之后简单地重置索引。这是一个示例,显示重置索引前后数据库的外观:

    df.groupby(['origin', 'destination']).sum()
    
    origin      destination  people
    101011001   101011001    7378
                101011002    413
                101011003    7
                101011004    12
    101011002   101011001    5
    

    添加reset_index()后,数据框将在每一行中表示每个原始值。

        df.groupby(['origin', 'destination']).sum().reset_index()
    
        origin      destination people
    0   101011001   101011001   7378
    1   101011001   101011002   413
    2   101011001   101011003   7
    3   101011001   101011004   12
    4   101011002   101011001   5
    

    这应该允许您发送到sql数据库,而无需将原始解释为空值。