Combining pairs of rows in a DataFrame

Say we have the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    data={
        'from': [103, 444, 104, 999230],
        'to': [104, 999230, 103, 444],
        'id': [1] * 4,
        'p': [415, 1203.11, -414.35, -1197.37],
        'q': [0, -395.44, 62.23, 489.83]
    })

or, printed:

     from      to  id        p       q
0     103     104   1   415.00    0.00
1     444  999230   1  1203.11 -395.44
2     104     103   1  -414.35   62.23
3  999230     444   1 -1197.37  489.83

The goal is to combine the rows that share the same pair of from and to values, regardless of direction. In the example above, rows 0 and 2 need to be combined, as do rows 1 and 3.

The output should look like this:

   from      to  id        p       q       p1      q1
0   103     104   1   415.00    0.00  -414.35   62.23
1   444  999230   1  1203.11 -395.44 -1197.37  489.83

Of course, the following would also be acceptable:

     from   to  id        p       q       p1      q1
0     104  103   1  -414.35   62.23   415.00    0.00
1  999230  444   1 -1197.37  489.83  1203.11 -395.44

Any help is appreciated :)

黑暗的冰

First sort the from and to columns row-wise with numpy.sort, so that both directions of a pair share the same key. Then create a counter Series with GroupBy.cumcount, reshape with DataFrame.set_index and DataFrame.unstack, and sort the columns by their second level with DataFrame.sort_index. Finally, flatten the column MultiIndex with f-strings and convert the index MultiIndex back to columns with DataFrame.reset_index:

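import numpy as np

# Sort each (from, to) pair row-wise so both directions share the same key.
df[['from','to']] = np.sort(df[['from','to']], axis=1)
# Number the rows within each pair: 0 for the first, 1 for the second.
g = df.groupby(['from','to']).cumcount()

# Move the counter into the columns, ordered as p0, q0, p1, q1.
df = df.set_index(['from','to','id', g]).unstack().sort_index(level=1, axis=1)
# Flatten the column MultiIndex, e.g. ('p', 0) -> 'p0'.
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()
print(df)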
   from      to  id       p0      q0       p1      q1
0   103     104   1   415.00    0.00  -414.35   62.23
1   444  999230   1  1203.11 -395.44 -1197.37  489.83
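
If the exact column names from the question (p, q, p1, q1) are wanted, a self-merge may be an alternative to the unstack approach. The sketch below is not the method described above. It assumes that every row has exactly one reversed counterpart with the same id, and that from and to are never equal within a row; the names rev, merged and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'from': [103, 444, 104, 999230],
    'to': [104, 999230, 103, 444],
    'id': [1] * 4,
    'p': [415, 1203.11, -414.35, -1197.37],
    'q': [0, -395.44, 62.23, 489.83],
})

# Reversed copy: swap from/to so each row lines up with its counterpart.
rev = df.rename(columns={'from': 'to', 'to': 'from'})

# Self-merge on (from, to, id); the p/q columns coming from the
# reversed copy receive the suffix "1".
merged = df.merge(rev, on=['from', 'to', 'id'], suffixes=('', '1'))

# Each pair now appears once per direction; keep only the from < to copy
# (assumption: from != to in every row).
result = merged[merged['from'] < merged['to']].reset_index(drop=True)
print(result)
```

Filtering on from > to instead would keep the other direction of each pair, which corresponds to the second acceptable output in the question.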