How to parse repeated list elements into a DataFrame

The example is a list named `main`, shown below:

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

I tried to parse and convert it directly like this:

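import pandas as pd

df = pd.DataFrame(main)
test = df.T                      # keys become the first row
test.columns = test.iloc[0]      # promote that row to column labels
a = test.drop(test.index[0])     # drop the row that held the keys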

But the resulting DataFrame is still a single wide row with repeated columns:

 date      oldname     newname    icon_url     date      oldname     newname    icon_url    date      oldname     newname    icon_url 
2020-04-21    Tap      Tapnew        3       2020-04-21      Nod     Nodnew       4      2020-04-21       Mik     Miknew      5  

The desired output would be:

 date      oldname     newname    icon_url     
2020-04-21    Tap     Tapnew        3     
2020-04-21    Nod     Nodnew        4      
2020-04-21    Mik     Miknew        5  

I've been struggling with this all day. Can anyone shed some light on it? Thanks in advance.

Comments
  • 呵呵
    呵呵 replied

    Starting from df = pd.DataFrame(main), this is just a pivot on two columns, once each repeated key is given a running row number:

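    (pd.DataFrame(main, columns=['col','val'])
       # number each occurrence of a key: 0 for the first record, 1 for the second, ...
       .assign(idx=lambda x: x.groupby('col').cumcount())
       # keyword arguments are required by pivot in pandas 2.0 and later
       .pivot(index='idx', columns='col', values='val')
    )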
    

    Output:

    col        date icon_url newname oldname
    idx                                     
    0    2020-04-21        3  Tapnew     Tap
    1    2020-04-21        4  Nodnew     Nod
    2    2020-04-21        5  Miknew     Mik
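
To see why the `cumcount` step works, here is a self-contained sketch of the same approach. The names `long_df` and `wide` are illustrative and not from the answer above. The final column selection is an optional extra, added here on the assumption that the original key order is wanted, since `pivot` returns the columns sorted alphabetically.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

# Long form: one row per (key, value) pair.
long_df = pd.DataFrame(main, columns=['col', 'val'])

# cumcount numbers each occurrence of a key within its group,
# so the n-th 'date', the n-th 'oldname', etc. all share the same idx.
long_df['idx'] = long_df.groupby('col').cumcount()

# Pivot the long form into one row per idx and one column per key.
wide = long_df.pivot(index='idx', columns='col', values='val')

# Optional (assumption): restore the first-seen key order and drop the axis names.
key_order = long_df['col'].unique().tolist()
wide = wide[key_order].rename_axis(index=None, columns=None)

print(wide)
```

Note that this relies on every key appearing exactly once per record; if a key is missing from one record, later values of that key shift up by one row.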
    
  • Giles
    Giles replied

    Read in the DataFrame as you have. Then create an index for each group of rows by checking where the first column equals 'date' and taking the cumulative sum, so the counter increases by one at the start of every record. At that point we just pivot:

    df = pd.DataFrame(main)
    df['index'] = df[0].eq('date').cumsum()
    df = df.pivot(index='index', columns=0, values=1).rename_axis(None, axis=1)
    
                 date icon_url newname oldname
    index                                     
    1      2020-04-21        3  Tapnew     Tap
    2      2020-04-21        4  Nodnew     Nod
    3      2020-04-21        5  Miknew     Mik
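
One practical difference of the `cumsum` approach is that it only assumes every record starts with a `'date'` entry, so a record with a missing key still lines up. The sketch below illustrates this with a hypothetical input in which the second record has no `'icon_url'`; the column names `key`, `value`, and `record` are illustrative and not from the answer above.

```python
import pandas as pd

# Hypothetical input: the second record lacks an 'icon_url' entry.
main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

df = pd.DataFrame(main, columns=['key', 'value'])

# Each 'date' row marks the start of a new record, so the running
# count of 'date' rows labels every row with its record number.
df['record'] = df['key'].eq('date').cumsum()

# One row per record; keys missing from a record become NaN.
wide = (df.pivot(index='record', columns='key', values='value')
          .rename_axis(None, axis=1))

print(wide)
```

Here the missing value shows up as `NaN` in record 2, and `'5'` stays with record 3. Numbering each key's occurrences independently would instead attach `'5'` to the second record.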