Using a regex in pandas to add values to empty columns

Hello, I have a dataframe such as:

DF1

    COL1  COL2                          COL3  COL4  
1   G1    SEQ1_10-67_-__Canis_lupus     A     B 
2   G1    SEQ4.1_90-345_-__Elpah_bis    C     D 
3   G1    SEQA.A-2-BICs_-__Felis_cattus E     F 
4   G1    SEQA.A_10-30_-__Felis_cattus          
5   G1    SEQA.A_34-50_-__Felis_cattus          
6   G2    SEQA.A_60-79_+__Felis_cattus  K     L 
7   G2    SEQA.A_34-50_-__Felis_cattus  M     N 
8   G2    SEQ3_10-67_-__Lupus_lupus     O     P 

The idea is to look, within each group (COL1), for COL2 values that contain a -BICs_ pattern. Here, for example, I get line 3: SEQA.A-2-BICs_-__Felis_cattus

Then, from this value, I take the part before the first -, which is SEQA.A, and the sign, which can be _-__ or _+__; in this example it is _-__
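
A minimal sketch of this extraction step, using only Python's `re` module. The names `BIC_RE` and `parse_bic` are hypothetical, and the pattern assumes that a BICs value always looks like `<prefix>-<something>BICs_<sign>__<species>`:

```python
import re

# Assumed layout of a BICs value: <prefix>-<anything>BICs_<sign>__<species>
# BIC_RE and parse_bic are illustrative names, not part of the original data.
BIC_RE = re.compile(r"^(?P<prefix>[^-]+)-.*?BICs_(?P<sign>[+-])__")

def parse_bic(value):
    """Return (prefix, sign) for a BICs value, or None if it is not one."""
    m = BIC_RE.match(value)
    return (m.group("prefix"), m.group("sign")) if m else None

bic_hit = parse_bic("SEQA.A-2-BICs_-__Felis_cattus")    # the value from line 3
plain_hit = parse_bic("SEQA.A_10-30_-__Felis_cattus")   # a regular value
```

With these inputs, `bic_hit` is `("SEQA.A", "-")` and `plain_hit` is `None`, so the same function can also be used to tell BICs rows apart from regular ones.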

Then I check whether the same group contains other values with the SEQA.A prefix and the _-__ pattern that have empty cells in COL3 and COL4

Here I have 2 candidates:

  1. line 4 SEQA.A_10-30_-__Felis_cattus

  2. line 5 SEQA.A_34-50_-__Felis_cattus
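
The candidate search could be sketched as a boolean mask. This assumes the empty cells are stored as missing values (NaN/None) rather than empty strings, and it hard-codes the group, prefix and sign found on line 3 for illustration:

```python
import pandas as pd

# Example data from the question; empty cells are assumed to be missing values.
df = pd.DataFrame(
    {
        "COL1": ["G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2"],
        "COL2": [
            "SEQ1_10-67_-__Canis_lupus",
            "SEQ4.1_90-345_-__Elpah_bis",
            "SEQA.A-2-BICs_-__Felis_cattus",
            "SEQA.A_10-30_-__Felis_cattus",
            "SEQA.A_34-50_-__Felis_cattus",
            "SEQA.A_60-79_+__Felis_cattus",
            "SEQA.A_34-50_-__Felis_cattus",
            "SEQ3_10-67_-__Lupus_lupus",
        ],
        "COL3": ["A", "C", "E", None, None, "K", "M", "O"],
        "COL4": ["B", "D", "F", None, None, "L", "N", "P"],
    },
    index=range(1, 9),
)

# Values taken from the BICs row on line 3 (hard-coded here for illustration).
group, prefix, sign = "G1", "SEQA.A", "-"

candidates = df[
    (df["COL1"] == group)                                   # same group
    & df["COL2"].str.startswith(prefix + "_")               # same prefix
    & df["COL2"].str.contains(f"_{sign}__", regex=False)    # same sign
    & df["COL3"].isna()                                     # empty COL3
    & df["COL4"].isna()                                     # empty COL4
]
```

Here `candidates` holds lines 4 and 5 only; line 7 has the same prefix and sign but belongs to G2 and already has values.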

Then I assign the COL3 and COL4 values of line 3 to lines 4 and 5, and get:

    COL1  COL2                          COL3  COL4  
1   G1    SEQ1_10-67_-__Canis_lupus     A     B 
2   G1    SEQ4.1_90-345_-__Elpah_bis    C     D 
3   G1    SEQA.A-2-BICs_-__Felis_cattus E     F 
4   G1    SEQA.A_10-30_-__Felis_cattus  E     F 
5   G1    SEQA.A_34-50_-__Felis_cattus  E     F 
6   G2    SEQA.A_60-79_+__Felis_cattus  K     L 
7   G2    SEQA.A_34-50_-__Felis_cattus  M     N 
8   G2    SEQ3_10-67_-__Lupus_lupus     O     P 

I then remove line 3 and add a new column called BICs, set to BIC for lines 4 and 5:

    COL1  COL2                          COL3  COL4  BICs
1   G1    SEQ1_10-67_-__Canis_lupus     A     B 
2   G1    SEQ4.1_90-345_-__Elpah_bis    C     D 
4   G1    SEQA.A_10-30_-__Felis_cattus  E     F     BIC
5   G1    SEQA.A_34-50_-__Felis_cattus  E     F     BIC
6   G2    SEQA.A_60-79_+__Felis_cattus  K     L 
7   G2    SEQA.A_34-50_-__Felis_cattus  M     N 
8   G2    SEQ3_10-67_-__Lupus_lupus     O     P 
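
For reference, one possible way to chain all the steps is sketched below. It is not a definitive solution: it assumes empty cells are missing values, that regular COL2 values follow a `<prefix>_<start>-<end>_<sign>__<species>` layout, and that the unflagged rows get an empty string in the new column. The names `bic`, `reg` and `out` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "COL1": ["G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2"],
        "COL2": [
            "SEQ1_10-67_-__Canis_lupus",
            "SEQ4.1_90-345_-__Elpah_bis",
            "SEQA.A-2-BICs_-__Felis_cattus",
            "SEQA.A_10-30_-__Felis_cattus",
            "SEQA.A_34-50_-__Felis_cattus",
            "SEQA.A_60-79_+__Felis_cattus",
            "SEQA.A_34-50_-__Felis_cattus",
            "SEQ3_10-67_-__Lupus_lupus",
        ],
        "COL3": ["A", "C", "E", None, None, "K", "M", "O"],
        "COL4": ["B", "D", "F", None, None, "L", "N", "P"],
    },
    index=range(1, 9),
)

# Prefix and sign of BICs rows (all-missing for regular rows).
bic = df["COL2"].str.extract(r"^(?P<prefix>[^-]+)-.*?BICs_(?P<sign>[+-])__")
# Prefix and sign of regular rows (all-missing for BICs rows).
reg = df["COL2"].str.extract(r"^(?P<prefix>.+?)_\d+-\d+_(?P<sign>[+-])__")
is_bic = bic["prefix"].notna()

out = df.copy()
out["BICs"] = ""  # assumption: blank marker for rows that are not filled

for idx in df.index[is_bic]:
    # Rows in the same group, with the same prefix and sign, and empty COL3/COL4.
    target = (
        (df["COL1"] == df.at[idx, "COL1"])
        & (reg["prefix"] == bic.at[idx, "prefix"])
        & (reg["sign"] == bic.at[idx, "sign"])
        & df["COL3"].isna()
        & df["COL4"].isna()
    )
    for col in ("COL3", "COL4"):
        out.loc[target, col] = df.at[idx, col]  # copy values from the BICs row
    out.loc[target, "BICs"] = "BIC"

out = out[~is_bic]  # finally drop the BICs rows themselves
```

On the example data this drops line 3 and fills lines 4 and 5 with E, F and BIC, leaving the other rows unchanged.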

Does anyone have an idea how to do this? Thank you very much.
