Hello, I have a data frame such as:
DF1
COL1 COL2 COL3 COL4
1 G1 SEQ1_10-67_-__Canis_lupus A B
2 G1 SEQ4.1_90-345_-__Elpah_bis C D
3 G1 SEQA.A-2-BICs_-__Felis_cattus E F
4 G1 SEQA.A_10-30_-__Felis_cattus
5 G1 SEQA.A_34-50_-__Felis_cattus
6 G2 SEQA.A_60-79_+__Felis_cattus K L
7 G2 SEQA.A_34-50_-__Felis_cattus M N
8 G2 SEQ3_10-67_-__Lupus_lupus O P
The idea is to look, within each group (COL1), for COL2 values that contain the pattern -BICs_.
Here, for example, I get line 3: SEQA.A-2-BICs_-__Felis_cattus
From this value I take the part before the first -, which is SEQA.A, and the sign, which can be _-__ or _+__. In this example it is _-__.
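The extraction step described above could be sketched in Python as follows. This is only an illustration: the helper name parse_bics_value is hypothetical, and it assumes the sign always appears as _-__ or _+__ somewhere in the value.

```python
import re

# Matches the strand marker, i.e. "_-__" or "_+__", and captures the sign.
SIGN_RE = re.compile(r"_([+-])__")

def parse_bics_value(value):
    """Return (prefix, sign) for a COL2 value that contains -BICs_.

    prefix: the part before the first "-"
    sign:   "+" or "-", or None if no marker is found
    """
    prefix = value.split("-", 1)[0]
    match = SIGN_RE.search(value)
    sign = match.group(1) if match else None
    return prefix, sign

print(parse_bics_value("SEQA.A-2-BICs_-__Felis_cattus"))  # ('SEQA.A', '-')
```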
Then I look in the same group for other values that have SEQA.A and the _-__ pattern, and that have empty cells in COL3 and COL4.
Here I have 2 candidates:
line 4
SEQA.A_10-30_-__Felis_cattus
line 5
SEQA.A_34-50_-__Felis_cattus
I then assign the COL3 and COL4 values of line 3 to lines 4 and 5, and get:
COL1 COL2 COL3 COL4
1 G1 SEQ1_10-67_-__Canis_lupus A B
2 G1 SEQ4.1_90-345_-__Elpah_bis C D
3 G1 SEQA.A-2-BICs_-__Felis_cattus E F
4 G1 SEQA.A_10-30_-__Felis_cattus E F
5 G1 SEQA.A_34-50_-__Felis_cattus E F
6 G2 SEQA.A_60-79_+__Felis_cattus K L
7 G2 SEQA.A_34-50_-__Felis_cattus M N
8 G2 SEQ3_10-67_-__Lupus_lupus O P
I then remove line 3 and add a new column called BICs, filled in at lines 4 and 5:
COL1 COL2 COL3 COL4 BICs
1 G1 SEQ1_10-67_-__Canis_lupus A B
2 G1 SEQ4.1_90-345_-__Elpah_bis C D
4 G1 SEQA.A_10-30_-__Felis_cattus E F BIC
5 G1 SEQA.A_34-50_-__Felis_cattus E F BIC
6 G2 SEQA.A_60-79_+__Felis_cattus K L
7 G2 SEQA.A_34-50_-__Felis_cattus M N
8 G2 SEQ3_10-67_-__Lupus_lupus O P
Does anyone have an idea how to do this? Thank you very much.
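For reference, one possible way to express the whole procedure is sketched below with pandas. Several things here are assumptions rather than part of the question: the data frame is taken to be a pandas DataFrame, empty cells are taken to be empty strings (or NaN), a candidate is taken to match when its COL2 value starts with the prefix followed by _, and the function name propagate_bics is hypothetical.

```python
import pandas as pd

rows = [
    ("G1", "SEQ1_10-67_-__Canis_lupus", "A", "B"),
    ("G1", "SEQ4.1_90-345_-__Elpah_bis", "C", "D"),
    ("G1", "SEQA.A-2-BICs_-__Felis_cattus", "E", "F"),
    ("G1", "SEQA.A_10-30_-__Felis_cattus", "", ""),
    ("G1", "SEQA.A_34-50_-__Felis_cattus", "", ""),
    ("G2", "SEQA.A_60-79_+__Felis_cattus", "K", "L"),
    ("G2", "SEQA.A_34-50_-__Felis_cattus", "M", "N"),
    ("G2", "SEQ3_10-67_-__Lupus_lupus", "O", "P"),
]
df = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3", "COL4"])
df.index = range(1, len(df) + 1)  # 1-based labels, as in the tables above

def propagate_bics(df):
    """Copy COL3/COL4 from each -BICs_ row to matching empty rows, then drop it."""
    out = df.copy()
    out["BICs"] = ""
    is_bics = out["COL2"].str.contains("-BICs_", regex=False)
    # Sign taken from the "_-__" / "_+__" marker; NaN if the marker is absent.
    sign = out["COL2"].str.extract(r"_([+-])__", expand=False)
    empty = out["COL3"].fillna("").eq("") & out["COL4"].fillna("").eq("")

    for idx in out.index[is_bics]:
        prefix = out.at[idx, "COL2"].split("-", 1)[0]  # part before the first "-"
        targets = (
            ~is_bics
            & empty
            & out["COL1"].eq(out.at[idx, "COL1"])         # same group
            & sign.eq(sign.at[idx])                         # same sign
            & out["COL2"].str.startswith(prefix + "_")      # assumed matching rule
        )
        for col in ("COL3", "COL4"):
            out.loc[targets, col] = out.at[idx, col]
        out.loc[targets, "BICs"] = "BIC"

    # Assumption: the -BICs_ rows are dropped even if no candidate was found.
    return out[~is_bics]

result = propagate_bics(df)
print(result)
```

On the example data this keeps rows 1, 2, 4, 5, 6, 7 and 8, with rows 4 and 5 receiving E, F and BIC. Looping over the -BICs_ rows keeps the logic close to the description; for a large data frame a merge on (group, prefix, sign) might be preferable.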