Fill NA values conditionally in R

Hello, I have a data frame such as

> tab
   COL1                          COL2 COL3 COL4
1    G1     SEQ1_10-67_-__Canis_lupus    A    B
2    G1    SEQ4.1_90-345_-__Elpah_bis    C    D
3    G1  SEQA.B_34-50_-__Felis_cattus <NA> <NA>
4    G1  SEQA.B_34-50_-__Felis_cattus <NA> <NA>
5    G1 SEQA.A-2-BICs_-__Felis_cattus    E    F
6    G1  SEQA.A_10-30_-__Felis_cattus <NA> <NA>
7    G1  SEQA.A_34-50_-__Felis_cattus <NA> <NA>
8    G1 SEQA.B-2-BICs_-__Felis_cattus    L    P
9    G2  SEQA.A_60-79_-__Felis_cattus    K    L
10   G2  SEQA.A_34-50_-__Felis_cattus    M    N
11   G2     SEQ3_10-67_-__Lupus_lupus    O    P

> dput(tab)
structure(list(COL1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L), .Label = c("G1", "G2"), class = "factor"), COL2 = structure(c(1L, 
3L, 8L, 8L, 7L, 4L, 5L, 9L, 6L, 5L, 2L), .Label = c("SEQ1_10-67_-__Canis_lupus", 
"SEQ3_10-67_-__Lupus_lupus", "SEQ4.1_90-345_-__Elpah_bis", "SEQA.A_10-30_-__Felis_cattus", 
"SEQA.A_34-50_-__Felis_cattus", "SEQA.A_60-79_-__Felis_cattus", 
"SEQA.A-2-BICs_-__Felis_cattus", "SEQA.B_34-50_-__Felis_cattus", 
"SEQA.B-2-BICs_-__Felis_cattus"), class = "factor"), COL3 = structure(c(1L, 
2L, NA, NA, 3L, NA, NA, 5L, 4L, 6L, 7L), .Label = c("A", "C", 
"E", "K", "L", "M", "O"), class = "factor"), COL4 = structure(c(1L, 
2L, NA, NA, 3L, NA, NA, 6L, 4L, 5L, 6L), .Label = c("B", "D", 
"F", "L", "N", "P"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))

I would like to fill the NA values in COL3 and COL4.

The idea is, for each COL1 group: if a COL2 value contains the -BICs pattern, I extract the part before the first - and look within the group to see whether other COL2 values have the same pattern before the first - too.

In the example, the -BICs pattern appears in row 5: SEQA.A-2-BICs_-__Felis_cattus

The part before the first - is: SEQA.A
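The extraction step can be sketched in base R as follows. This is only a minimal illustration on three of the COL2 strings; the variable names are made up for the example, and it assumes that a matching row is one whose string starts with the extracted part followed by an underscore.

```r
# Three COL2 values from the example (one -BICs row, two ordinary rows).
x <- c("SEQA.A-2-BICs_-__Felis_cattus",
       "SEQA.A_10-30_-__Felis_cattus",
       "SEQ1_10-67_-__Canis_lupus")

# Which strings carry the -BICs pattern?
is_bics <- grepl("-BICs", x, fixed = TRUE)

# Part before the first "-" of the -BICs string.
prefix <- sub("-.*$", "", x[is_bics])

# Assumption: a row "has the same pattern" if it starts with prefix + "_".
same_prefix <- startsWith(x, paste0(prefix, "_"))
```

Here `prefix` holds the name taken from the -BICs row and `same_prefix` flags the rows that would inherit its COL3 and COL4 values.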

Rows 6 and 7 have the same SEQA.A in their string:

  1. SEQA.A_10-30_-__Felis_cattus
  2. SEQA.A_34-50_-__Felis_cattus

Then I give them the COL3 and COL4 values of row 5. The same applies to rows 3 and 4, which take their values from the SEQA.B-2-BICs row (row 8). The expected output is:

    COL1 COL2                          COL3 COL4
    G1   SEQ1_10-67_-__Canis_lupus     A    B
    G1   SEQ4.1_90-345_-__Elpah_bis    C    D
    G1   SEQA.B_34-50_-__Felis_cattus  L    P
    G1   SEQA.B_34-50_-__Felis_cattus  L    P
    G1   SEQA.A-2-BICs_-__Felis_cattus E    F
    G1   SEQA.A_10-30_-__Felis_cattus  E    F
    G1   SEQA.A_34-50_-__Felis_cattus  E    F
    G1   SEQA.B-2-BICs_-__Felis_cattus L    P
    G2   SEQA.A_60-79_-__Felis_cattus  K    L
    G2   SEQA.A_34-50_-__Felis_cattus  M    N
    G2   SEQ3_10-67_-__Lupus_lupus     O    P

and I also remove the rows that had the -BICs pattern.

Does anyone have an idea how to do this using dplyr?
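For reference, one possible way to express the steps described above with dplyr is sketched below. It is not a definitive solution and rests on two assumptions: the factor columns have been converted to character (the data is rebuilt that way here), and the sequence name is whatever precedes the first "-" or "_" in COL2. The helper columns is_bics and key are names invented for the sketch.

```r
library(dplyr)

# Same data as `tab` above, rebuilt with character columns (assumption:
# the factor columns are converted with as.character() before filling).
tab <- data.frame(
  COL1 = c(rep("G1", 8), rep("G2", 3)),
  COL2 = c(
    "SEQ1_10-67_-__Canis_lupus", "SEQ4.1_90-345_-__Elpah_bis",
    "SEQA.B_34-50_-__Felis_cattus", "SEQA.B_34-50_-__Felis_cattus",
    "SEQA.A-2-BICs_-__Felis_cattus", "SEQA.A_10-30_-__Felis_cattus",
    "SEQA.A_34-50_-__Felis_cattus", "SEQA.B-2-BICs_-__Felis_cattus",
    "SEQA.A_60-79_-__Felis_cattus", "SEQA.A_34-50_-__Felis_cattus",
    "SEQ3_10-67_-__Lupus_lupus"
  ),
  COL3 = c("A", "C", NA, NA, "E", NA, NA, "L", "K", "M", "O"),
  COL4 = c("B", "D", NA, NA, "F", NA, NA, "P", "L", "N", "P"),
  stringsAsFactors = FALSE
)

filled <- tab %>%
  mutate(
    # Hypothetical helper columns, not part of the original data.
    is_bics = grepl("-BICs", COL2, fixed = TRUE),
    # Assumption: the sequence name precedes the first "-" or "_".
    key = sub("[-_].*$", "", COL2)
  ) %>%
  group_by(COL1, key) %>%
  mutate(
    # x[is_bics][1] is NA when the group has no -BICs row,
    # so such groups are left unchanged.
    COL3 = coalesce(COL3, COL3[is_bics][1]),
    COL4 = coalesce(COL4, COL4[is_bics][1])
  ) %>%
  ungroup()

# Optional last step: drop the -BICs rows and the helper columns.
result <- filled %>%
  filter(!is_bics) %>%
  select(COL1, COL2, COL3, COL4)
```

`filled` keeps all 11 rows with the NA values replaced, matching the expected output shown above; `result` additionally drops the -BICs rows. If a group contained several -BICs rows with the same name, this sketch would take the values of the first one.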