Find a pattern and filter its start positions

I want to find the positions of a pattern and then filter those positions.

I am looking for a function that returns, for each row, the start position of the pattern "gaaa" when that position lies between 30 and 34.

To explain: this is what the function str_locate_all currently returns:

library(stringr)
Sequence <- data.frame(All = c("ggcgaagcagugcucccaguguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
"aggacaacucgcuccacggccguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
"cugaaauggcagcagaaacguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcaacaaa",
"ggucaaagaggaggagcucguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu"))
str_locate_all(pattern = 'gaaa', Sequence$All)

[[1]]
     start end
[1,]    33  36
[2,]    73  76

[[2]]
     start end
[1,]    34  37
[2,]    74  77

[[3]]
     start end
[1,]     3   6
[2,]    15  18
[3,]    32  35
[4,]    72  75

[[4]]
     start end
[1,]    32  35
[2,]    72  75

The result I want is:

       start
1         33
2         34
3         32
4         32

Thanks!

祝玛丽
# For each sequence, keep the first "gaaa" start position in [30, 34] (NA if none)
Sequence$start <-
  sapply(str_locate_all(pattern = 'gaaa', Sequence$All),
         function(z) {
           ind <- which(30 <= z[,1] & z[,1] <= 34)
           if (length(ind)) z[ind[1],1] else NA
         })
Sequence[,2,drop=FALSE]
#   start
# 1    33
# 2    34
# 3    32
# 4    32
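If you would rather avoid the stringr dependency, a similar result can be sketched in base R with gregexpr(), which returns the start positions of all non-overlapping matches in each string (or -1 when a string has no match). This is a minimal sketch assuming the same Sequence data as in the question; the name first_in_range is only illustrative. Like the answer above, it keeps the first start in the 30 to 34 range and gives NA when no match starts there.

```r
# Same example data as in the question
Sequence <- data.frame(All = c(
  "ggcgaagcagugcucccaguguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
  "aggacaacucgcuccacggccguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
  "cugaaauggcagcagaaacguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcaacaaa",
  "ggucaaagaggaggagcucguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu"
))

# gregexpr() gives, per string, the start positions of all
# non-overlapping matches (or -1 if there is no match at all)
starts <- gregexpr("gaaa", Sequence$All, fixed = TRUE)

# Keep the first start position in [30, 34]; NA when none qualifies
first_in_range <- vapply(starts, function(s) {
  s <- as.integer(s)             # drop the match.length etc. attributes
  s <- s[s >= 30 & s <= 34]
  if (length(s)) s[1] else NA_integer_
}, integer(1))

data.frame(start = first_in_range)
```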
看!灰鸡

One dplyr and purrr solution could be:

library(dplyr)
library(purrr)

map_dfr(.x = str_locate_all(pattern = "gaaa", Sequence$All),
        ~ as.data.frame(.x) %>%
          filter(start %in% 30:34),
        .id = "ID")

  ID start end
1  1    33  36
2  2    34  37
3  3    32  35
4  4    32  35
Clare

Here is one way. It takes the output of the str_locate_all call from the question and filters it in an lapply loop.

found <- str_locate_all(pattern = 'gaaa', Sequence$All)
found <- lapply(found, function(x){
  y <- x[, 'start']
  data.frame(start = y[y >= 30 & y <= 34])
})
do.call(rbind, found)
#  start
#1    33
#2    34
#3    32
#4    32
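Because "gaaa" cannot overlap with itself, every occurrence of it in a string is also one of the non-overlapping matches reported above. Another option is therefore to search only the characters a qualifying match can cover (positions 30 to 37) and shift the result back. This is a vectorized sketch in base R, assuming the same Sequence data; window_start, win and pos are illustrative names. Unlike the filter-based answers, it keeps one row per sequence, with NA where no match starts in the range.

```r
# Same example data as in the question
Sequence <- data.frame(All = c(
  "ggcgaagcagugcucccaguguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
  "aggacaacucgcuccacggccguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu",
  "cugaaauggcagcagaaacguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcaacaaa",
  "ggucaaagaggaggagcucguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuu"
))

# A 4-character match starting at 30..34 lies entirely within characters 30..37
window_start <- 30L
win <- substr(Sequence$All, window_start, 37L)

# regexpr() gives the first match position inside each window, or -1
pos <- as.integer(regexpr("gaaa", win, fixed = TRUE))

# Convert window offsets back to positions in the full sequence
start <- ifelse(pos > 0L, pos + window_start - 1L, NA_integer_)
data.frame(start = start)
```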