Create a "run ID" for values in sequence

I have a vector containing an ordered sequence of repeated integers:

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

I want to create a "run ID" (I assume using data.table::rleid()) for numbers that are in sequence, that is, numbers that are either equal to or one greater than the previous value.

So the expected output is:

x
#> [1] 1 1 1 2 2 2 2 3 3   5 5 5 5 6 6   9 9 9 9
data.table::rleid(???)
#> [1] 1 1 1 1 1 1 1 1 1   2 2 2 2 2 2   3 3 3 3

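My first thought was to simply check whether each value is equal to or one greater than the previous one, but that doesn't work: the element where the jump happens gets a FALSE of its own, so rleid() counts it as a separate run (a FALSE surrounded by TRUEs) and then starts yet another run for the TRUEs after it: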

x
#> [1] 1 1 1 2 2 2 2 3 3   5 5 5 5 6 6   9 9 9 9
data.table::rleid((x - dplyr::lag(x, default = 1)) %in% 0:1)
#> [1] 1 1 1 1 1 1 1 1 1   2 3 3 3 3 3   4 5 5 5

I clearly need something that lets me compare each value with the last different value, but I can't think of how to do that efficiently. Any pointers?

Comments
  • 吐泡泡oo

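    library(dplyr)

    x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

    tibble(X = x) %>% 
      mutate(PREV.X = lag(X, default = first(X)),
             # TRUE where X is neither equal to nor one more than PREV.X,
             # i.e. where a new run starts (despite the column name)
             IS.SEQ = X != PREV.X & X != PREV.X + 1,
             # number of breaks so far, offset so runs are numbered from 1
             RZLT = 1 + cumsum(IS.SEQ))
    
    # A tibble: 19 x 4
           X PREV.X IS.SEQ  RZLT
       <dbl>  <dbl> <lgl>  <dbl>
     1     1      1 FALSE      1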
     2     1      1 FALSE      1
     3     1      1 FALSE      1
     4     2      1 FALSE      1
     5     2      2 FALSE      1
     6     2      2 FALSE      1
     7     2      2 FALSE      1
     8     3      2 FALSE      1
     9     3      3 FALSE      1
    10     5      3 TRUE       2
    11     5      5 FALSE      2
    12     5      5 FALSE      2
    13     5      5 FALSE      2
    14     6      5 FALSE      2
    15     6      6 FALSE      2
    16     9      6 TRUE       3
    17     9      9 FALSE      3
    18     9      9 FALSE      3
    19     9      9 FALSE      3
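The question's idea of comparing each value with the last different value can also be sketched directly in base R. Assuming x is sorted (as in the question), the previous distinct value is simply the previous element of unique(x), so runs can be numbered on the distinct values and then mapped back to every element with match(). run_id_unique is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper; assumes x is sorted ascending, as in the question.
run_id_unique <- function(x) {
  u <- unique(x)                         # distinct values, in order of appearance
  # A new run starts at the first distinct value and wherever the gap to the
  # previous distinct value is not exactly 1.
  u_id <- cumsum(c(TRUE, diff(u) != 1))
  u_id[match(x, u)]                      # give each element its value's run number
}

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)
run_id_unique(x)
#> [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
```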
    

  • Upton

    How about using lag from dplyr with cumsum?

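    library(dplyr)
    # a new run starts wherever the gap to the previous value exceeds 1;
    # filling with x[1] keeps the first element from starting a run of its own
    cumsum(x - lag(x, default = x[1]) > 1) + 1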
    [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
    

    Or the data.table way with shift:

    library(data.table)
    # same idea: shift() lags x by one, filling the first slot with x[1]
    cumsum(x - shift(x, 1, fill = x[1]) > 1) + 1
    [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
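Both answers count breaks with cumsum() rather than passing a TRUE/FALSE vector to rleid(): rleid() numbers runs of identical flags, so the single FALSE at a jump becomes a run of its own, whereas cumsum() increments exactly once per break. A base R sketch of the same idea can use diff(), which avoids choosing a fill value for the first element. run_id_diff is a hypothetical name; unlike the > 1 test, it also treats a decrease as a break, following the question's definition (equal to or one greater than the previous value).

```r
# Hypothetical helper: base R version of the "count the breaks" approach.
run_id_diff <- function(x) {
  if (length(x) == 0) return(integer(0))
  # TRUE where an element is neither equal to nor one more than its predecessor;
  # the leading FALSE means the first element never adds a break.
  is_break <- c(FALSE, !(diff(x) %in% 0:1))
  cumsum(is_break) + 1L
}

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)
run_id_diff(x)
#> [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
```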