Create a "run ID" for values in sequence

I have a vector containing an ordered sequence of repeated integers:

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

I want to create a "run ID" (I assume using data.table::rleid()) for numbers that are in sequence. That is, numbers which are either equal or +1 the previous value.

So the expected output is:

x
#> [1] 1 1 1 2 2 2 2 3 3   5 5 5 5 6 6   9 9 9 9
data.table::rleid(???)
#> [1] 1 1 1 1 1 1 1 1 1   2 2 2 2 2 2   3 3 3 3

My first thought was to simply check if each value is the same or +1 the previous, but that doesn't work since the first change is considered a run of its own, obviously (a FALSE surrounded by TRUEs):

x
#> [1] 1 1 1 2 2 2 2 3 3   5 5 5 5 6 6   9 9 9 9
data.table::rleid((x - dplyr::lag(x, default = 1)) %in% 0:1)
#> [1] 1 1 1 1 1 1 1 1 1   2 3 3 3 3 3   4 5 5 5
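
To make the failure concrete, here is a minimal base R sketch that inspects the logical vector being fed to rleid(). The names step, in_seq and runs are illustrative only, and c(0, diff(x)) stands in for the lag() call above (it treats the first element as a step of 0).

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

# Step from the previous value; the first element is treated as a step of 0
step <- c(0, diff(x))

# TRUE where the value equals the previous one or is exactly one more
in_seq <- step %in% 0:1

# Run-length encode the logical vector: each isolated FALSE is its own run
runs <- rle(in_seq)
runs$lengths
runs$values
```

The two jumps (3 to 5 and 6 to 9) each show up as a FALSE run of length 1, which is why rleid() reports five runs instead of three.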

I clearly need something that lets me compare each value with the last different value, but I can't think of how to do that efficiently. Any pointers?

吐泡泡oo
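library(dplyr)

x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

tibble(X = x) %>%
  mutate(PREV.X = lag(X, default = 0),
         # IS.SEQ is TRUE where a new run starts: X is neither equal to
         # the previous value nor exactly one more than it
         IS.SEQ = X != PREV.X & X != PREV.X + 1,
         RZLT = 1 + cumsum(IS.SEQ))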

# A tibble: 19 x 4
       X PREV.X IS.SEQ  RZLT
   <dbl>  <dbl> <lgl>  <dbl>
 1     1      0 FALSE      1
 2     1      1 FALSE      1
 3     1      1 FALSE      1
 4     2      1 FALSE      1
 5     2      2 FALSE      1
 6     2      2 FALSE      1
 7     2      2 FALSE      1
 8     3      2 FALSE      1
 9     3      3 FALSE      1
10     5      3 TRUE       2
11     5      5 FALSE      2
12     5      5 FALSE      2
13     5      5 FALSE      2
14     6      5 FALSE      2
15     6      6 FALSE      2
16     9      6 TRUE       3
17     9      9 FALSE      3
18     9      9 FALSE      3
19     9      9 FALSE      3
Upton

How about using lag from dplyr with cumsum?

library(dplyr)
# Start a new run wherever the jump from the previous value exceeds 1
cumsum(x - lag(x, default = 0) > 1) + 1
#> [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3

Or the data.table way with shift:

library(data.table)
# Same idea, using shift() to get the previous value
cumsum(x - shift(x, 1, fill = 0) > 1) + 1
#> [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
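
For comparison, the same idea can be sketched in base R with diff(), with no packages needed. This is only an illustrative variant, not something from the answers above, and run_id is a made-up name. Because the first element is marked as a run start explicitly, the result does not depend on a fill value of 0; that matters if the vector starts above 1, where the fill-based versions would begin numbering at 2.

```r
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5, 5, 5, 6, 6, 9, 9, 9, 9)

# diff(x) > 1 flags every jump larger than 1; the leading TRUE marks the
# first element as the start of run 1, and cumsum() numbers the runs
run_id <- cumsum(c(TRUE, diff(x) > 1))
run_id
```

If an rleid()-style call is preferred, passing this vector to data.table::rleid() should give the same IDs, since each run already has a constant value.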