Count the number of occurrences of numbers

I have this reduced data frame:

ind;year;n
67;2016;1
76;2016;1
95;2016;2
171;2016;3
60;2017;1
73;2017;1
95;2017;3
171;2017;1
175;2017;1
60;2018;4
95;2018;7
96;2018;1
99;2018;1
171;2018;1
171;2019;2
172;2019;1
178;2019;1
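To reproduce the example, the semicolon-separated table above can be read straight into R. This is a minimal sketch using base R's `read.table()` on the pasted text. It stores the result as `df`, which is the name the question's code uses.

```r
# Read the semicolon-separated table from the question into a data frame.
# header = TRUE takes the column names from the first line; sep = ";" splits fields.
df <- read.table(text = "ind;year;n
67;2016;1
76;2016;1
95;2016;2
171;2016;3
60;2017;1
73;2017;1
95;2017;3
171;2017;1
175;2017;1
60;2018;4
95;2018;7
96;2018;1
99;2018;1
171;2018;1
171;2019;2
172;2019;1
178;2019;1", header = TRUE, sep = ";")

str(df)
```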

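I want to count how many individuals (`ind`) appear in each year, excluding those that already appeared in a previous year. In this case, the result would look like this: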

year       n
 2016      4
 2017      3
 2018      2
 2019      2
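The rule above can be restated as: order the rows by year, keep only the first row for each `ind`, then count the remaining rows per year. Here is a base R sketch of that rule. It rebuilds the question's data inline as `df` (only `ind` and `year`) and uses `duplicated()` to drop later appearances. This is one possible approach, not taken from the answers below.

```r
# Question data: only ind and year are needed for this count
df <- data.frame(
  ind  = c(67, 76, 95, 171, 60, 73, 95, 171, 175, 60, 95, 96, 99, 171, 171, 172, 178),
  year = rep(2016:2019, times = c(4, 5, 5, 3))
)

# Sort by year so the first row for each ind is its earliest appearance
df <- df[order(df$year), ]

# Keep only the first appearance of each ind
first_seen <- df[!duplicated(df$ind), ]

# Count first appearances per year; using all years as factor levels
# keeps a year with no new ind in the result (with n = 0)
new_per_year <- as.data.frame(
  table(year = factor(first_seen$year, levels = sort(unique(df$year)))),
  responseName = "n"
)
new_per_year
#   year n
# 1 2016 4
# 2 2017 3
# 3 2018 2
# 4 2019 2
```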

I tried this, but it doesn't exclude the ones that appeared in previous years:

library(dplyr)

df %>%
  group_by(ind, year) %>%
  dplyr::summarise(totalcount = n()) %>%
  group_by(year) %>%
  tally()
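For comparison, the attempt above first collapses the data to one row per (`ind`, `year`) pair and then counts rows per year. In effect it counts distinct individuals per year, so 95 and 171, which already appeared in 2016, are counted again in 2017. A base R sketch of that same count, rebuilding the question's data inline as `df`:

```r
# Question data: only ind and year are needed here
df <- data.frame(
  ind  = c(67, 76, 95, 171, 60, 73, 95, 171, 175, 60, 95, 96, 99, 171, 171, 172, 178),
  year = rep(2016:2019, times = c(4, 5, 5, 3))
)

# Number of distinct ind values per year: this is what grouping by
# (ind, year) and then tallying rows per year amounts to
distinct_per_year <- tapply(df$ind, df$year, function(x) length(unique(x)))
distinct_per_year
# counts: 2016 = 4, 2017 = 5, 2018 = 5, 2019 = 3
```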
Answers
  • XOC莫

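    Here is an option in base R:

    lst1 <- split(df$ind, df$year)
    # ids seen in any year up to and including each year
    seen <- Reduce(union, lst1, accumulate = TRUE)
    # new ids per year: those not seen in any earlier year
    lst1[] <- lengths(Map(setdiff, lst1, c(list(NULL), head(seen, -1))))
    setNames(stack(lst1)[2:1], c('year', 'n'))
    #  year n
    #1 2016 4
    #2 2017 3
    #3 2018 2
    #4 2019 2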
    
  • 旗才艺
    library(dplyr)
    dat %>%
      group_by(ind) %>%
      slice(which.max(year)) %>%
      group_by(year) %>%
      tally(name = "tallycount")
    # # A tibble: 4 x 2
    #    year tallycount
    #   <int>      <int>
    # 1  2016          2
    # 2  2017          2
    # 3  2018          4
    # 4  2019          3
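The `slice(which.max(year))` step keeps each `ind` only in the last year it appears, so the tally counts individuals by their final year. A base R sketch of the same count follows. It rebuilds only the `ind` and `year` columns of the `dat` data frame given at the end of this answer.

```r
# ind and year columns from the answer's dat
dat <- data.frame(
  ind  = c(67L, 76L, 95L, 171L, 60L, 73L, 95L, 171L, 175L, 60L, 95L, 96L, 99L, 171L, 171L, 172L, 178L),
  year = rep(2016:2019, times = c(4, 5, 5, 3))
)

# Latest year in which each ind appears
last_year <- tapply(dat$year, dat$ind, max)

# Number of inds whose latest appearance falls in each year
tallycount <- table(year = as.vector(last_year))
tallycount
# counts: 2016 = 2, 2017 = 2, 2018 = 4, 2019 = 3
```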
    

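    These numbers differ slightly from the ones in your question, but based on your wording

    excluding those that appeared in previous years

    I'd argue these satisfy your constraint: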

    dat %>%
      group_by(ind) %>%
      mutate(choose = year == max(year)) %>%
      arrange(year, ind)
    # # A tibble: 17 x 4
    # # Groups:   ind [11]
    #      ind  year     n choose
    #    <int> <int> <int> <lgl> 
    #  1    67  2016     1 TRUE 
    #  2    76  2016     1 TRUE  
    #  3    95  2016     2 FALSE 
    #  4   171  2016     3 FALSE  # 2016: 4 total, 2 true
    #  5    60  2017     1 FALSE 
    #  6    73  2017     1 TRUE  
    #  7    95  2017     3 FALSE 
    #  8   171  2017     1 FALSE 
    #  9   175  2017     1 TRUE   # 2017: 5 total, 2 true
    # 10    60  2018     4 TRUE  
    # 11    95  2018     7 TRUE  
    # 12    96  2018     1 TRUE  
    # 13    99  2018     1 TRUE  
    # 14   171  2018     1 FALSE  # 2018: 5 total, 4 true
    # 15   171  2019     2 TRUE
    # 16   172  2019     1 TRUE  
    # 17   178  2019     1 TRUE   # 2019: 3 total, 3 true
    

    Data:

    dat <- structure(list(ind = c(67L, 76L, 95L, 171L, 60L, 73L, 95L, 171L, 175L, 60L, 95L, 96L, 99L, 171L, 171L, 172L, 178L), year = c(2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L), n = c(1L, 1L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 4L, 7L, 1L, 1L, 1L, 2L, 1L, 1L)), class = "data.frame", row.names = c(NA, -17L))