Top n within nested groups using data.table

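Goal: after grouping by quarter and name, I want the top n names by count within each quarter (see the example below). For the top 1, the desired output for this example is: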

2019 Q1  Klaus 2
2019 Q2   Karl 3

Since this is just a toy example, going forward I will also want the top 4, 5, etc. names by count within each quarter. Do you have any good ideas on how to implement this with data.table (no dplyr, please)? Many thanks!

library(data.table)

dt <- data.table(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan"))

# Structure of dt
#          x      y
# 1: 2019 Q1  Klaus
# 2: 2019 Q1 Gustav
# 3: 2019 Q1  Klaus
# 4: 2019 Q2   Karl
# 5: 2019 Q2   Karl
# 6: 2019 Q2   Karl
# 7: 2019 Q2 Stefan


dt[, .N, by = .(x, y)]

# Output:
#          x      y N
# 1: 2019 Q1  Klaus 2
# 2: 2019 Q1 Gustav 1
# 3: 2019 Q2   Karl 3
# 4: 2019 Q2 Stefan 1
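Building on the count above, one way to get the top n names per quarter is to sort by quarter and descending count, then keep the rows whose position within their quarter is at most n. The sketch below uses data.table's setorder() and rowid(); the names n, cnt and top are illustrative, and ties at the cutoff are broken alphabetically by name because y is the last sort key.

```r
library(data.table)

dt <- data.table(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan"))

n <- 1L  # illustrative: how many names to keep per quarter

# Count occurrences of each name within each quarter
cnt <- dt[, .N, by = .(x, y)]

# Sort in place by quarter, then descending count; name breaks ties alphabetically
setorder(cnt, x, -N, y)

# rowid(x) numbers the rows 1, 2, ... within each quarter; keep the first n
top <- cnt[rowid(x) <= n]
print(top)
```

Changing n to 4 or 5 returns up to that many names per quarter; quarters with fewer names simply return all of them.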
Comments
  • ~签名

    Here is a base R solution using aggregate:

    > aggregate(y~x,dt,function(v) as.matrix(head(data.frame(sort(table(v),decreasing = TRUE)),1)))
            x   y.1 y.2
    1 2019 Q1 Klaus   2
    2 2019 Q2  Karl   3
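The aggregate() call above hard-codes the top 1 via head(..., 1). A base R sketch that generalizes to any n, assuming a plain data.frame with the same x and y columns, counts pairs with table(), sorts by quarter and descending count, and numbers rows within each quarter with ave(). The names df, n, cnt and top are illustrative.

```r
df <- data.frame(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan"),
                 stringsAsFactors = FALSE)

n <- 1L  # illustrative: number of names to keep per quarter

# Count every (quarter, name) combination, then drop combinations that never occur
cnt <- as.data.frame(table(x = df$x, y = df$y), responseName = "N", stringsAsFactors = FALSE)
cnt <- cnt[cnt$N > 0, ]

# Sort by quarter, then by descending count
cnt <- cnt[order(cnt$x, -cnt$N), ]

# Running row number within each quarter; keep the first n rows of each
pos_in_x <- ave(cnt$N, cnt$x, FUN = seq_along)
top <- cnt[pos_in_x <= n, ]
rownames(top) <- NULL
print(top)
```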
    
  • 龙舌兰

    Here is another data.table approach, almost the same as Gilean's answer but without head():

    dt[, .N, by = .(x,y) ][ order(-N), .SD[1:1], by = x ]
    
    #          x     y N
    # 1: 2019 Q2  Karl 3
    # 2: 2019 Q1 Klaus 2
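When .SD[1:1] is generalized to .SD[1:n], any quarter with fewer than n names gets padded with NA rows, because indexing past the last row of .SD returns NA rows. A sketch that caps the index at the group size .N is shown below; n = 3 is chosen only to trigger the situation (each quarter here has two names). head(.SD, n), as in the next answer, avoids the padding as well.

```r
library(data.table)

dt <- data.table(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan"))

n <- 3L  # deliberately larger than the number of names in each quarter

cnt <- dt[, .N, by = .(x, y)]

# .SD[1:n] would add NA rows here; seq_len(min(.N, n)) never indexes past the group's end
top <- cnt[order(-N), .SD[seq_len(min(.N, n))], by = x]
print(top)
```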
    
  • 笙歌i

    You can first compute N for each name and quarter, then sort the data.table, and then select the first n rows of each quarter:

    dt[, .N, by = .(x, y)][order(-N), head(.SD, 1), by = x]
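The answers above break ties at the cutoff by row order, so with n = 1 a quarter in which two names share the highest count returns only one of them. If all tied names should be kept (an assumption about the desired behaviour, not something the question specifies), one option is to rank counts within each quarter with data.table's frank() using ties.method = "min" and filter on that rank. The "2019 Q3" rows and the names Anna and Berta below are hypothetical, added only to create a tie.

```r
library(data.table)

# Original data plus a hypothetical "2019 Q3" where two names tie for the top count
dt <- data.table(x = c("2019 Q1", "2019 Q1", "2019 Q1", "2019 Q2", "2019 Q2", "2019 Q2", "2019 Q2",
                       "2019 Q3", "2019 Q3"),
                 y = c("Klaus", "Gustav", "Klaus", "Karl", "Karl", "Karl", "Stefan",
                       "Anna", "Berta"))

n <- 1L  # illustrative cutoff

cnt <- dt[, .N, by = .(x, y)]

# Rank counts within each quarter; ties.method = "min" gives tied names the same rank
cnt[, rnk := frank(-N, ties.method = "min"), by = x]

# Keep every name ranked within the cutoff, so names tied at rank n are all retained
top <- cnt[rnk <= n][order(x, -N, y)]
print(top)
```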