Dplyr pipe group_by top_n does not get the top_n within each group

I'm trying to obtain the top 2 names, sorted alphabetically, per group. I would think that top_n() would select this after I perform a group_by. However, this does not seem to be the case. This code shows the problem.

library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name = c("a", "c", "b", "e", "d", "f"))

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)

df

# A tibble: 2 x 2
# Groups:   Group [1]
  Group Name 
  <dbl> <chr>
1     1 e    
2     1 f 

The expected output is:

df <- df %>%
      arrange(Name, Group) %>%
      group_by(Group) %>%
      top_n(2)
df

      Group Name
1     0    a
2     0    b
3     1    d
4     1    e

Or something similar. Thanks.

笑伴孤单

top_n() selects the rows with the n largest values of wt (if wt is not given, it uses the last variable in the table). You seem to need the n smallest values, which you get by passing a negative n. Additionally, you don't need to arrange the data before using top_n().

library(dplyr)
df %>% group_by(Group) %>% top_n(-2, Name)


#  Group Name 
#  <dbl> <chr>
#1     0 a    
#2     0 b    
#3     1 e    
#4     1 d    
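For comparison, the sketch below contrasts top_n() with a positive n (the two alphabetically largest names per group) against slice_min(), which in dplyr 1.0.0 and later supersedes top_n() for this kind of task. It assumes dplyr >= 1.0.0 is installed; the names `largest` and `smallest` are only for illustration.

```r
library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name  = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)

# A positive n keeps the two alphabetically *largest* names per group
largest <- df %>%
  group_by(Group) %>%
  top_n(2, Name) %>%
  ungroup()

# slice_min() (dplyr >= 1.0.0) keeps the n smallest values of Name per group,
# which is what the question asks for
smallest <- df %>%
  group_by(Group) %>%
  slice_min(Name, n = 2) %>%
  ungroup()

largest
smallest
```

Both top_n() and slice_min() keep ties by default (slice_min() via `with_ties = TRUE`), so a group can return more than two rows if several names tie.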

Another way is to arrange the data and select the first two rows in each group.

df %>% arrange(Group, Name) %>% group_by(Group) %>% slice(1:2)
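The same arrange-then-take-the-first-rows idea can be sketched with slice_head(), assuming dplyr >= 1.0.0; `arrange(..., .by_group = TRUE)` sorts by the grouping variable first, so Name is sorted within each Group.

```r
library(dplyr)

df <- data.frame(Group = c(0, 0, 0, 1, 1, 1),
                 Name  = c("a", "c", "b", "e", "d", "f"),
                 stringsAsFactors = FALSE)

# Sort Name alphabetically within each Group, then keep the first two rows per group
res <- df %>%
  group_by(Group) %>%
  arrange(Name, .by_group = TRUE) %>%
  slice_head(n = 2) %>%
  ungroup()

res
```

Unlike top_n(-2, Name), taking the first rows returns at most two rows per group even when names tie.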