使用name_repair以编程方式处理重复的列

I am importing a spreadsheet where I have a known vector of what the column headings were originally. When read_excel imports the data, it rightly complains of the duplicated columns and renames them to distinguish them. This is great behaviour. My question is how might I select (from the duplicated columns) the first occurrence of that duplicated column, drop all other duplicated ones and then rename the column back to the original name. I have a working script but it seems clunky. I always struggle to manipulate column headers programmatically within a pipeline.

library(readxl)
library(dplyr, warn.conflicts = FALSE)

cols_names <- c("Sepal.Length", "Sepal.Length", "Petal.Length", "Petal.Length", "Species")

datasets <- readxl_example("datasets.xlsx")

d <- read_excel(datasets, col_names = cols_names, skip = 1)
#> New names:
#> * Sepal.Length -> Sepal.Length...1
#> * Sepal.Length -> Sepal.Length...2
#> * Petal.Length -> Petal.Length...3
#> * Petal.Length -> Petal.Length...4


d_sub <- d %>% 
  select(!which(duplicated(cols_names)))

new_col_names <- gsub("\\.\\.\\..*","", colnames(d_sub))

colnames(d_sub) <- new_col_names

d_sub
#> # A tibble: 150 x 3
#>    Sepal.Length Petal.Length Species
#>           <dbl>        <dbl> <chr>  
#>  1          5.1          1.4 setosa 
#>  2          4.9          1.4 setosa 
#>  3          4.7          1.3 setosa 
#>  4          4.6          1.5 setosa 
#>  5          5            1.4 setosa 
#>  6          5.4          1.7 setosa 
#>  7          4.6          1.4 setosa 
#>  8          5            1.5 setosa 
#>  9          4.4          1.4 setosa 
#> 10          4.9          1.5 setosa 
#> # ... with 140 more rows

Created on 2020-04-08 by the reprex package (v0.3.0)

知道如何以更简化的方式执行此操作吗?

评论