I am importing a spreadsheet where I have a known vector of what the column headings were originally. When read_excel
imports the data, it rightly complains of the duplicated columns and renames them to distinguish them. This is great behaviour. My question is how might I select (from the duplicated columns) the first occurrence of that duplicated column, drop all other duplicated ones and then rename the column back to the original name. I have a working script but it seems clunky. I always struggle to manipulate column headers programmatically within a pipeline.
library(readxl)
library(dplyr, warn.conflicts = FALSE)
cols_names <- c("Sepal.Length", "Sepal.Length", "Petal.Length", "Petal.Length", "Species")
datasets <- readxl_example("datasets.xlsx")
d <- read_excel(datasets, col_names = cols_names, skip = 1)
#> New names:
#> * Sepal.Length -> Sepal.Length...1
#> * Sepal.Length -> Sepal.Length...2
#> * Petal.Length -> Petal.Length...3
#> * Petal.Length -> Petal.Length...4
d_sub <- d %>%
select(!which(duplicated(cols_names)))
new_col_names <- gsub("\\.\\.\\..*","", colnames(d_sub))
colnames(d_sub) <- new_col_names
d_sub
#> # A tibble: 150 x 3
#> Sepal.Length Petal.Length Species
#> <dbl> <dbl> <chr>
#> 1 5.1 1.4 setosa
#> 2 4.9 1.4 setosa
#> 3 4.7 1.3 setosa
#> 4 4.6 1.5 setosa
#> 5 5 1.4 setosa
#> 6 5.4 1.7 setosa
#> 7 4.6 1.4 setosa
#> 8 5 1.5 setosa
#> 9 4.4 1.4 setosa
#> 10 4.9 1.5 setosa
#> # ... with 140 more rows
Created on 2020-04-08 by the reprex package (v0.3.0)
知道如何以更简化的方式执行此操作吗?