The regex works, but the code looks awful

I'm cleaning up a long list of noun phrases for further text mining. They're supposed to be one- or two-word phrases, but some use / as a conjunction. Here's what I've got:

library(tidyverse)
conjuncts <- tibble(usecase = 1:3,
                   classes = c("Insulators/Insulation",
                               "Optic/light fiber",
                               "Magnets"))

And I want:

wanted <- tibble(usecase = c(1,1,2,2,3),
                 classes =  c("Insulators/Insulation",
                              "Insulators/Insulation",
                              "Optic/light fiber",
                              "Optic/light fiber",
                              "Magnets"),
                 bigrams = c("Insulators", "Insulation",
                             "Optic fiber", "light fiber", NA))

I have something that works, but it's both ugly and not scalable.

patternSplit <- function(class){
  # Pattern 1: "A/B"      -> "A", "B"
  # Pattern 2: "A/B noun" -> "A noun", "B noun"
  regexs <- c("(?x) ^ (\\w+) / (\\w+) $",
              "(?x) ^ (\\w+) / (\\w+) \\s+ (\\w+) $")
  if(str_detect(class, regexs[1])){
    extr <- str_match(class, regexs[1])   # second pass with the same regex
    list(extr[1,2],
         extr[1,3]) 
  } else if(str_detect(class, regexs[2])){
    extr <- str_match(class, regexs[2])
    list(paste(extr[1,2], extr[1,4]), 
         paste(extr[1,3], extr[1,4])) 
  } else {
    list(NA_character_)                    # no pattern matched
  }
}

anx <- conjuncts %>% 
  mutate(bigrams = map(classes, patternSplit)) %>%  # list-column of lists
  unnest(cols = "bigrams") %>%                       # one row per list element
  unnest(cols = "bigrams")                           # unwrap the length-1 lists

This gives me what I want, but blecchh!

# A tibble: 5 x 3
  usecase classes               bigrams    
    <int> <chr>                 <chr>      
1       1 Insulators/Insulation Insulators 
2       1 Insulators/Insulation Insulation 
3       2 Optic/light fiber     Optic fiber
4       2 Optic/light fiber     light fiber
5       3 Magnets               NA         

The top two problems: (1) I have to run each regex twice, once with str_detect to get the logical for the if/else and again with str_match to pull out the tokens; (2) I have to do a double unnest to unwind the list structure. And a smaller problem: (3) can I get out of if/else and into case_when or switch?
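One possible way to tackle (1) to (3), offered as an untested sketch rather than a definitive answer: run each regex once over the whole column with str_match(), let case_when() pick the branch from whichever full match is non-NA, and return a character vector per row so a single unnest() is enough. The helper name split_conjunct() is hypothetical, and joining each row's phrases with "|" assumes that character never appears in the data.

```r
library(tidyverse)

conjuncts <- tibble(usecase = 1:3,
                    classes = c("Insulators/Insulation",
                                "Optic/light fiber",
                                "Magnets"))

# Hypothetical helper: vectorised over the whole `classes` column
split_conjunct <- function(classes) {
  # Each regex is evaluated exactly once; no separate str_detect() pass
  m1 <- str_match(classes, "(?x) ^ (\\w+) / (\\w+) $")
  m2 <- str_match(classes, "(?x) ^ (\\w+) / (\\w+) \\s+ (\\w+) $")
  # A non-NA full match (column 1) tells case_when() which branch applies;
  # each row's phrases are joined with "|" (assumed absent from the data)
  out <- case_when(
    !is.na(m1[, 1]) ~ paste(m1[, 2], m1[, 3], sep = "|"),
    !is.na(m2[, 1]) ~ paste(paste(m2[, 2], m2[, 4]),
                            paste(m2[, 3], m2[, 4]), sep = "|"),
    TRUE ~ NA_character_
  )
  # Split back into a list of character vectors (NA stays a single NA)
  str_split(out, fixed("|"))
}

anx <- conjuncts %>%
  mutate(bigrams = split_conjunct(classes)) %>%
  unnest(cols = bigrams)  # one unnest: each element is a character vector
```

Because case_when() evaluates every branch for the whole column, the paste() calls also run on non-matching rows; that is harmless here since those results are discarded.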

I'll eventually be extending this to about a dozen patterns and use cases.
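For around a dozen patterns, a table-driven layout may scale better than a growing if/else chain. In this sketch (an assumption, not a tested solution) each rule pairs a regex with a builder function, each regex is evaluated once per string with str_match(), and the first rule that matches wins. The names rules, build and split_by_rules() are hypothetical; the two rules shown are simply the two patterns from patternSplit.

```r
library(tidyverse)

conjuncts <- tibble(usecase = 1:3,
                    classes = c("Insulators/Insulation",
                                "Optic/light fiber",
                                "Magnets"))

# Hypothetical rule table: each rule pairs a regex with a builder that turns
# the capture groups (g[2], g[3], ...) into the output phrases.
rules <- list(
  list(regex = "(?x) ^ (\\w+) / (\\w+) $",
       build = function(g) c(g[2], g[3])),
  list(regex = "(?x) ^ (\\w+) / (\\w+) \\s+ (\\w+) $",
       build = function(g) c(paste(g[2], g[4]), paste(g[3], g[4])))
)

split_by_rules <- function(class, rules) {
  for (rule in rules) {
    g <- str_match(class, rule$regex)[1, ]  # single regex evaluation per rule
    if (!is.na(g[1])) return(rule$build(g)) # first matching rule wins
  }
  NA_character_                             # nothing matched
}

anx <- conjuncts %>%
  mutate(bigrams = map(classes, function(x) split_by_rules(x, rules))) %>%
  unnest(cols = bigrams)  # one unnest: each element is a character vector
```

Adding a pattern then means appending one entry to rules rather than writing another else if branch; rule order matters because the first match wins.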