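I am trying to use str_extract to pull the word "Present" or "Absent" out of a text string, where the surrounding pattern has variable spacing and punctuation. Where has my logic gone wrong?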
library(stringr)
test <- c("as follows: ABC Staining Absent in Tissue",
          "as follows: ABC: StainingPresent in Tissue",
          "as follows: ABC: Staining Present in Tissue",
          "as follows ABC Staining Present in Tissue extra words here in Present")
pattern <- "(?<=ABC[:]|[\\s]* Staining ).*(?=in)"
unique(str_extract(string = test, pattern))
You may use stringr::str_match and read the value from the first capturing group. See the R demo online and the regex demo.
Details
ABC - an ABC string
[:\s]* - 0 or more colons or whitespaces
Staining - a Staining string
[:\s]* - 0 or more colons or whitespaces
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?=\s*\bin\b) - a positive lookahead that requires 0+ whitespaces and then a whole word in immediately to the right of the current location.
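The code that originally accompanied this answer is not preserved here. The following is a minimal sketch that assembles the pieces listed above into one pattern and applies it to the test vector from the question. The assembled pattern and the variable name result are a reconstruction, not necessarily the answerer's exact code.

```r
library(stringr)

test <- c(
  "as follows: ABC Staining Absent in Tissue",
  "as follows: ABC: StainingPresent in Tissue",
  "as follows: ABC: Staining Present in Tissue",
  "as follows ABC Staining Present in Tissue extra words here in Present"
)

# Assumed reconstruction: ABC, optional colons/whitespace, Staining,
# optional colons/whitespace, then a lazy capture that stops before
# optional whitespace and the whole word "in".
pattern <- "ABC[:\\s]*Staining[:\\s]*(.*?)(?=\\s*\\bin\\b)"

# str_match returns a character matrix: column 1 holds the full match,
# column 2 holds capturing group 1.
result <- str_match(test, pattern)[, 2]

unique(result)
# expected: "Absent" "Present"
```

Using a capturing group with str_match avoids the lookbehind in the original pattern. Lookbehinds in the ICU engine used by stringr must have a bounded length, so a lookbehind containing an unbounded quantifier such as [\\s]* is likely to be rejected.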