两个表中公司名称的部分匹配

我是R的初学者。 我有两个文件,一种日期类型-tbl_df(F1和F2)。每个文件中都包含一列公司名称(F1-Company和F2-Company)。 F1是我的主要文件,我从F2选择了公司,并将其与F1的公司进行了比较。 在每个文件中,我可以使用:

F1[grep('cola|bank|veriz', F1$F1-Company, ignore.case=TRUE),]

要么

F1 %>% filter(grepl('country1|country2', Country, ignore.case=TRUE) & grepl('bank', F1-Company, ignore.case=TRUE))

如果我需要使用多个参数进行搜索。 并且这两个代码运行完美。 我要解决的“问题”是:如何在F1(列F1-Company)中找到(列F2-Company)中的公司。 F1-Company和F2-Company之间的单元格中没有100%匹配(例如F1-Company列中的“ Coca-Cola”和F2-Company列中的“ cola”)。 我试图使用命令:

merge(F1[grepl('*','F1-Company',ignore.case=TRUE)], F2) 

要么

match_df(F2, F1, on = "Company")

In this case, key columns (Company) in both tables have the same name. But all of these work when the name of the company in cell 100% matched in both files F1 and F2 (such as "Coca-Cola" in F1-Company and "Coca-Cola" in F2-Company). P.S. Also I tried library(sqldf), and it works inside each file so nice:

sqldf("SELECT * FROM F1 WHERE F1-Company LIKE '%cola%' OR F1-Company LIKE '%bank%'") 

并再次实现单元格之间的100%匹配:

sqldf("SELECT * FROM F1 WHERE EXISTS (SELECT F2-Company FROM F2 WHERE F2-Company = F1-Company)")

I found interesting SQL examples, I thought it would useful to me:

sqldf("SELECT * FROM F1 WHERE F1-Company LIKE '%' + F2-Company + '%'")

but it doesn't work (Error: no such column: F2-Company), and

sqldf("SELECT * FROM testsql LEFT JOIN testsql2 ON testsql2.Company3 LIKE CONCAT('%', testsql.Company, '%')")

also doesn't work (Error: no such function: CONCAT). How I can modify R command grep/grepl or SQL code in R (sqldf) to use to my unusual case, compare company names between two files without 100% matching in cells? I am sorry maybe I miss something.

评论