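Extract the domain name without its extension in Google Sheets

I'm trying to extract only the domain name from a cell in Google Sheets. The cell can have either a .com or a .co.uk extension.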

They always appear as naked domains, with no www or https:// prefix.

Column A will contain a random list of URLs, formatted like aol.com or amazon.co.uk.

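I've tried the following and many variants. I think it essentially says "take everything before the dot", but regex itself uses the dot as a special character, so that may be what causes the conflict.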

=REGEXEXTRACT(A4, "(.+).")

The result will eventually be fed into a QUERY that looks up the company name.

=query(Companies!A2:F,"select A where F contains '"&A2&"'")
Ethel

Since you may have only .com or .co.uk at the end of the strings, you may use

=REGEXEXTRACT(A4, "^(.+)\.(?:co\.uk|com)$")

See the regex demo.
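
To see why the original attempt falls short, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Google Sheets regex engine for these particular patterns, and the helper name `extract` is hypothetical. The unescaped final `.` in `(.+).` matches any character, so the greedy group only loses the last character; escaping the dots and anchoring the suffix fixes that.

```python
import re

# The asker's pattern: the unescaped final "." matches ANY character,
# so the greedy group simply drops the last character of the input.
ORIGINAL = r"(.+)."
# The pattern from this answer: escaped dots, anchored, known suffixes only.
ANCHORED = r"^(.+)\.(?:co\.uk|com)$"

def extract(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

for domain in ["aol.com", "amazon.co.uk"]:
    print(domain, extract(ORIGINAL, domain), extract(ANCHORED, domain))
# aol.com aol.co aol
# amazon.co.uk amazon.co.u amazon
```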

Alternatively, you can remove the extension from the end of the string:

=REGEXREPLACE(A4, "\.(?:co\.uk|com)$", "")

See another regex demo
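
The replace approach can be sketched in Python as well, again assuming `re` is a fair stand-in for the Sheets engine here; `strip_suffix` is a hypothetical helper name. One practical difference from extraction: when the suffix is not one of the two expected ones, a replacement leaves the input unchanged instead of failing to match.

```python
import re

# Same suffix pattern as in the REGEXREPLACE formula above.
SUFFIX = r"\.(?:co\.uk|com)$"

def strip_suffix(domain):
    """Remove a trailing .com or .co.uk; return other inputs unchanged."""
    return re.sub(SUFFIX, "", domain)

print(strip_suffix("aol.com"))       # aol
print(strip_suffix("amazon.co.uk"))  # amazon
print(strip_suffix("example.org"))   # example.org (no match, unchanged)
```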

You might also consider somewhat more generic patterns, such as

=REGEXEXTRACT(A4, "^(.+?)(?:\.co)?\.[^.]+$")
=REGEXREPLACE(A4, "(?:\.co)?\.[^.]+$", "")
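
A rough Python sketch of what the generic patterns accept, under the same assumption that `re` matches the Sheets engine for these expressions; the helper names and the extra sample domains are illustrative only. Note the limitation: only `.co` is treated as an optional second-level label, so a suffix such as `.com.au` is only partly removed.

```python
import re

GENERIC_EXTRACT = r"^(.+?)(?:\.co)?\.[^.]+$"
GENERIC_SUFFIX = r"(?:\.co)?\.[^.]+$"

def generic_extract(domain):
    """Capture everything before an optional .co plus the last label."""
    match = re.search(GENERIC_EXTRACT, domain)
    return match.group(1) if match else None

def generic_strip(domain):
    """Remove an optional .co plus the last label from the end."""
    return re.sub(GENERIC_SUFFIX, "", domain)

for domain in ["aol.com", "amazon.co.uk", "example.org", "example.com.au"]:
    print(domain, generic_extract(domain), generic_strip(domain))
# aol.com aol aol
# amazon.co.uk amazon amazon
# example.org example example
# example.com.au example.com example.com
```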

Pattern details

  • ^ - start of string
  • (.+) - 1 or more chars other than line break chars, as many as possible
  • (.+?) - 1 or more chars other than line break chars, as few as possible (needed in the more generic patterns because the subsequent pattern is optional)
  • \.(?:co\.uk|com)$ - . and then co.uk or com at the end of the string
  • (?:\.co)?\.[^.]+$ - an optional .co char sequence and then . and 1 or more chars other than a . till the end of the string.
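
The difference between `(.+)` and `(.+?)` in the generic pattern can be checked with a short Python sketch (assuming `re` mirrors the Sheets engine here; `first_group` is a hypothetical helper). With a greedy group, the optional `(?:\.co)?` is never needed for a match, so `.co` ends up inside the capture; the lazy group stops as early as possible and lets the optional part consume it.

```python
import re

GREEDY = r"^(.+)(?:\.co)?\.[^.]+$"
LAZY = r"^(.+?)(?:\.co)?\.[^.]+$"

def first_group(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(first_group(GREEDY, "amazon.co.uk"))  # amazon.co
print(first_group(LAZY, "amazon.co.uk"))    # amazon
print(first_group(GREEDY, "aol.com"))       # aol
print(first_group(LAZY, "aol.com"))         # aol
```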