Extracting the domain name without its extension in Google Sheets

I am trying to extract just the domain name from a cell in Google Sheets. The cell can have a .com or .co.uk extension.

The values are always bare domains, without a www or https:// prefix.

Column A will contain a list of arbitrary URLs, in the form aol.com or amazon.co.uk.

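I have tried the following and many variations of it. I thought it essentially meant "take everything before the dot", but regex itself treats the dot as a special character, so that may be the source of the conflict.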

=REGEXEXTRACT(A4, "(.+).")

The result will eventually be fed into a QUERY that looks up the company name.

=query(Companies!A2:F,"select A where F contains '"&A2&"'")
Comments
  • Ethel
    Ethel replied

    Since you may have only .com or .co.uk at the end of the strings, you may use

    =REGEXEXTRACT(A4, "^(.+)\.(?:co\.uk|com)$")
    

    See the regex demo.

    Alternatively, you can remove the extension from the end of the string:

    =REGEXREPLACE(A4, "\.(?:co\.uk|com)$", "")
    

    See another regex demo

    You might also consider somewhat more generic patterns, such as

    =REGEXEXTRACT(A4, "^(.+?)(?:\.co)?\.[^.]+$")
    =REGEXREPLACE(A4, "(?:\.co)?\.[^.]+$", "")
    

    Pattern details

    • ^ - start of string
    • (.+) - 1 or more chars other than line break chars, as many as possible
    • (.+?) - 1 or more chars other than line break chars, as few as possible (needed in the more generic patterns because the subsequent pattern is optional)
    • \.(?:co\.uk|com)$ - . and then co.uk or com at the end of the string
    • (?:\.co)?\.[^.]+$ - an optional .co char sequence and then . and 1 or more chars other than a . till the end of the string.
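
To see how these patterns behave on the sample values from the question, here is a minimal sketch in Python. It assumes Python's `re` module treats these particular patterns the same way as the regex engine behind Google Sheets, which is an assumption and not something confirmed in this thread. The names `samples`, `extract`, and `results` are made up for illustration.

```python
import re

# Sample inputs in the bare-domain form described in the question.
samples = ["aol.com", "amazon.co.uk"]

# The question's attempt: the trailing unescaped "." matches ANY character,
# so the greedy group captures everything except the last character.
attempt = re.compile(r"(.+).")

# The specific pattern from the answer, limited to .com and .co.uk.
specific = re.compile(r"^(.+)\.(?:co\.uk|com)$")

# The more generic pattern from the answer.
generic = re.compile(r"^(.+?)(?:\.co)?\.[^.]+$")


def extract(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# For each sample: (question's attempt, specific pattern, generic pattern).
results = {
    s: (extract(attempt, s), extract(specific, s), extract(generic, s))
    for s in samples
}

for domain, extracted in results.items():
    print(domain, extracted)
# aol.com ('aol.co', 'aol', 'aol')
# amazon.co.uk ('amazon.co.u', 'amazon', 'amazon')
```

The first column shows why the original formula seemed to do nothing useful: without the backslash, the final `.` just consumes one arbitrary character, so only the last letter is dropped.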