Regex that does not return the delimiting pattern

I'm new to Python and regex, so please bear with me if this is a very basic question. I'm trying to write a regex that returns the string between li tags.

I'm dealing with the following two types of input string:

Case 1:
      <li>some string with spaces and special characters

Case 2:
      <li>some string with spaces and special characters</li>

I'm writing a Python script.

The regex I have is

<li>(.+)[\\n|</li>]

The problem I'm facing is with case 2. The regex returns

some string with spaces and special characters</li

I don't want the closing tag, or any part of it, in the returned string.

Any pointers?

Thanks in advance.

Comments
  • 冷Oo

    Your problem is that [\\n|</li>] is a character class, which matches exactly one of the characters \n, |, <, /, l, i or >. So (.+) matches greedily to the end of the line, backtracks only as far as it must, and leaves just the final > for the character class. You want (?:\\n|</li>) instead: a non-capturing group rather than a character class.
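    To see this concretely, here is a minimal sketch that runs the original pattern against case 2. It assumes the input is exactly the case 2 line from the question, and it uses a raw string (r"..."), which hands the regex engine the same backslash-n as the "\\n" in the question.

```python
import re

# Case 2 input from the question (assumed to be exactly this string).
line = "<li>some string with spaces and special characters</li>"

# The original pattern: [\n|</li>] is a character class, so it matches ONE
# character from the set {newline, '|', '<', '/', 'l', 'i', '>'}.
buggy = re.compile(r"<li>(.+)[\n|</li>]")

m = buggy.search(line)
# (.+) first grabs the rest of the line, then gives characters back one at a
# time until the next character is in the class. The trailing '>' qualifies,
# so (.+) only has to give up that single character.
print(repr(m.group(1)))      # 'some string with spaces and special characters</li'
print(repr(m.group(0)[-1]))  # '>' is the one character the class consumed
```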

    (\\n|</li>) would also work, but it would capture that part as group 2, which you don't need; hence the ?:.
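    A small sketch comparing the two groupings on the case 2 line (input string assumed, standard re module only). Both find the same text for group 1; the only difference is whether the terminator shows up as an extra group.

```python
import re

line = "<li>some string with spaces and special characters</li>"

# Plain parentheses: the alternation becomes capturing group 2.
capturing = re.search(r"<li>(.+)(\n|</li>)", line)
print(capturing.groups())
# ('some string with spaces and special characters', '</li>')

# (?:...) groups the alternation without capturing it, so only group 1 is left.
non_capturing = re.search(r"<li>(.+)(?:\n|</li>)", line)
print(non_capturing.groups())
# ('some string with spaces and special characters',)
```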

    So your regex becomes: <li>(.+)(?:\\n|</li>)
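    Putting it together, here is a sketch that applies the corrected pattern to both cases. The helper li_text is hypothetical, and case 1 is assumed to end with a newline: the \n alternative is what ends the match there, and a line with neither a newline nor </li> after the text does not match at all.

```python
import re

# The corrected pattern. A raw string is used, so the regex engine receives
# backslash-n directly; in a normal string literal you would write "\\n".
LI_PATTERN = re.compile(r"<li>(.+)(?:\n|</li>)")

def li_text(line):
    """Hypothetical helper: return the text after <li>, or None if no match."""
    m = LI_PATTERN.search(line)
    return m.group(1) if m else None

# Case 1 is assumed to be newline-terminated; case 2 ends with </li>.
case1 = "<li>some string with spaces and special characters\n"
case2 = "<li>some string with spaces and special characters</li>"

print(repr(li_text(case1)))  # 'some string with spaces and special characters'
print(repr(li_text(case2)))  # 'some string with spaces and special characters'
```

    Because (.+) is greedy, a line holding two items such as <li>a</li><li>b</li> would capture a</li><li>b; the lazy form (.+?) stops at the first </li> and captures just a.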