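I am trying to create a regex that finds sentences with a minimum length.
My conditions are:
- There must be at least 4 words in the sequence
- The words in the sequence must be distinct
- The sequence must be followed by some punctuation.
So far I have tried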
^(\b\w*\b\s?){2,}\s?[.?!]$
If my sample text is:
This is a sentence I would like to parse.
This is too short.
Single word
Not not not distinct distinct words words.
Another sentence that I would be interested in.
I would like to match strings 1 and 5.
I am using the Python re library. I am testing on regex101, and the regex above appears to do quite a bit of backtracking, so I imagine those knowledgeable in regex may be a bit appalled (my apologies).