I have this text:
DIAGNOSIS
M19.072 Primary osteoarthritis, left ankle and foot
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
111111111112ffffffffff
gfggggg
wwwwwwwwww
DIAGNOSIS
M17.12 Unilateral primary osteoarthritis, left knee
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
11111ttttfffff
gffffffffffffffffffffffffwwwwwwwwwree
DIAGNOSIS
M75.42 Impingement syndrome of left shoulder
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
111111111112ffffffffff
gfggggg
wwwwwwwwww
I only want to get these three lines:
M19.072 Primary osteoarthritis, left ankle and foot
M17.12 Unilateral primary osteoarthritis, left knee
M75.42 Impingement syndrome of left shoulder
This is my Python code, but sometimes it doesn't work:
diagnosis_Answer = re.findall(r"(DIAGNOSIS(\s.*?)+RTHOSIS DEVICE)+", txt)
I'd suggest you just take the lines that begin the right way, with the M and the digits:
M\d{2}\.\d+.*
Alternatively, you can use
DIAGNOSIS\n(.*)\n.*RTHOSIS DEVICE
See the regex demo. Details:
- DIAGNOSIS\n - DIAGNOSIS and a newline
- (.*) - Group 1: any zero or more chars other than line break chars, as many as possible
- \n - a newline
- .*RTHOSIS DEVICE - any zero or more chars other than line break chars, as many as possible, and then the RTHOSIS DEVICE string.
Python demo:
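A minimal sketch of both suggestions, assuming the text from the question is held in a string named txt (the variable name used in the question's code). The names by_code and by_context are illustrative only and do not come from the original answers.

```python
import re

# Sample text copied from the question (note the "O RTHOSIS" spelling in the data).
txt = """DIAGNOSIS
M19.072 Primary osteoarthritis, left ankle and foot
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
111111111112ffffffffff
gfggggg
wwwwwwwwww
DIAGNOSIS
M17.12 Unilateral primary osteoarthritis, left knee
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
11111ttttfffff
gffffffffffffffffffffffffwwwwwwwwwree
DIAGNOSIS
M75.42 Impingement syndrome of left shoulder
O RTHOSIS DEVICE(S) PRESCRIBED
gfgfgfggfgfg
111111111112ffffffffff
gfggggg
wwwwwwwwww"""

# Suggestion 1: match text that looks like a code (M, two digits, a dot, digits)
# and take the rest of that line. There is no capture group, so findall
# returns the whole match.
by_code = re.findall(r"M\d{2}\.\d+.*", txt)

# Suggestion 2: capture the single line that sits between a DIAGNOSIS line
# and the line containing RTHOSIS DEVICE. With one capture group, findall
# returns only the text of group 1.
by_context = re.findall(r"DIAGNOSIS\n(.*)\n.*RTHOSIS DEVICE", txt)

for line in by_context:
    print(line)
```

The first pattern depends on every diagnosis code starting with M and two digits, while the second depends only on the position of the line between the two headers, so it would also pick up codes that start with another letter.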