Regular expressions in Python

I have this text:

DIAGNOSIS

M19.072 Primary osteoarthritis, left ankle and foot

O   RTHOSIS DEVICE(S) PRESCRIBED



gfgfgfggfgfg

111111111112ffffffffff
gfggggg

wwwwwwwwww




DIAGNOSIS

M17.12 Unilateral primary osteoarthritis, left knee

O   RTHOSIS DEVICE(S) PRESCRIBED




gfgfgfggfgfg
11111ttttfffff

gffffffffffffffffffffffffwwwwwwwwwree





DIAGNOSIS

M75.42 Impingement syndrome of left shoulder

O   RTHOSIS DEVICE(S) PRESCRIBED




gfgfgfggfgfg
111111111112ffffffffff

gfggggg
wwwwwwwwww

I only want to get these three lines:

M19.072 Primary osteoarthritis, left ankle and foot
M17.12 Unilateral primary osteoarthritis, left knee
M75.42 Impingement syndrome of left shoulder

This is my Python code, but sometimes it doesn't work:

diagnosis_Answer = re.findall(r"(DIAGNOSIS(\s.*?)+RTHOSIS DEVICE)+", txt)
oqui

I'd suggest simply taking the lines that start the right way: an M, two digits, a dot, and more digits, which is what M\d{2}\.\d+.* matches.

diagnosis_Answer = re.findall(r"M\d{2}\.\d+.*", text)
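A minimal sketch of this approach, assuming every diagnosis code looks like the ones in the sample (M, two digits, a dot, more digits). The `^` anchor with `re.MULTILINE` is an addition not in the line above; it keeps the pattern from matching a code-like string in the middle of some other line. The sample `text` is a shortened, hypothetical copy of the question's input.

```python
import re

# Shortened sample in the same layout as the question (assumed input).
text = (
    "DIAGNOSIS\n\n"
    "M19.072 Primary osteoarthritis, left ankle and foot\n\n"
    "O   RTHOSIS DEVICE(S) PRESCRIBED\n\n"
    "gfgfgfggfgfg\n\n"
    "DIAGNOSIS\n\n"
    "M17.12 Unilateral primary osteoarthritis, left knee\n\n"
    "O   RTHOSIS DEVICE(S) PRESCRIBED\n"
)

# With re.MULTILINE, ^ matches at the start of every line, so only
# lines that begin with the code are returned. "." does not match a
# newline, so ".*" stops at the end of that line.
code_lines = re.findall(r"^M\d{2}\.\d+.*", text, flags=re.MULTILINE)
print(code_lines)
```

This does not depend on the DIAGNOSIS and RTHOSIS DEVICE markers at all, so it would also pick up matching lines elsewhere in the document, and it would miss codes that start with another letter.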
et_et

You can use

DIAGNOSIS\n(.*)\n.*RTHOSIS DEVICE

Details:

  • DIAGNOSIS\n - DIAGNOSIS and a newline
  • (.*) - Group 1: any 0 or more chars other than line break chars, as many as possible
  • \n - a newline
  • .*RTHOSIS DEVICE - any 0 or more chars other than line break chars, as many as possible and then RTHOSIS DEVICE string.

Python demo:

import re
txt = 'DIAGNOSIS\nM19.072 Primary osteoarthritis, left ankle and foot\nO   RTHOSIS DEVICE(S) PRESCRIBED\n\ngfgfgfggfgfg\n111111111112ffffffffff\ngfggggg\nwwwwwwwwww\n\nDIAGNOSIS\nM17.12 Unilateral primary osteoarthritis, left knee\nO   RTHOSIS DEVICE(S) PRESCRIBED\n\n\ngfgfgfggfgfg\n11111ttttfffff\ngffffffffffffffffffffffffwwwwwwwwwree\n\nDIAGNOSIS\nM75.42 Impingement syndrome of left shoulder\nO   RTHOSIS DEVICE(S) PRESCRIBED\n\n\ngfgfgfggfgfg\n111111111112ffffffffff\ngfggggg\nwwwwwwwwww\n'
diagnosis_Answer = re.findall(r"DIAGNOSIS\n(.*)\n.*RTHOSIS DEVICE", txt)
print(diagnosis_Answer)
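One caveat: the pattern above expects the three lines to be directly adjacent, as they are in this demo string. If the real input has blank lines between them, as the text shown in the question appears to, the single \n will not match. A hedged variant, assuming there is exactly one non-blank line between DIAGNOSIS and the RTHOSIS DEVICE line, replaces each \n with \s+ so that any run of whitespace and blank lines is tolerated. The sample `txt` below is hypothetical and mixes both layouts.

```python
import re

# Hypothetical sample: the question's records, some with blank lines
# between rows and one without.
txt = (
    "DIAGNOSIS\n\n"
    "M19.072 Primary osteoarthritis, left ankle and foot\n\n"
    "O   RTHOSIS DEVICE(S) PRESCRIBED\n\n\n"
    "gfgfgfggfgfg\n\n"
    "DIAGNOSIS\n"
    "M17.12 Unilateral primary osteoarthritis, left knee\n"
    "O   RTHOSIS DEVICE(S) PRESCRIBED\n\n"
    "wwwwwwwwww\n\n"
    "DIAGNOSIS\n\n"
    "M75.42 Impingement syndrome of left shoulder\n\n"
    "O   RTHOSIS DEVICE(S) PRESCRIBED\n"
)

# \s+ skips newlines and blank lines; (.+) captures one non-empty line;
# .* then consumes the "O   " prefix before RTHOSIS DEVICE.
pattern = re.compile(r"DIAGNOSIS\s+(.+)\s+.*RTHOSIS DEVICE")
diagnoses = pattern.findall(txt)
print(diagnoses)
```

If a record can contain more than one line between the two markers, this variant will not match that record, and a different pattern would be needed.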