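I'm trying to modify each line of a file to remove any part that starts with '(' or that consists of digits/characters in square brackets, e.g. '[2]':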
import re
f = open('/Users/name/Desktop/university_towns.txt', "r")
listed = []
for i in f.readlines():
    if i.find(r'\(.*?\)\n'):
        here = re.sub(r'\(.*?\)\[.*?\]\n', "", i)
        listed.append(here)
    elif i.find(r' \(.*?\)\n'):
        here = re.sub(r' \(.*?\)\[.*?\]\n', "", i)
        listed.append(here)
    elif i.find(r' \[.*?\]\n'):
        here = re.sub(r' \[.*?\]\n', "", i)
        listed.append(here)
    else:
        here = re.sub(r'\[.*?\]\n', "", i)
        listed.append(here)
My input data sample:
Platteville (University of Wisconsin–Platteville)[2]
River Falls (University of Wisconsin–River Falls)[2]
Stevens Point (University of Wisconsin–Stevens Point)[2]
Waukesha (Carroll University)
Whitewater (University of Wisconsin–Whitewater)[2]
Wyoming[edit]
Laramie (University of Wyoming)[5]
My output data sample:
Platteville
River Falls
Stevens Point
Waukesha (Carroll University)
Whitewater
Wyoming[edit]
Laramie
However, I don't want parts such as "(Carroll University)" or "[edit]" either.
How should I modify the pattern?
I'd be very grateful for any advice!
You can do:
Prints:
Use this regex instead:
\(.*\)|\[.*\]
Like this:
re.sub(r'\(.*\)|\[.*\]', '', i)
This replaces anything in parentheses (\(.*\)) or (|) anything in square brackets (\[.*\]) with an empty string.
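As for why the original loop misbehaves: str.find searches for literal text, not a regex, and returns -1 when nothing is found. Since -1 is truthy, the first branch runs for every line, and its pattern only matches lines that have both a "(...)" and a "[...]" part. Below is a minimal sketch of a single-pattern approach applied to a few of the sample lines. The names BRACKETED and clean_line are made up for illustration, and the non-greedy quantifiers plus the final strip() are choices of this sketch rather than part of the regex above. It assumes the brackets are never nested.

```python
import re

# Optional leading whitespace followed by one "(...)" or "[...]" group.
# Non-greedy ".*?" stops at the first closing bracket (assumes no nesting).
BRACKETED = re.compile(r'\s*(?:\(.*?\)|\[.*?\])')

def clean_line(line):
    """Remove bracketed parts, then trim the newline and stray spaces."""
    return BRACKETED.sub('', line).strip()

# A few lines taken from the sample input in the question.
sample = [
    "Platteville (University of Wisconsin–Platteville)[2]\n",
    "Waukesha (Carroll University)\n",
    "Wyoming[edit]\n",
    "Laramie (University of Wyoming)[5]\n",
]

listed = [clean_line(line) for line in sample]
print(listed)  # ['Platteville', 'Waukesha', 'Wyoming', 'Laramie']
```

To run it on the real file, the same list comprehension can iterate over the open file object instead of sample.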