String manipulation in Python to extract specific fields

I have a file containing user details. A sample line looks like this:

<User id="123" directoryId="122" userName="vik_username" lowerUserName="vik_username" active="1" createdDate="2013-12-01 08:25:34.451" updatedDate="2014-01-20 19:45:49.133" firstName="Vik" lowerFirstName="vik" lastName="GG" lowerLastName="gg" displayName="Vik GG" lowerDisplayName="vikgg" emailAddress="vikgg@vik.com" lowerEmailAddress="vikgg@vik.com">

I want to write a Python script that extracts the following fields:
1. Username: userName="vik_username"
2. Email address: emailAddress="vikgg@vik.com"
3. Display name: displayName="Vik GG"
4. Active status: active="1"

I wrote the following Python script, which splits each line on a double quote followed by a space ('" ').

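user_array = []
# Open the file in a with-block so it is closed automatically.
with open("users.txt", "r") as f:
    for x in f:
        # Split each line on a double quote followed by a space.
        y = x.split('" ')
        user_array.append(y)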

This gives me an array named user_array in which each user's details are stored as an array.

print(user_array[0])

returns

['<User id="123', 'directoryId="122', 'userName="vik_username', 'lowerUserName="vik_username', 'active="1', 'createdDate="2013-12-01 08:25:34.451', 'updatedDate="2014-01-20 19:45:49.133', 'firstName="Vik', 'lowerFirstName="vik', 'lastName="GG', 'lowerLastName="gg', 'displayName="Vik GG', 'lowerDisplayName="vikgg', 'emailAddress="vikgg@vik.com', 'lowerEmailAddress="vikgg@vik.com">\n']

Now, to get the fields I want:
1. Username: userName="vik_username"
2. Email address: emailAddress="vikgg@vik.com"
3. Display name: displayName="Vik GG"
4. Active status: active="1"

I would have to do something like print(user_array[0][<<index of my field>>]) and then split the result again to remove the field tag. For example, for userName="vik_username I would need to remove userName=".
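A minimal sketch of that two-step approach, for illustration only. It assumes the sample line above is held in a variable (here called `line`; in the real script it would come from users.txt), and the names `chunks`, `fields`, `wanted`, and `extracted` are made up for this example. It uses `str.partition` for the second split so the field tag and the value come apart in one call.

```python
# Sample line from above (in the real script this is read from users.txt).
line = '<User id="123" directoryId="122" userName="vik_username" lowerUserName="vik_username" active="1" createdDate="2013-12-01 08:25:34.451" updatedDate="2014-01-20 19:45:49.133" firstName="Vik" lowerFirstName="vik" lastName="GG" lowerLastName="gg" displayName="Vik GG" lowerDisplayName="vikgg" emailAddress="vikgg@vik.com" lowerEmailAddress="vikgg@vik.com">\n'

# First split: one 'name="value' chunk per attribute.
chunks = line.strip().split('" ')

# Second split: cut each chunk at '="' to separate the field tag from its value.
fields = {}
for chunk in chunks:
    name, sep, value = chunk.partition('="')
    if sep:  # skip anything that does not look like name="value
        fields[name] = value

# Pick out only the fields of interest.
wanted = ["userName", "emailAddress", "displayName", "active"]
extracted = {name: fields[name] for name in wanted}
print(extracted)
```

This works for the sample line, but it relies on every attribute being separated by exactly '" ' and would break if a value itself contained that sequence.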

Can someone help me do this in a more efficient way in Python? Thanks in advance.

Comments
  • wqui replied:

    You can use a regular expression:

    import re
    
    string = r'<User id="123" directoryId="122" userName="vik_username" lowerUserName="vik_username" active="1" createdDate="2013-12-01 08:25:34.451" updatedDate="2014-01-20 19:45:49.133" firstName="Vik" lowerFirstName="vik" lastName="GG" lowerLastName="gg" displayName="Vik GG" lowerDisplayName="vikgg" emailAddress="vikgg@vik.com" lowerEmailAddress="vikgg@vik.com">'
    re.findall(r'\"(.*?)\"', string)
    
    # returns ['123', '122', 'vik_username', 'vik_username', '1', '2013-12-01 08:25:34.451', '2014-01-20 19:45:49.133', 'Vik', 'vik', 'GG', 'gg', 'Vik GG', 'vikgg', 'vikgg@vik.com', 'vikgg@vik.com']
    

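    The expression \"(.*?)\" matches each pair of quotation marks (\") and uses a capture group (the parentheses) to return only what lies between them: zero or more characters, matched non-greedily (.*?) so that each match stops at the nearest closing quote. Note that the result contains only the values, in order, without the field names.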
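If the field names are needed as well, one possible variation on the expression above is to capture the name and the value together and build a dictionary from the pairs. This is only a sketch: the pattern `(\w+)="([^"]*)"` and the names `line`, `attributes`, `wanted`, and `result` are assumptions for illustration, not part of the answer above, and it assumes values never contain a double quote.

```python
import re

line = '<User id="123" directoryId="122" userName="vik_username" lowerUserName="vik_username" active="1" createdDate="2013-12-01 08:25:34.451" updatedDate="2014-01-20 19:45:49.133" firstName="Vik" lowerFirstName="vik" lastName="GG" lowerLastName="gg" displayName="Vik GG" lowerDisplayName="vikgg" emailAddress="vikgg@vik.com" lowerEmailAddress="vikgg@vik.com">'

# Two capture groups: the attribute name (\w+) and its quoted value ([^"]*).
# With two groups, re.findall returns a list of (name, value) tuples.
pairs = re.findall(r'(\w+)="([^"]*)"', line)

# Turn the pairs into a dictionary so fields can be looked up by name.
attributes = dict(pairs)

# Keep only the four fields asked about in the question.
wanted = ["userName", "emailAddress", "displayName", "active"]
result = {name: attributes[name] for name in wanted}
print(result)
```

For the sample line this prints {'userName': 'vik_username', 'emailAddress': 'vikgg@vik.com', 'displayName': 'Vik GG', 'active': '1'}. To process the whole file, the same lookup could be applied to each line in the loop over users.txt.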