Parse a custom log file into a dictionary using regular expressions

Sample log file

Jun 15 02:04:59 combo sshd(pam_unix)[20897]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net  user=root
Jun 15 02:04:59 combo sshd(pam_unix)[20898]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net  user=root
Jun 15 04:06:18 combo su(pam_unix)[21416]: session opened for user cyrus by (uid=0)
Jun 15 04:06:19 combo su(pam_unix)[21416]: session closed for user cyrus
Jun 15 04:06:20 combo logrotate: ALERT exited abnormally with [1]
Jun 15 04:12:42 combo su(pam_unix)[22644]: session opened for user news by (uid=0)
Jun 15 04:12:43 combo su(pam_unix)[22644]: session closed for user news

I want to split the data into four columns: Date, Time, PID and Message.

The sample output is:

Dict = {"Date": "Jun 15", "Time": "02:04:59", "PID": "20897", "Message": "authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net  user=root"}
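One way to get that dictionary directly is a single pattern with named groups, so that `match.groupdict()` returns the four fields at once. This is a minimal sketch, assuming every line has the syslog-style layout shown above (month, day, time, host, process name, an optional `[PID]`, then `: ` and the message); `LOG_PATTERN` and `parse_line` are hypothetical names. The trailing newline of a line read from a file does not end up in `Message`, because `.` does not match a newline and `$` matches just before a final one.

```python
import re

# Assumption: each line looks like "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <message>",
# where the "[<pid>]" part may be missing (as in the logrotate line).
LOG_PATTERN = re.compile(
    r"^(?P<Date>\w{3} +\d{1,2}) "      # e.g. "Jun 15" (also allows space-padded days)
    r"(?P<Time>\d{2}:\d{2}:\d{2}) "    # e.g. "02:04:59"
    r"\S+ "                           # host name, e.g. "combo" (not captured)
    r"[^:\[]+"                        # process name, e.g. "sshd(pam_unix)" (not captured)
    r"(?:\[(?P<PID>\d+)\])?"          # optional "[20897]"; PID is None when absent
    r": (?P<Message>.*)$"             # everything after the ": " separator
)


def parse_line(line):
    """Return a dict with Date, Time, PID and Message, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


line = ("Jun 15 02:04:59 combo sshd(pam_unix)[20897]: authentication failure; "
        "logname= uid=0 euid=0 tty=NODEVssh ruser= "
        "rhost=220-135-151-1.hinet-ip.hinet.net  user=root\n")
print(parse_line(line))
```

Lines without a `[PID]`, such as the `logrotate` one, still match with `PID` set to `None`, and lines that don't fit the layout return `None` instead of raising.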

Afterwards, I intend to save this information to a CSV file, one column per field.
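For the CSV step, `csv.DictWriter` writes dictionaries like the sample output as rows, with one column per key. A sketch, assuming the records already have the four keys shown above; `write_records` is a hypothetical helper, and `io.StringIO` stands in for a real file so the example runs anywhere.

```python
import csv
import io

FIELDNAMES = ["Date", "Time", "PID", "Message"]


def write_records(records, stream):
    """Write an iterable of dicts (one per log line) as CSV rows to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # a None value (e.g. a missing PID) becomes an empty field


records = [
    {"Date": "Jun 15", "Time": "04:06:18", "PID": "21416",
     "Message": "session opened for user cyrus by (uid=0)"},
    {"Date": "Jun 15", "Time": "04:06:20", "PID": None,
     "Message": "ALERT exited abnormally with [1]"},
]

buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `newline=""` as the `csv` documentation recommends, e.g. `with open("parsed_log.csv", "w", newline="") as f: write_records(records, f)` (the file name is only an example).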

I have looked at other examples, such as:

Parse a custom log file in python

How to parse this custom log file in Python

but I can't figure out how to create capture groups to achieve this.

My current regular expressions are:

For the date: `(\w{3} \d{2})`

For the time: `(\d{2}:\d{2}:\d{2})`

For the PID: `(?<=\[).+?(?=\]:)`

For the message: `((?<=: ).*)`

But when I combine them, nothing happens.
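A likely reason nothing happens: concatenating the patterns produces one regex that requires the pieces to sit right next to each other, but in the log line they are separated by spaces, the host name and the process name. The sketch below uses the date and time patterns from the question, with the PID and Message patterns written out as assumed readings of the question's versions. It shows the naive concatenation finding nothing, and a version that spells out the text between the fields matching all four.

```python
import re

line = "Jun 15 04:06:18 combo su(pam_unix)[21416]: session opened for user cyrus by (uid=0)"

# Date and time patterns from the question; PID and message patterns are assumed readings.
date_re = r"(\w{3} \d{2})"
time_re = r"(\d{2}:\d{2}:\d{2})"
pid_re = r"(?<=\[).+?(?=\]:)"
message_re = r"((?<=: ).*)"

# Plain concatenation: the date must be immediately followed by the time,
# the time immediately by the PID, and so on, which never happens in this log.
naive = re.search(date_re + time_re + pid_re + message_re, line)
print(naive)

# Spelling out what lies between the fields makes the combined pattern match.
combined = re.search(
    date_re + r" " + time_re   # "Jun 15" + " " + "04:06:18"
    + r" \S+ [^\[]*\["         # " combo su(pam_unix)[" (host and process, not captured)
    + r"(\d+)"                 # "21416"
    + r"\]: (.*)",             # "]: " followed by the message
    line,
)
print(combined.groups())
```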

Comments
York

What do you mean by combining them? Have you tried doing it in a for loop? That's probably the way I would go about it. It sounds like you are trying to capture all the groups and pass them to re.findall (I'm guessing). But findall returns every non-overlapping match of a single pattern. Hence, put your regexes in a list, iterate over it and match each one using re.search (or the captures method of the third-party regex module). The regexes you have are correct (though for the date, I would capture the first two words of each line).
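The loop the comment describes could look like this: one pattern per field, each applied with `re.search`, with the first match stored under the field name. A minimal sketch: the date pattern follows the comment's suggestion of taking the first two words of the line, while the PID and Message patterns are assumptions (digits inside `[...]` just before `]:`, and everything after the first `: `). `FIELD_PATTERNS` and `parse_line_per_field` are hypothetical names.

```python
import re

# One pattern per field; each is searched independently (assumed patterns, see above).
FIELD_PATTERNS = {
    "Date": re.compile(r"^\S+ +\S+"),           # first two words, e.g. "Jun 15"
    "Time": re.compile(r"\d{2}:\d{2}:\d{2}"),    # first HH:MM:SS in the line
    "PID": re.compile(r"(?<=\[)\d+(?=\]:)"),     # digits between "[" and "]:"
    "Message": re.compile(r"(?<=: ).*"),         # everything after the first ": "
}


def parse_line_per_field(line):
    """Search each field's pattern separately; a field with no match is set to None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(line)
        record[field] = match.group(0) if match else None
    return record


sample = "Jun 15 04:06:18 combo su(pam_unix)[21416]: session opened for user cyrus by (uid=0)"
print(parse_line_per_field(sample))
```

Because each field is searched on its own, nothing ties the pieces to their positions in the line: a line containing `: ` before the real separator would shift the Message field. A single anchored pattern avoids that, at the cost of being harder to read.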
