Sample log file:
Jun 15 02:04:59 combo sshd(pam_unix)[20897]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net user=root
Jun 15 02:04:59 combo sshd(pam_unix)[20898]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net user=root
Jun 15 04:06:18 combo su(pam_unix)[21416]: session opened for user cyrus by (uid=0)
Jun 15 04:06:19 combo su(pam_unix)[21416]: session closed for user cyrus
Jun 15 04:06:20 combo logrotate: ALERT exited abnormally with [1]
Jun 15 04:12:42 combo su(pam_unix)[22644]: session opened for user news by (uid=0)
Jun 15 04:12:43 combo su(pam_unix)[22644]: session closed for user news
I want to split the data into 4 columns: Date, Time, PID and Message.
Sample output:
Dict = {"Date": "Jun 15", "Time": "02:04:59", "PID": "20897", "Message": "authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=220-135-151-1.hinet-ip.hinet.net user=root"}
Afterwards, I plan to save this information to a CSV file, one column per field.
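For the CSV step, the standard library's csv.DictWriter maps dictionaries shaped like the sample output straight onto columns. A minimal sketch, assuming each log line has already been parsed into a dict with the keys Date, Time, PID and Message; the rows list and the write_csv helper are hypothetical, and an in-memory io.StringIO stands in for the real output file:

```python
import csv
import io

# Hypothetical parsed records, shaped like the sample dict above.
rows = [
    {"Date": "Jun 15", "Time": "02:04:59", "PID": "20897",
     "Message": "authentication failure; logname= uid=0 euid=0 tty=NODEVssh "
                "ruser= rhost=220-135-151-1.hinet-ip.hinet.net user=root"},
    # The logrotate line has no PID, so that column is left empty.
    {"Date": "Jun 15", "Time": "04:06:20", "PID": "",
     "Message": "ALERT exited abnormally with [1]"},
]


def write_csv(records, fileobj):
    """Write a header row, then one CSV row per record (one column per key)."""
    writer = csv.DictWriter(fileobj, fieldnames=["Date", "Time", "PID", "Message"])
    writer.writeheader()
    writer.writerows(records)


# For a real file, pass open("parsed.csv", "w", newline="") instead of a StringIO.
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

Opening the real file with newline="" is what the csv module documentation recommends, so the writer's own \r\n line endings are not translated a second time.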
I tried looking at other examples, such as:
Parse a custom log file in python
How to parse this custom log file in Python
But I don't know how to build capture groups that would help me achieve this.
My current regular expressions are:
"(\w{3} \d{2})" for Date
"(\d{2}:\d{2}:\d{2})" for Time
"(?<=\[).+?(?=\]:)" for PID
"((?? ==).*)" for Message
But when I combine them, nothing happens.
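One likely reason a combination produces nothing is that each pattern describes only its own field, while the text between the fields (the host name "combo" and a program name such as "sshd(pam_unix)") is matched by none of them, so the patterns joined back to back cannot match a whole line. Below is a sketch of a single combined pattern, assuming the classic syslog layout "Mon DD HH:MM:SS host program[pid]: message" seen in the sample. The names LOG_RE and parse_line are illustrative, and the PID group is optional because lines like the logrotate one carry no PID. Named groups let Match.groupdict() return the dict in the requested shape directly:

```python
import re

# Hypothetical combined pattern for the classic syslog layout:
#   "Mon DD HH:MM:SS host program[pid]: message"
LOG_RE = re.compile(
    r"(?P<Date>\w{3} +\d{1,2}) "      # "Jun 15" (single-digit days are space-padded)
    r"(?P<Time>\d{2}:\d{2}:\d{2}) "   # "02:04:59"
    r"\S+ "                           # host name, e.g. "combo" (not captured)
    r"[^\[:]+"                        # program name, e.g. "sshd(pam_unix)" or "logrotate"
    r"(?:\[(?P<PID>\d+)\])?"          # optional "[20897]"
    r": (?P<Message>.*)"              # everything after ": " (stops before any "\n")
)


def parse_line(line):
    """Return a dict with Date, Time, PID and Message, or None if the line doesn't fit."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    # Groups that did not participate (PID on the logrotate line) become "".
    return match.groupdict(default="")


sample = ("Jun 15 02:04:59 combo sshd(pam_unix)[20897]: authentication failure; "
          "logname= uid=0 euid=0 tty=NODEVssh ruser= "
          "rhost=220-135-151-1.hinet-ip.hinet.net user=root\n")
print(parse_line(sample))
```

A single anchored pattern is stricter than several independent searches: a line that does not fit the layout yields None instead of a partially filled record.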
What do you mean by combining them? Have you tried doing it in a for loop? That's probably the way I would go about it. It sounds like you are trying to capture all the groups at once by passing them to re.findall (I'm guessing). But findall finds all the matches of a single pattern; it does not apply several separate patterns at once. Instead, put your regexes in a list, iterate over it, and match each one using re.search (or the captures() method of the third-party regex module). The regexes you have are correct (though for the date, I would capture the first two words of each line).
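The loop described in the answer can be sketched as follows. This is an illustration, not the answerer's code: FIELD_PATTERNS and parse_fields are hypothetical names, the Date pattern takes the first two words of the line as the answer recommends, the PID pattern is the question's lookaround with the brackets escaped, and the Message pattern (?<=: ).* (everything after the first ": ") is an assumed stand-in. Each pattern is applied with re.search, and a field that does not match becomes an empty string:

```python
import re

# Hypothetical per-field patterns; each one is searched independently in the line.
FIELD_PATTERNS = {
    "Date": r"^\S+ +\S+",            # first two words, e.g. "Jun 15"
    "Time": r"\d{2}:\d{2}:\d{2}",     # first HH:MM:SS in the line
    "PID": r"(?<=\[).+?(?=\]:)",      # text between "[" and "]:"
    "Message": r"(?<=: ).*",          # assumption: everything after the first ": "
}


def parse_fields(line):
    """Apply each field pattern with re.search; missing fields become ""."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, line)
        record[name] = match.group(0) if match else ""
    return record


line = ("Jun 15 02:04:59 combo sshd(pam_unix)[20897]: authentication failure; "
        "logname= uid=0 euid=0 tty=NODEVssh ruser= "
        "rhost=220-135-151-1.hinet-ip.hinet.net user=root\n")
print(parse_fields(line))
```

Collecting parse_fields(line) for every line of the file gives a list of dicts that csv.DictWriter can write out with the field names Date, Time, PID and Message.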