How to calculate datetime values from Python list items

I have a Python list that stores data for emails received and emails sent within the same email thread. Received emails are marked as Email-In and sent emails as Email-Out. Each Email-In or Email-Out entry carries the requestID of its email thread and a timestamp.

For example, for the email thread with requestId = 735482556 there is an incoming email on 15-May-2020 at 11:15:52:

[735482556, 'Email-In', '15-May-2020 11:15:52'] 

The values in my correspondence_list change every day and show data for different requestIDs, because this script scans my data daily.

My current correspondence_list:

[[735482556, 'Email-In', '15-May-2020 11:15:52'], [735482556, 'Email-Out', '15-May-2020 22:42:50'], [735482556, 'Email-In', '16-May-2020 11:58:41'], [735532797, 'Email-In', '16-May-2020 07:44:15'], [66789544, 'Email-In', '16-May-2020 10:44:15'], [66789544, 'Email-Out', '17-May-2020 11:44:15'], [66789544, 'Email-In', '17-May-2020 13:44:15'], [66789544, 'Email-Out', '17-May-2020 15:44:15'], [567432221, 'Email-In', '16-May-2020 20:30:15'], [567432221, 'Email-In', '16-May-2020 20:35:15'], [567432221, 'Email-Out', '16-May-2020 20:45:15']]

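I want to use the list above to calculate the time difference between Email-In and Email-Out, so that I can see how long it takes to reply to an incoming email. Each requestId can have several incoming/outgoing emails, depending on how many replies the request received.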

For example, requestId = 735482556 has 3 items. In this case I need to calculate the time difference between the Email-In and the Email-Out, which is '11:26:58', and ignore the second 'Email-In' received on '16-May-2020 11:58:41', because there is no Email-Out to pair it with.

 [735482556, 'Email-In', '15-May-2020 11:15:52'], [735482556, 'Email-Out', '15-May-2020 22:42:50'], [735482556, 'Email-In', '16-May-2020 11:58:41']
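A minimal sketch of that pairing rule for a single thread, assuming the rows of one requestId are already in chronological order: remember the oldest Email-In that has not been answered yet, and when an Email-Out arrives, record the time elapsed since that Email-In. A trailing Email-In with no reply is never recorded. The helper name `reply_times` is hypothetical, and treating one Email-Out as answering every Email-In received before it is an assumption.

```python
from datetime import datetime

FMT = '%d-%b-%Y %H:%M:%S'  # timestamp format used in correspondence_list


def format_timedelta(tdelta):
    # Same H:MM:SS output as the question's helper; hours may exceed 24
    minutes, seconds = divmod(int(tdelta.total_seconds()), 60)
    hours, minutes = divmod(minutes, 60)
    return '{:d}:{:02d}:{:02d}'.format(hours, minutes, seconds)


def reply_times(thread_rows):
    """Hypothetical helper: reply delays for one thread's rows, in time order."""
    waiting_since = None  # time of the oldest Email-In not yet answered
    delays = []
    for _request_id, kind, stamp in thread_rows:
        when = datetime.strptime(stamp, FMT)
        if kind == 'Email-In':
            if waiting_since is None:
                waiting_since = when
        elif kind == 'Email-Out' and waiting_since is not None:
            delays.append(format_timedelta(when - waiting_since))
            waiting_since = None  # assumption: this reply answers all pending emails
    return delays  # an unanswered trailing Email-In is simply ignored


thread = [[735482556, 'Email-In', '15-May-2020 11:15:52'],
          [735482556, 'Email-Out', '15-May-2020 22:42:50'],
          [735482556, 'Email-In', '16-May-2020 11:58:41']]
print(reply_times(thread))  # ['11:26:58']
```

An empty result means the thread has no Email-In/Email-Out pair, which is where 'not replied' would be reported.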

Desired output for my current correspondence_list:

 [[735482556, '11:26:58'], [735532797, 'not replied'], [66789544, '25:00:00', '2:00:00'], [567432221, '0:15:00']]

My code so far:

from datetime import datetime

s1 = '15-May-2020 11:15:52'  # Email-In
s2 = '15-May-2020 22:42:50'  # Email-Out
FMT = '%d-%b-%Y %H:%M:%S'    # matches e.g. '15-May-2020 11:15:52'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)

def format_timedelta(tdelta):
    # Convert a timedelta to H:MM:SS; hours can exceed 24 for multi-day gaps
    minutes, seconds = divmod(tdelta.seconds + tdelta.days * 86400, 60)
    hours, minutes = divmod(minutes, 60)
    return '{:d}:{:02d}:{:02d}'.format(hours, minutes, seconds)


myDifference = format_timedelta(tdelta)  # '11:26:58'

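The code above lets me calculate the time difference manually for each case. However, I am trying to understand how to iterate through the list and do the calculation for its items, skipping the calculation when a thread has no Email-In/Email-Out pair. Can anyone help? Thanks in advance!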

Comments
  • tqui replied:

    What you want to do is find all matches. I will assume that requestID is unique and that there are no duplicate outgoing emails. In other words, there is only one pair of incoming and outgoing emails per ID.

    In that case, you have several options.

    For example, you can loop over the list twice and, for each incoming email, look for an outgoing email with the same ID.

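    from datetime import datetime
    # Reuses correspondence_list, FMT and format_timedelta from the question
    for ID, email, time in correspondence_list:
        if email.endswith("In"):
            for ID_2, email_2, time_2 in correspondence_list:
                if email_2.endswith("Out") and ID == ID_2:
                    # Time between the incoming email and the reply in the same thread
                    tdelta = datetime.strptime(time_2, FMT) - datetime.strptime(time, FMT)
                    print(ID, format_timedelta(tdelta))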
    

    Or you can try to match them up from the start.

    sorted_correspondence = dict()
    for ID, email, time in correspondence_list:
        # setdefault stores the new list in the dict; .get() would return a throwaway list
        sorted_correspondence.setdefault(ID, []).append(time)
    

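    Now you can iterate over the dictionary, and each value should hold your two timestamps. If there is only one value, you can conclude that the email was not answered.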
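    The dictionary approach can be sketched end to end as follows, under the same assumption as the answer (at most one Email-In followed by one Email-Out per requestId). The three-row list is a subset of the question's data chosen to satisfy that assumption. This sketch formats the difference with `str()` on the timedelta, which prints gaps longer than 24 hours as e.g. '1 day, 1:00:00'; the question's format_timedelta can be swapped in if H:MM:SS is required.

```python
from datetime import datetime

FMT = '%d-%b-%Y %H:%M:%S'

# Rows from the question that satisfy the one-pair-per-ID assumption
correspondence_list = [
    [735482556, 'Email-In', '15-May-2020 11:15:52'],
    [735482556, 'Email-Out', '15-May-2020 22:42:50'],
    [735532797, 'Email-In', '16-May-2020 07:44:15'],
]

# Group the timestamps by requestId; setdefault keeps the list in the dict
sorted_correspondence = {}
for ID, email, time in correspondence_list:
    sorted_correspondence.setdefault(ID, []).append(time)

results = []
for ID, times in sorted_correspondence.items():
    if len(times) == 1:
        # Only the incoming email exists, so nobody has replied yet
        results.append([ID, 'not replied'])
    else:
        # First timestamp is the Email-In, second the Email-Out (assumed order)
        received, replied = (datetime.strptime(t, FMT) for t in times[:2])
        results.append([ID, str(replied - received)])

print(results)  # [[735482556, '11:26:58'], [735532797, 'not replied']]
```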

    Edit

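    I just noticed that you want to handle multiple incoming and outgoing emails. In that case you can still use the dictionary approach, but also store the email data (whether each entry is in or out) in it. Then you have to check which entries are incoming and which are outgoing. If there is always exactly one outgoing and one incoming email, you can adjust the values to match them up. If there are several incoming or several outgoing emails, you could try creating a stack of all outgoing emails and popping them off for each incoming email. However, the final approach really depends on how replies should be matched to incoming emails.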
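    One possible matching rule, sketched under the assumption that each Email-Out answers the oldest still-unanswered Email-In of the same requestId. This uses a FIFO queue (collections.deque) per thread rather than the stack mentioned above: with a stack, thread 567432221 would pair its reply with the newer 20:35:15 email (0:10:00) instead of the first one (0:15:00). Rows are sorted by timestamp inside each thread, and threads keep the order in which they first appear. The function name `reply_report` is hypothetical.

```python
from collections import deque
from datetime import datetime

FMT = '%d-%b-%Y %H:%M:%S'


def format_timedelta(tdelta):
    # H:MM:SS, letting hours run past 24 (as in the question's helper)
    minutes, seconds = divmod(int(tdelta.total_seconds()), 60)
    hours, minutes = divmod(minutes, 60)
    return '{:d}:{:02d}:{:02d}'.format(hours, minutes, seconds)


def reply_report(correspondence_list):
    # Group rows by requestId; dicts keep first-appearance order (Python 3.7+)
    threads = {}
    for request_id, kind, stamp in correspondence_list:
        threads.setdefault(request_id, []).append((datetime.strptime(stamp, FMT), kind))

    report = []
    for request_id, events in threads.items():
        waiting = deque()  # unanswered Email-In times, oldest first
        delays = []
        for when, kind in sorted(events):
            if kind == 'Email-In':
                waiting.append(when)
            elif kind == 'Email-Out' and waiting:
                # Assumption: a reply answers the oldest pending incoming email
                delays.append(format_timedelta(when - waiting.popleft()))
        report.append([request_id] + (delays or ['not replied']))
    return report


correspondence_list = [
    [735482556, 'Email-In', '15-May-2020 11:15:52'], [735482556, 'Email-Out', '15-May-2020 22:42:50'],
    [735482556, 'Email-In', '16-May-2020 11:58:41'], [735532797, 'Email-In', '16-May-2020 07:44:15'],
    [66789544, 'Email-In', '16-May-2020 10:44:15'], [66789544, 'Email-Out', '17-May-2020 11:44:15'],
    [66789544, 'Email-In', '17-May-2020 13:44:15'], [66789544, 'Email-Out', '17-May-2020 15:44:15'],
    [567432221, 'Email-In', '16-May-2020 20:30:15'], [567432221, 'Email-In', '16-May-2020 20:35:15'],
    [567432221, 'Email-Out', '16-May-2020 20:45:15'],
]
print(reply_report(correspondence_list))
```

    For the question's data this yields 11:26:58 for 735482556, 'not replied' for 735532797, 25:00:00 and 2:00:00 for 66789544 (the first reply comes a day and an hour later), and 0:15:00 for 567432221.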