如何提取所有在2020年1月的前两个会话中产生的收入超过500美元的visitor_id的列表?

表格:

  1. 届会
  2. session_ts
  3. visitor_id
  4. 垂直
  5. session_id
  6. 交易次数
  7. session_ts
  8. session_id
  9. rev_bucket
  10. 收入

当前具有以下查询(使用SQLite):

SELECT
  s.visitor_id,
  sub.session_id,
  month,
  year,
  total_rev,
  CASE 
    WHEN (row_num IN (1,2) >= total_rev >= 500) THEN 'Yes'
    ELSE 'No' END AS High_Value_Transactions,
  sub.row_num
FROM
  sessions s
JOIN
  (
    SELECT
    s.visitor_id,
    t.session_id,
    strftime('%m',t.session_ts) as month,
    strftime('%Y',t.session_ts) as year,
    SUM(t.revenue) as total_rev,
    row_number() OVER(PARTITION BY s.visitor_id ORDER BY s.session_ts) as row_num
  FROM
    Transactions t
  JOIN
    sessions s
  ON
    s.session_id = t.session_id
  WHERE strftime('%m',t.session_ts) = '01'
  AND strftime('%Y',t.session_ts) = '2020'
  GROUP BY 1,2
) sub
ON
  s.session_id = sub.session_id
WHERE sub.row_num IN (1,2)
ORDER BY 1

我在确定前两个会话的总费用为500美元时遇到了麻烦。 公开接受任何反馈并简化查询。谢谢!

评论
qea
qea

您可以使用窗口函数和聚合:

select visitor_id, sum(t.revenue) total_revenue
from (
    select 
        s.visitor_id, 
        t.revenue, 
        row_number() over(partition by s.visitor_id order by t.session_ts) rn
    from transactions t
    inner join sessions s on s.session_id = t.session_id
    where t.session_ts >= '2020-01-01' and t.session_ts < '2020-02-01'
) t
where rn <= 2
group by visitor_id
having sum(t.revenue) >= 500

子查询将两个表连接起来,在目标月份上进行过滤(请注意,使用半开放间隔谓词比在date列上应用日期函数更有效),并在同一位客户的访问组中对每一行进行排名。

Then, the outer query filters on the first two visits per visitor, aggregates by visitor, computes the corresponding revenue, and filters it with a having clause.

点赞
评论