Amazon has web server logs from multiple hosts.
Each log entry consists of:
- a Session ID
- a Web Page ID
Example (conceptual):
session_id, web_page_id
0001, product_1
0001, cart
0002, product_2
0003, home_page
0005, help
0001, checkout
0002, cart
0001, help
0001, product_1
0002, checkout
0005, checkout
0002, help
We want to find the 3-page pattern that is visited most frequently across all sessions.
In the example, the sequence <cart, checkout, help> appears twice (once in session 0001 and once in session 0002), so it is the most frequent 3-page pattern.
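To see where the two occurrences come from, here is a minimal sketch that groups the conceptual example by session and lists each session's consecutive 3-page windows. It assumes the rows are already in chronological order. The helper name `group_by_session` and the constant `ROWS` are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Conceptual example rows as (session_id, web_page_id), assumed to be in time order.
ROWS = [
    ("0001", "product_1"),
    ("0001", "cart"),
    ("0002", "product_2"),
    ("0003", "home_page"),
    ("0005", "help"),
    ("0001", "checkout"),
    ("0002", "cart"),
    ("0001", "help"),
    ("0001", "product_1"),
    ("0002", "checkout"),
    ("0005", "checkout"),
    ("0002", "help"),
]


def group_by_session(rows):
    """Collect each session's pages, preserving the order they appear in the log."""
    sessions = defaultdict(list)
    for session_id, page in rows:
        sessions[session_id].append(page)
    return dict(sessions)


if __name__ == "__main__":
    for session_id, pages in group_by_session(ROWS).items():
        # Consecutive 3-page windows within this session (empty if fewer than 3 pages).
        windows = [tuple(pages[i:i + 3]) for i in range(len(pages) - 2)]
        print(session_id, pages, windows)
```

Sessions 0003 and 0005 have fewer than three pages, so they contribute no windows; only 0001 and 0002 can produce 3-page patterns.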
Given logs in the form:
logs = [
session:0001, web:product_1
session:0001, web:cart
session:0002, web:product_2
session:0003, web:home_page
session:0005, web:help
session:0001, web:checkout
session:0002, web:cart
session:0001, web:help
session:0001, web:product_1
session:0002, web:checkout
session:0005, web:checkout
session:0002, web:help
]
Return the 3-page sequence that is most frequently visited, counting only in-order sequences within the same session.
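If the logs arrive as raw strings in the `session:<id>, web:<page>` form shown above, they first need to be turned into `(session_id, page_id)` pairs. The sketch below is one way to do that, assuming exactly that textual format; real log formats may differ. The function name `parse_log_line` is hypothetical.

```python
from typing import Tuple


def parse_log_line(line: str) -> Tuple[str, str]:
    """Parse one 'session:<id>, web:<page>' line into (session_id, page_id).

    The format is assumed from the example; raises ValueError on lines that
    do not match it.
    """
    session_part, web_part = line.split(",", 1)
    session_key, session_id = session_part.strip().split(":", 1)
    web_key, page_id = web_part.strip().split(":", 1)
    if session_key != "session" or web_key != "web":
        raise ValueError(f"unexpected log line: {line!r}")
    return session_id.strip(), page_id.strip()


if __name__ == "__main__":
    lines = [
        "session:0001, web:product_1",
        "session:0001, web:cart",
        "session:0002, web:product_2",
    ]
    print([parse_log_line(line) for line in lines])
```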
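Approach: first group the entries by session_id and keep each session's pages in chronological order (the logs carry no timestamps, so log order is taken as time order). Then slide a window of size 3 over each session's page list to enumerate every 3-page sequence, count the occurrences of each sequence in a hash map, and finally return the 3-page pattern with the highest count.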
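The approach can be sketched in Python as follows. This is a minimal sketch, not a reference solution: it assumes the input is already a list of `(session_id, page_id)` pairs in chronological order, treats a "3-page pattern" as three consecutive pages within one session (the sliding window), and breaks ties by returning the lexicographically smallest pattern, which the problem statement does not specify. The function name `most_frequent_3_page_pattern` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional, Tuple

Pattern = Tuple[str, str, str]


def most_frequent_3_page_pattern(rows: Iterable[Tuple[str, str]]) -> Optional[Pattern]:
    """Return the most frequent consecutive 3-page sequence across all sessions.

    Assumes rows are (session_id, page_id) pairs in chronological order.
    Ties go to the lexicographically smallest pattern (an assumption).
    Returns None if no session has at least 3 pages.
    """
    # Step 1: group pages by session, preserving log order.
    sessions = defaultdict(list)
    for session_id, page in rows:
        sessions[session_id].append(page)

    # Step 2: slide a size-3 window over each session and count every pattern.
    counts: Counter = Counter()
    for pages in sessions.values():
        for i in range(len(pages) - 2):
            counts[(pages[i], pages[i + 1], pages[i + 2])] += 1

    if not counts:
        return None

    # Step 3: highest count wins; ties broken by the smallest pattern.
    return min(counts, key=lambda p: (-counts[p], p))


if __name__ == "__main__":
    rows = [
        ("0001", "product_1"), ("0001", "cart"), ("0002", "product_2"),
        ("0003", "home_page"), ("0005", "help"), ("0001", "checkout"),
        ("0002", "cart"), ("0001", "help"), ("0001", "product_1"),
        ("0002", "checkout"), ("0005", "checkout"), ("0002", "help"),
    ]
    print(most_frequent_3_page_pattern(rows))  # ('cart', 'checkout', 'help')
```

A session with n pages contributes max(n - 2, 0) windows, so both time and extra space are linear in the number of log entries. The hash map is keyed by tuples because tuples are hashable and compare element by element, which also makes the tie-break straightforward.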