Given a log of website requests, where each line contains an entry with the fields (time, customerId, pageVisited), write an algorithm to find the top (most common) 3-page sequence of page visits across all customers. Each line represents a request made by customer C# at time T to one of the website's pages (A-Z).
For example, given a log file containing the following entries:
T0,C1,A
T0,C2,E
T1,C1,B
T1,C2,B
T2,C1,C
T2,C2,C
T3,C1,D
T3,C2,D
T4,C1,E
T5,C2,A
Sequence of visits for each customer:
C1 = A -> B -> C -> D -> E
C2 = E -> B -> C -> D -> A
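The sequences above come from grouping the log lines by customer and ordering each customer's requests by time. Here is a minimal Python sketch of that step. It assumes each line is exactly `time,customerId,page` and that timestamps look like `T<integer>`. Times are compared as numbers, so `T10` sorts after `T2`. The function name `build_sequences` is hypothetical.

```python
from collections import defaultdict

def build_sequences(lines):
    """Group log lines by customer and order each customer's pages by time."""
    visits = defaultdict(list)  # customerId -> list of (time, page) pairs
    for line in lines:
        time, customer, page = line.strip().split(",")
        # Assumption: timestamps are "T<integer>"; compare numerically, not as strings.
        visits[customer].append((int(time[1:]), page))
    # Sorting the (time, page) pairs orders each customer's visits chronologically.
    return {
        customer: [page for _, page in sorted(entries)]
        for customer, entries in visits.items()
    }

log = ["T0,C1,A", "T0,C2,E", "T1,C1,B", "T1,C2,B", "T2,C1,C",
       "T2,C2,C", "T3,C1,D", "T3,C2,D", "T4,C1,E", "T5,C2,A"]
for customer, pages in sorted(build_sequences(log).items()):
    print(customer, "=", " -> ".join(pages))
```

On the example log, this prints the same two sequences listed above for C1 and C2.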
Answer: The most common 3-page sequence is B -> C -> D. It appears in both customers' sequences, and every other 3-page sequence appears only once.
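At its core, this is a log-aggregation and pattern-frequency problem. First, group the records by customerId and sort each group by time to recover each customer's complete browsing sequence. Next, enumerate every contiguous 3-page window in each customer's sequence and add up how often each 3-page sequence occurs across all customers. The sequence with the highest total is the answer. A typical implementation uses one hash map to group requests by customer and another to count the 3-page sequences. The running time depends mainly on the number of log entries and the length of each customer's visit history. With n entries, grouping and window counting take expected O(n) time, and sorting each customer's requests by time adds O(n log n) in the worst case.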
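The steps above can be sketched end to end in Python, using `dict`/`Counter` as the two hash maps. This sketch makes several assumptions. The log format is the one from the example, and timestamps are `T<integer>`. Every occurrence of a window is counted, so a customer who repeats a sequence contributes more than once. The function `top_sequences` and its `k` and `window` parameters are hypothetical illustrations, not part of the original problem.

```python
from collections import Counter, defaultdict

def top_sequences(lines, k=1, window=3):
    """Return the k most common `window`-page sequences across all customers."""
    # Step 1: group requests by customer as (time, page) pairs.
    visits = defaultdict(list)
    for line in lines:
        time, customer, page = line.strip().split(",")
        # Assumption: timestamps are "T<integer>", so compare them numerically.
        visits[customer].append((int(time[1:]), page))

    # Step 2: order each customer's visits by time and count every
    # contiguous run of `window` pages (customers with fewer pages add nothing).
    counts = Counter()
    for entries in visits.values():
        pages = [page for _, page in sorted(entries)]
        for i in range(len(pages) - window + 1):
            counts[tuple(pages[i:i + window])] += 1

    # Step 3: most_common sorts by count, highest first.
    return counts.most_common(k)

log = ["T0,C1,A", "T0,C2,E", "T1,C1,B", "T1,C2,B", "T2,C1,C",
       "T2,C2,C", "T3,C1,D", "T3,C2,D", "T4,C1,E", "T5,C2,A"]
print(top_sequences(log))  # [(('B', 'C', 'D'), 2)]
```

When several sequences share the top count, `Counter.most_common` lists them in the order they were first encountered. If the intended rule is "count each customer at most once per sequence", you could collect each customer's windows into a `set` before updating the counter.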