Google VO Interview Question: RPC Timeout Detection from Start/End Logs

Imagine you have an RPC server that produces a log of entries, and you’re analyzing it offline. There are two entries for each call: one when the RPC starts and one when the RPC finishes processing. We’d like to know as soon as possible if there’s an RPC that took too much time / timed out.

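Assume a timeout of 3 time units. In the example log below, only RPC 2 exceeds it: it starts at timestamp 1 and ends at timestamp 7.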

Schema: <RpcId, Timestamp, Start or End>

Example log:

ID 1, 0, Start
ID 2, 1, Start
ID 1, 2, End
ID 3, 6, Start
ID 2, 7, End
ID 3, 8, End
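
One possible approach, as a minimal sketch: it assumes entries arrive in non-decreasing timestamp order (as in the example) and that "timed out" means strictly more than `timeout` time units have passed since the Start entry. The function name `detect_timeouts` and the tuple layout are illustrative choices, not part of the question.

```python
from collections import OrderedDict


def detect_timeouts(entries, timeout):
    """Yield (rpc_id, start_time, detected_at) for each timed-out RPC.

    `entries` is an iterable of (rpc_id, timestamp, kind) tuples with
    kind "Start" or "End". Timestamps are ASSUMED to be non-decreasing.
    """
    # Open calls in start order: rpc_id -> start timestamp, oldest first.
    open_calls = OrderedDict()
    for rpc_id, timestamp, kind in entries:
        # Because timestamps never go backwards, the oldest open call is
        # always the first one to expire, so only the front is checked.
        while open_calls:
            oldest_id, started = next(iter(open_calls.items()))
            if timestamp - started <= timeout:
                break
            del open_calls[oldest_id]
            yield oldest_id, started, timestamp

        if kind == "Start":
            open_calls[rpc_id] = timestamp
        elif kind == "End":
            # Either the call finished in time, or it was already
            # reported above and is no longer tracked.
            open_calls.pop(rpc_id, None)


LOG = [
    (1, 0, "Start"),
    (2, 1, "Start"),
    (1, 2, "End"),
    (3, 6, "Start"),
    (2, 7, "End"),
    (3, 8, "End"),
]

print(list(detect_timeouts(LOG, timeout=3)))  # [(2, 1, 6)]
```

On the example log, RPC 2 is reported when the entry at timestamp 6 is read, one entry before its own End at timestamp 7. Each RPC is inserted and removed at most once, so the pass is linear in the number of log entries.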

Part 1

Find the ten most frequent IP addresses in a web server log file.

A sample line from the log file:

14716104719, GET, /index.html, 132.49.16.172, ...
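
A minimal counting sketch for this step, assuming the IP address is always the fourth comma-separated field (as in the sample line) and that earlier fields contain no commas. The names `count_ips` and `top_ips` are hypothetical.

```python
from collections import Counter


def count_ips(lines):
    """Count how often each IP address appears in the log lines."""
    counts = Counter()
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        # ASSUMPTION: the IP is the fourth field; shorter lines are skipped.
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def top_ips(lines, n=10):
    """Return up to n (ip, count) pairs, most frequent first."""
    return count_ips(lines).most_common(n)
```

For a log on disk, passing the open file object as `lines` streams it one line at a time, so memory use grows with the number of distinct IPs rather than with the size of the file.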

Part 2

Several IP addresses can have the same count. Let’s write code to output the top 10 counts with the list of IP addresses for each count, rather than the top 10 IPs.

You may reuse all or part of your code from Part 1.
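
One way to sketch the grouping step: invert a mapping of IP to count into buckets keyed by count, then keep the largest counts. The function name `top_count_groups` is illustrative, and the input is assumed to be any mapping from IP address to its count, such as a `Counter`.

```python
import heapq
from collections import Counter, defaultdict


def top_count_groups(ip_counts, n=10):
    """Return up to n (count, [ips]) pairs, highest count first."""
    # Bucket the IPs by how many times each one was seen.
    by_count = defaultdict(list)
    for ip, count in ip_counts.items():
        by_count[count].append(ip)

    # Pick the n largest distinct counts; IPs within a group are sorted
    # as plain strings only to make the output deterministic.
    largest = heapq.nlargest(n, by_count)
    return [(count, sorted(by_count[count])) for count in largest]


example = Counter({"10.0.0.1": 3, "10.0.0.2": 3, "10.0.0.3": 1})
print(top_count_groups(example))
# [(3, ['10.0.0.1', '10.0.0.2']), (1, ['10.0.0.3'])]
```

The ranking now runs over distinct counts rather than over IPs, so a count shared by many addresses still takes up only one of the ten slots.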

This question combines log processing and aggregation. In the RPC part, you match each Start and End entry by RpcId and compute the elapsed time. To detect timeouts as early as possible, assuming the log is ordered by timestamp, you flag any still-open RPC as soon as a later entry's timestamp shows that its timeout has already passed, rather than waiting for its End entry. In the web log part, you count IP frequencies with a hash map and then report the top 10 frequency groups, keeping all IPs that share a count together. A clean solution usually relies on hash tables for matching and counting, plus sorting, a heap, or count-keyed buckets for the final ranking.
