You are a Backend Engineer at DoorDash and notice that your service can no longer keep up with traffic.
To handle the increased load, the team decides to scale horizontally by adding more pods running your service.
A colleague implemented a traffic router that distributes requests to pods using a round-robin algorithm. However, the implementation does not work correctly, and you are asked to debug it.
The system consists of the following classes:
- TrafficRouter: Handles routing. It takes a list of backend pods.
- Pod: Represents a service pod (hostname + port).
- Backend: Represents a pod with its state (
AVAILABLE,UNAVAILABLE). - HttpRequest: Represents an incoming HTTP request. All requests from the same user share the same request ID.
Methods you must debug or implement
TrafficRouter.get_backend(request)
Returns the next available pod (round robin). If all pods are unavailable, returnsNone.TrafficRouter.reportAvailability(backend, state)
Update pod availability.TrafficRouter.add_backend(backend)
Add a new backend pod at runtime.
Follow-up: Optimize using Consistent Hashing
After debugging the above implementation, you are told:
The service is stateful and keeps in-memory user sessions inside each pod.
To improve session cache hit rate, requests from the same user should ideally hit the same pod.
You must optimize the router using consistent hashing.
Consistent hashing is defined as:
- Create a hash function that maps request IDs to a numeric range.
- Create a ring structure representing that numeric space.
- Assign pods to positions on the ring.
- Route a request by hashing its ID and choosing the next clockwise pod.
- If a pod becomes unavailable, its range is taken over by its next neighbors.
- When a new pod is added, only a small portion of keys are remapped.
Original Code Snippets (for debugging)
class HttpRequest:
def __init__(self, id):
self.id = id
class Pod:
def __init__(self, hostname, port):
self.hostname = hostname
self.port = port
class Backend:
def __init__(self, pod, state):
self.pod = pod
self.state = state
def get_state(self):
return self.state
def set_state(self, updated_state):
self.state = updated_state
class TrafficRouter:
def __init__(self, backends):
self.backends = list(backends)
def get_backend(self, request):
index = 0
backend = self.backends[index]
if backend.state == "AVAILABL":
return backend
index += 1
return None
Unit test examples were provided to verify:
- Round robin works
- Unavailable pods are skipped
- Adding new pods updates routing behavior
这道 DoorDash VO 是非常典型的 后端 + 分布式系统 + Debug 综合题 。
它考察两个核心点:
✅ 第一部分:调试负载均衡器(Round Robin)
题目给你的代码几乎是“故意写坏的”:
- index 每次都从 0 开始
"AVAILABLE"写成"AVAILABL"- 完全没有 round robin
- 没有跳过 UNAVAILABLE
- add_backend 直接覆盖 list
- 整体逻辑和预期严重不符
这部分主要是测试你:
- 能不能快速看出逻辑错误
- 能不能从测试用例推导出正确行为
- 能不能把“正确版本”的负载均衡逻辑讲清楚
面试官会在意你怎么推理,而不是你敲的代码。
✅ 第二部分:系统设计升级 — 一致性哈希(Consistent Hashing)
因为服务是 有状态的(session 保存在 pod 内),
所以 round robin 会导致:
- 用户 session 丢失
- 缓存命中率低
- 跨 pod 的 session 重建 → 性能开销巨大
于是你需要:
✅ 如何解释一致性哈希的原理?
- 把哈希空间画成一个圆
- pod 在圆上按照 hash 值被放置
- request 通过 hash(request.id) 找到下一个顺时针节点
- 节点挂掉 → 只影响它的范围
- 新节点加入 → 只重定向部分用户
这是 Kubernetes、Envoy、Nginx upstream、Redis cluster 都在用的方案。
面试官要听的就是你是否:
- 能清楚解释为什么这么做
- 能说明什么数据会被重定向
- 能解释 virtual nodes 的必要性
- 能把复杂系统讲得通俗易懂
VOprep 团队长期陪同学员实战各类大厂 OA 与 VO,包括 Doordash、Google、Amazon、Citadel、SIG 等,提供实时答案助攻、远程陪练与面试节奏提醒,帮助大家在关键时刻不卡壳。
如果你也在准备 Tiktok 或类似工程向公司,可以了解一下我们的定制助攻方案——从编程面到系统设计,全程护航上岸。