Tesla Interview — Troubleshoot Linux Service Throughput Drop (1000 rps → 2 rps) | Performance Debugging | Linux Profiling Playbook

An application running on Linux used to process 1000 requests/second. Now it only processes ~2 requests/second.
How would you troubleshoot this issue?


A. Frame the problem (1–2 min)

  • Confirm scope: single host vs. fleet, since when, reproducibility, recent deploy/config/infra changes.
  • Check golden signals: RPS / latency / errors / saturation (dashboards, logs).
  • Validate external deps: DB, cache, message queue, third-party APIs.
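To put numbers on the golden signals above, a rough sketch like the following can summarize one window of request records. The record shape (latency in milliseconds plus HTTP status) and the function name golden_signals are assumptions for illustration, not part of any particular logging or metrics system.

```python
def golden_signals(records, window_seconds):
    """Summarize one observation window.

    records: iterable of (latency_ms, status_code) tuples (assumed shape).
    window_seconds: length of the window the records were collected over.
    """
    records = list(records)
    total = len(records)
    if total == 0 or window_seconds <= 0:
        return {"rps": 0.0, "error_rate": 0.0, "p50_ms": None, "p99_ms": None}

    latencies = sorted(latency for latency, _ in records)
    # Treat 5xx responses as errors; adjust if your service defines errors differently.
    errors = sum(1 for _, status in records if status >= 500)

    def percentile(p):
        # Nearest-rank percentile: rank = ceil(p/100 * N), using integer math.
        rank = max(1, (p * total + 99) // 100)
        return latencies[rank - 1]

    return {
        "rps": total / window_seconds,
        "error_rate": errors / total,
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


if __name__ == "__main__":
    # Hypothetical window: 100 requests observed over 50 seconds.
    sample = [(10, 200)] * 98 + [(500, 200), (900, 503)]
    print(golden_signals(sample, window_seconds=50))
```

Comparing this summary for a window before the drop and one after it shows whether latency, errors, or plain arrival rate changed first.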

B. Quick triage (5 min, command set)

  • Resource health: uptime, top/htop, free -m, df -h and df -i (space and inodes are separate checks), vmstat 1, iostat -xz 1, sar, dstat, pidstat -udr 1.
  • Network: ss -s, ss -tan state established, netstat -s (or nstat), ethtool -S <iface>, interface drops (ip -s link, or ifconfig on older hosts), tcptraceroute, dig (DNS).
  • Limits & throttling: ulimit -a (shell limits; use /proc/<pid>/limits for the running process), /proc/sys/fs/file-max, sysctl net.core.somaxconn, net.ipv4.ip_local_port_range, nf_conntrack table (count vs. max), cgroup CPU/mem limits, dmesg (OOM kills, throttling).
  • App process: restart count, crash loops, thread pool sizes, queue backlog, runtime behavior (JVM: GC via jstat; Go: pprof; Python: GIL contention on hot paths).
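Two of the limit checks above reduce to simple arithmetic on kernel counters. The sketch below assumes the text comes from a cgroup cpu.stat file (it uses the nr_periods and nr_throttled keys) and from /proc/sys/fs/file-nr (allocated, unused, maximum). Where those files live depends on the cgroup version and container runtime, so the caller reads them; the function names are illustrative.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before, after):
    """Fraction of CFS enforcement periods that were throttled between two samples."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


def fd_usage(file_nr_text):
    """Fraction of the system-wide file handle limit in use.

    /proc/sys/fs/file-nr holds three numbers: allocated, unused, maximum.
    """
    allocated, _unused, maximum = (int(x) for x in file_nr_text.split())
    return allocated / maximum


if __name__ == "__main__":
    # Hypothetical samples taken a few seconds apart.
    t0 = parse_cpu_stat("nr_periods 100\nnr_throttled 5\nthrottled_usec 1000\n")
    t1 = parse_cpu_stat("nr_periods 200\nnr_throttled 95\nthrottled_usec 90000\n")
    print("throttled ratio:", throttled_ratio(t0, t1))
    print("fd usage:", fd_usage("9600\t0\t10000\n"))
```

A throttled ratio near 1.0 suggests the CPU quota, not the code, is capping throughput. An fd usage near 1.0 points at descriptor exhaustion.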

C. Locate the bottleneck

  • CPU bound? High CPU → sample with perf top / perf record, check scheduling delays with perf sched, and render flame graphs (perf or pprof). Look for spin locks, regex, JSON, crypto, logging.
  • IO bound? High iowait (wa) in vmstat, or disks showing high %util and await times in iostat -x; check fsync/small writes, log rotation, disk full or inodes exhausted.
  • Network bound? Retransmits/drops, SYN backlog, TIME_WAIT storm, DNS slowness, TLS renegotiation, NIC duplex/MTU.
  • Lock/contention: strace -f -p <pid> (look for long futex waits; strace itself slows the target), lsof -p <pid>, pstack <pid>, runtime metrics (mutex wait, goroutine/thread counts).
  • Dependency latency: DB slow queries, exhausted pool, cache miss storm, queue lag; compare service-side vs. dependency-side metrics.
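The CPU-bound vs. IO-bound split above can be approximated from two samples of the aggregate cpu line in /proc/stat. This is a minimal sketch: the 80% and 20% thresholds and the label names are arbitrary assumptions, and iowait is only a rough hint, so treat the result as a pointer toward the next tool rather than a verdict.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(proc_stat_text):
    """Extract the aggregate 'cpu' counters (in clock ticks) from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def classify(pct, busy_threshold=80.0, iowait_threshold=20.0):
    """Map a breakdown to a coarse label (thresholds are illustrative)."""
    busy = pct["user"] + pct["nice"] + pct["system"] + pct["irq"] + pct["softirq"]
    if pct["iowait"] >= iowait_threshold:
        return "io-bound"
    if busy >= busy_threshold:
        return "cpu-bound"
    # Mostly idle while throughput is low: suspect locks, network, or dependencies.
    return "idle-waiting"


if __name__ == "__main__":
    # Hypothetical samples; on a real host read /proc/stat twice, a second apart.
    t0 = parse_cpu_line("cpu  100 0 100 1000 50 0 0 0 0 0")
    t1 = parse_cpu_line("cpu  110 0 110 1050 880 0 0 0 0 0")
    print(classify(cpu_breakdown(t0, t1)))
```

An "idle-waiting" result at 2 rps is itself informative: the host has spare CPU and disk, so the lock/contention and dependency-latency checks are the next place to look.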

D. Common instant fixes

  • Changes: revert the last deploy or feature flag; roll back a recent kernel change.
  • Limits: raise file descriptor limits and somaxconn / listen backlog; widen the ephemeral port range.
  • Dependencies and sizing: fix the DNS resolver; correct thread/DB pool sizing; warm caches.
  • Disk and logging: clear disk space or move logs; disable verbose logging.
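Pool sizing and port-range fixes are easier to justify with quick arithmetic. The sketch below uses Little's Law (in-flight requests = throughput × latency) and a simple port budget. All inputs are hypothetical, and the 60-second TIME_WAIT default is an assumption about typical Linux behavior that applies per destination address and port.

```python
import math


def required_concurrency(target_rps, mean_latency_ms):
    """Little's Law: workers/connections needed to sustain target_rps."""
    return math.ceil(target_rps * mean_latency_ms / 1000)


def max_throughput_rps(pool_size, mean_latency_ms):
    """Upper bound on throughput when each request holds one pool slot."""
    return pool_size * 1000 / mean_latency_ms


def max_new_connections_per_s(port_low, port_high, time_wait_s=60):
    """Rough ceiling on new outbound connections per second to one destination,
    assuming each closed connection keeps its local port in TIME_WAIT."""
    ports = port_high - port_low + 1
    return ports / time_wait_s


if __name__ == "__main__":
    # 1000 rps at 50 ms per request needs about 50 slots in flight.
    print(required_concurrency(1000, 50))
    # A pool of 10 where every call waits out a 5 s timeout caps out near 2 rps.
    print(max_throughput_rps(10, 5000))
    # A 32768-60999 port range without connection reuse.
    print(max_new_connections_per_s(32768, 60999))
```

The second case shows how a small pool combined with a slow or timing-out dependency can by itself account for a collapse from 1000 rps to about 2 rps.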

E. Verify & prevent (close loop)

  • Add SLOs and alerts; capture RED/USE metrics; keep baseline perf/pprof profiles for comparison; run chaos tests; write runbooks.

Answer pattern (talk track): start with measurement → hypothesize → isolate layer → confirm with tool → remediate → verify. Showcase concrete commands and what each confirms.
