Google VO 面试真题解析:Run Tests|并行测试与二分查找

15次阅读
没有评论

Run Tests

Background

Often at Google, there are tests that are resource intensive and long-running, which makes them difficult to use as pre-submit tests. However, these tests can still catch problems, so we often run them hourly. During that hour, many code changes are submitted across the Google codebase, so there’s a chance the tests will pass one hour and fail the next. Once the tests fail, it’s important to find which code change was the culprit. That’s where this problem begins.

The problem

To represent the code changes, you are given an array of objects that have two class methods:

  • run_tests() returns the result of running the tests on the object and returns a boolean.
  • get_name() returns a string unique identifier of the object.

The result of run_tests() represents whether or not the tests passed and will be either true (tests passed) or false (tests failed). The first object will have passing tests and the last will have failing tests. Going from the first object of the array to the last object of the array, the objects will pass the tests until one fails; after which all others will fail the tests.

Find and return the name of the first object that fails the tests.

Now, we know the function run_tests() takes a long time to run, so we don’t want to call it frequently. However, we have a new function called run_tests_parallel() that takes in the code_changes list and a constant m (where m is much less than the number of code changes) and runs their run_tests() in a list and returns their test results.

run_tests_parallel(code_changes, indices_to_check)
code_changes = code_changes list from function parameter
indices_to_check = list of integers of at most size m
returns = list of booleans of size len(indices_to_check)

How would you reimplement your function to take advantage of the ability to process these objects in parallel?

First, how do you pick the m indices that you want to pass in? Think first of what you’d use when m = 3 and n = 100.

def find_first_false(code_changes, m):

这道题的核心是:测试结果在数组中呈现“前面全通过、后面全失败”的单调结构,因此本质上是寻找第一个为 false 的位置。由于单次 run_tests() 很慢,不能逐个调用,题目额外提供了 run_tests_parallel(),允许一次并行检查最多 m 个对象。正确思路通常是结合分段并行采样和二分定位:先用并行方式在若干关键位置探测通过 / 失败边界,再把边界缩小到一个更小区间,最后在区间内继续并行或顺序查找,直到找到第一个失败对象并返回它的 name。

正文完
 0