Two Sigma VO 面试真题解析:Merge Sensor Data(按时间戳合并传感器数据)

21次阅读
没有评论

Implement a Merger class that accepts a batch of data from each sensor and merges the batches into a single output stream sorted by timestamp.

We have N sensors each producing a time-series of data in the form of (timestamp, data) pairs. To save space, the data emitted by each sensor is delta-encoded, meaning the sensor emits the differences between values rather than the values themselves.

For example, instead of [100, 102, 105, 100], the sensor emits [100, 2, 3, -5]. Notice the first value remains the same. (timestamps omitted for readability)

The sequence of elements outputted by the merger should also be delta-encoded.

The Merger class has two public methods:

  • acceptBatches(batches): Accept N batches of delta-encoded elements where the ith batch corresponds to data from the ith sensor. Data within a batch is sorted by timestamp and delta-encoded. Do not rely on references to the batches after acceptBatches returns.
  • getNextElement(): Return the next (timestamp, data) pair from the merged output sequence. The data should be delta-encoded and sorted by timestamp.

You can assume that acceptBatches will only be called once immediately after the Merger is constructed.

Example usage:

merger := Merger(3)
merger.acceptBatches([
  # sensor 0:
  [(0, 100), (2, 2), (5, 3)],
  # sensor 1:
  [(1, 110), (3, 5)],
  # sensor 2:
  [(4, 115), (7, -10)]
])

assertEquals(merger.getNextElement(), (0, 100))
assertEquals(merger.getNextElement(), (1, 10))
# consume the remaining elements of the merged data

这道题要求实现一个按时间戳合并多路传感器数据的 Merger。每个 sensor 的输入批次内部已经按时间排序,而且数值部分使用 delta encoding 存储:第一条记录保持原值,后续记录保存与前一个值的差。合并时要把所有传感器的数据按时间戳整体归并,并且输出结果也必须继续保持 delta 编码,因此不能直接输出原始值,而要在全局有序序列上逐条重建并再次编码。典型做法是先把每个批次还原成绝对值,再用最小堆按 timestamp 做 k 路归并,最后在输出端维护前一个已输出值计算差分。题目强调批次只调用一次、批次不可依赖引用,适合考察优先队列、归并排序以及增量编码的处理细节。

正文完
 0