Two Sigma VO Interview Question: Merge Sensor Data

18 Views
No Comments

Implement a Merger class that accepts a batch of data from each sensor and merges the batches into a single output stream sorted by timestamp.

We have N sensors each producing a time-series of data in the form of (timestamp, data) pairs. To save space, the data emitted by each sensor is delta-encoded, meaning the sensor emits the differences between values rather than the values themselves.

For example, instead of [100, 102, 105, 100], the sensor emits [100, 2, 3, -5]. Notice the first value remains the same. (timestamps omitted for readability)

The sequence of elements outputted by the merger should also be delta-encoded.

The Merger class has two public methods:

  • acceptBatches(batches): Accept N batches of delta-encoded elements where the ith batch corresponds to data from the ith sensor. Data within a batch is sorted by timestamp and delta-encoded. Do not rely on references to the batches after acceptBatches returns.
  • getNextElement(): Return the next (timestamp, data) pair from the merged output sequence. The data should be delta-encoded and sorted by timestamp.

You can assume that acceptBatches will only be called once immediately after the Merger is constructed.

Example usage:

merger := Merger(3)
merger.acceptBatches([
  # sensor 0:
  [(0, 100), (2, 2), (5, 3)],
  # sensor 1:
  [(1, 110), (3, 5)],
  # sensor 2:
  [(4, 115), (7, -10)]
])

assertEquals(merger.getNextElement(), (0, 100))
assertEquals(merger.getNextElement(), (1, 10))
# consume the remaining elements of the merged data

This problem combines k-way merging with delta encoding. Each sensor batch is already sorted by timestamp, but its data values are stored as differences from the previous value. A clean solution is to reconstruct each batch into absolute values, merge all streams by timestamp using a min-heap, and then re-encode the merged output as deltas before returning each element. The key is to preserve global time order while carefully handling the difference between absolute values and delta-encoded values.

END
 0