Fireworksai / VO 面试真题解析:Modal Vector

22次阅读
没有评论

ModalVector

vector1 = [1.0, 1.0, 1.0, 2.0, 0.0, 1.0]

vector2 = [4.0, 4.0, 2.0, 4.0, 4.0, 0.0]

vector1.dot(vector2) = 1*4 + 1*4 + 1*2 + 2*4 + 0*4 + 1*0 = 18

Assume you will be doing many many dot products with the same small set of vectors.

class ModalVector:
    def __init__(self, vector: list[float]):
        ...

    def dot(self, modalVector: 'ModalVector'):
        ...

Find the mode on a distributed dataset, assuming an approximately even global data distribution.

Starter code:

import threading
import time
import queue
import random

NUM_NODES = 10
DATASET_SIZE = 1000
DATA = list(range(DATASET_SIZE - 1))
DATA.append(323)

queues = [queue.Queue() for _ in range(NUM_NODES)]
_barrier = threading.Barrier(NUM_NODES)

def read_data(idx):
    time.sleep(0.01)
    return DATA[idx]


def send(sender_id, target_node_id, value):
    message_length = len(str(value))
    time.sleep(message_length * 0.005)
    queues[target_node_id].put((sender_id, value))


def recv(receiver_id):
    received_data = []
    while not queues[receiver_id].empty():
        sender_id, value = queues[receiver_id].get()
        message_length = len(str(value))
        time.sleep(message_length * 0.003)
        received_data.append(value)
    return received_data


def wait_barrier():
    _barrier.wait()

# Please fill out the following function.
# Your code must run in under 5 seconds.
# You may only read data using read_data(idx).
def print_mode_distributed(node_id: int) -> None:
    ...

这道题结合了分布式计算和并发通信:每个节点只能通过给定的 <code>read_data(idx)</code> 读取局部数据,并且发送、接收消息、等待屏障都带有时间开销,因此不能用朴素的“全量读完再统计”思路。核心做法通常是让每个线程先在自己的数据分片上做局部频次统计,再通过消息传递把候选值或局部计数汇总到少数节点,最终得到全局众数。由于题目强调数据分布大致均匀、运行时间必须低于 5 秒,解法重点在于减少读数据次数和跨节点通信量,选择合适的聚合策略比单纯写出正确统计更重要。

正文完
 0