ModalVector
vector1 = [1.0, 1.0, 1.0, 2.0, 0.0, 1.0]
vector2 = [4.0, 4.0, 2.0, 4.0, 4.0, 0.0]
vector1.dot(vector2) = 1*4 + 1*4 + 1*2 + 2*4 + 0*4 + 1*0 = 18
Assume you will be doing many many dot products with the same small set of vectors.
class ModalVector:
def __init__(self, vector: list[float]):
...
def dot(self, modalVector: 'ModalVector'):
...
Find the mode on a distributed dataset, assuming an approximately even global data distribution.
Starter code:
import threading
import time
import queue
import random
NUM_NODES = 10
DATASET_SIZE = 1000
DATA = list(range(DATASET_SIZE - 1))
DATA.append(323)
queues = [queue.Queue() for _ in range(NUM_NODES)]
_barrier = threading.Barrier(NUM_NODES)
def read_data(idx):
time.sleep(0.01)
return DATA[idx]
def send(sender_id, target_node_id, value):
message_length = len(str(value))
time.sleep(message_length * 0.005)
queues[target_node_id].put((sender_id, value))
def recv(receiver_id):
received_data = []
while not queues[receiver_id].empty():
sender_id, value = queues[receiver_id].get()
message_length = len(str(value))
time.sleep(message_length * 0.003)
received_data.append(value)
return received_data
def wait_barrier():
_barrier.wait()
# Please fill out the following function.
# Your code must run in under 5 seconds.
# You may only read data using read_data(idx).
def print_mode_distributed(node_id: int) -> None:
...
这道题结合了分布式计算和并发通信:每个节点只能通过给定的 <code>read_data(idx)</code> 读取局部数据,并且发送、接收消息、等待屏障都带有时间开销,因此不能用朴素的“全量读完再统计”思路。核心做法通常是让每个线程先在自己的数据分片上做局部频次统计,再通过消息传递把候选值或局部计数汇总到少数节点,最终得到全局众数。由于题目强调数据分布大致均匀、运行时间必须低于 5 秒,解法重点在于减少读数据次数和跨节点通信量,选择合适的聚合策略比单纯写出正确统计更重要。