ModalVector
vector1 = [1.0, 1.0, 1.0, 2.0, 0.0, 1.0]
vector2 = [4.0, 4.0, 2.0, 4.0, 4.0, 0.0]
vector1.dot(vector2) = 1*4 + 1*4 + 1*2 + 2*4 + 0*4 + 1*0 = 18
Assume you will be doing many many dot products with the same small set of vectors.
class ModalVector:
def __init__(self, vector: list[float]):
...
def dot(self, modalVector: 'ModalVector'):
...
Find the mode on a distributed dataset, assuming an approximately even global data distribution.
Starter code:
import threading
import time
import queue
import random
NUM_NODES = 10
DATASET_SIZE = 1000
DATA = list(range(DATASET_SIZE - 1))
DATA.append(323)
queues = [queue.Queue() for _ in range(NUM_NODES)]
_barrier = threading.Barrier(NUM_NODES)
def read_data(idx):
time.sleep(0.01)
return DATA[idx]
def send(sender_id, target_node_id, value):
message_length = len(str(value))
time.sleep(message_length * 0.005)
queues[target_node_id].put((sender_id, value))
def recv(receiver_id):
received_data = []
while not queues[receiver_id].empty():
sender_id, value = queues[receiver_id].get()
message_length = len(str(value))
time.sleep(message_length * 0.003)
received_data.append(value)
return received_data
def wait_barrier():
_barrier.wait()
# Please fill out the following function.
# Your code must run in under 5 seconds.
# You may only read data using read_data(idx).
def print_mode_distributed(node_id: int) -> None:
...
This problem combines distributed computation and concurrency: each node can only read data through the provided <code>read_data(idx)</code> helper, while sending, receiving, and synchronization all incur time costs. A brute-force global scan is too slow, so the key is to compute local frequency counts first, then combine them efficiently across nodes using message passing and barrier synchronization. Because the dataset is assumed to be roughly evenly distributed and the solution must finish under 5 seconds, the main challenge is minimizing both reads and communication overhead while still identifying the global mode correctly.