Pinterest OA Interview Question: Labels | Naive Bayes Classification

Implement the missing code, denoted by ellipses. You may not modify the pre-existing code.

Your task is to implement parts of a Naive Bayes algorithm from scratch (i.e., without importing any libraries or packages beyond numpy or math). As a reminder, Naive Bayes classification is an application of Bayes’ Theorem that predicts the probability that a case, or set of data points, belongs to one or more classes. It comprises four major steps:

1. Calculate feature descriptive statistics by class.
2. Calculate prior probabilities.
3. Implement a Gaussian density function.
4. Calculate posterior probabilities according to Bayes’ Theorem, and make a prediction, which is the class with the largest predicted probability.
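
As an illustration of the Gaussian density step, here is a minimal sketch using only the math module. The function name gaussian_density and its (x, mean, var) signature are assumptions made for this example; the actual skeleton code is not shown here and may use different names or take a standard deviation instead of a variance.

```python
import math

def gaussian_density(x, mean, var):
    """Normal probability density at x for the given mean and variance.

    Hypothetical helper: the name and signature are assumptions.
    """
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coefficient * math.exp(exponent)

# Peak of the standard normal, about 0.3989
print(gaussian_density(0.0, 0.0, 1.0))
```

Note that the density is undefined when var is zero, so a feature that is constant within a class needs special handling.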

To validate the algorithm implementation, you will need to use it for some classification tasks. Specifically, you will be given a two-dimensional array of float values x_train as training data, where each subarray x_train[i] represents a unique case. You will also be given a one-dimensional array y_train, where each element y_train[i] is the true class label of the corresponding subarray x_train[i]. In addition, you will be given x_test as test data, in the same format as the training data but without class labels.
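
Given this data layout, the per-class statistics and priors can be sketched as follows. This is only one possible arrangement: the function name summarize_by_class, the dictionary layout, and the use of population variance (numpy's default, ddof=0) are all assumptions, and the toy data is invented for illustration.

```python
import numpy as np

def summarize_by_class(x_train, y_train):
    """Return {label: {"mean", "var", "prior"}} (hypothetical layout)."""
    x = np.asarray(x_train, dtype=float)
    y = np.asarray(y_train)
    summary = {}
    for label in np.unique(y):
        rows = x[y == label]  # all cases whose true label is `label`
        summary[int(label)] = {
            "mean": rows.mean(axis=0),
            "var": rows.var(axis=0),  # population variance (assumption)
            "prior": rows.shape[0] / x.shape[0],  # relative class frequency
        }
    return summary

# Toy data, not taken from the problem statement
toy_x = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]]
toy_y = [0, 0, 1]
stats = summarize_by_class(toy_x, toy_y)
print(stats[0]["mean"], stats[0]["var"], stats[0]["prior"])
```

In the toy data, class 1 has a single case, so its variance is zero for every feature. That is exactly the situation in which the Gaussian density breaks down without extra care.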

Note: It is guaranteed that all training and test data will be float values. Skeleton code for the algorithm has already been provided; please do not edit it. You should only implement code in the sections marked # implement this.

Example

For

x_train = [[-2.6, 1.9, 2.0, 1.0],
           [-2.8, 1.7, -1.2, 1.5],
           [2.0, -0.9, 0.3, 2.3],
           [-1.5, -0.1, -1.6, -1.1],
           [-1.0, -0.6, -1.2, -0.7],
           [-0.3, 1.2, 2.6, 0.2],
           [-1.8, -1.3, -0.1, -1.2],
           [0.2, 1.2, -0.6, -1.3],
           [-5.2, 0.3, 0.2, 2.2],
           [-0.8, -0.1, 1.5, -0.1],
           [-2.3, 0.3, 0.8, 0.7],
           [0.2, 3.0, 3.6, -0.9],
           [1.7, -0.8, -0.0, 2.0],
           [2.8, 0.8, 1.8, -0.7]]

y_train = [1, 2, 0, 0, 0, 1, 0, 1, 2, 0, 2, 1, 0, 2]

and

x_test = [[-0.1, 1.4, 0.4, -1.0],
          [-1.3, 0.2, -1.3, -0.8],
          [-1.1, 1.5, -2.3, -2.5]]

the output should be solution(x_train, y_train, x_test) = [1, 0, 1].
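
For reference, here is one way the whole pipeline might look in plain Python, run on the example above. It is a sketch under stated assumptions, not the hidden skeleton: the structure of solution, the helper gaussian_density, and the choice of population variance (dividing by the class size) are all guesses about what the original expects.

```python
import math

def gaussian_density(x, mean, var):
    # Hypothetical helper: normal density for a given mean and variance.
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def solution(x_train, y_train, x_test):
    n_features = len(x_train[0])
    stats = {}
    for label in sorted(set(y_train)):
        rows = [row for row, y in zip(x_train, y_train) if y == label]
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # Population variance (divide by class size); an assumption.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
                     for j in range(n_features)]
        prior = len(rows) / len(x_train)
        stats[label] = (means, variances, prior)

    predictions = []
    for case in x_test:
        scores = {}
        for label, (means, variances, prior) in stats.items():
            score = prior  # unnormalized posterior: prior times likelihoods
            for value, mean, var in zip(case, means, variances):
                score *= gaussian_density(value, mean, var)
            scores[label] = score
        predictions.append(max(scores, key=scores.get))
    return predictions

x_train = [[-2.6, 1.9, 2.0, 1.0], [-2.8, 1.7, -1.2, 1.5], [2.0, -0.9, 0.3, 2.3],
           [-1.5, -0.1, -1.6, -1.1], [-1.0, -0.6, -1.2, -0.7], [-0.3, 1.2, 2.6, 0.2],
           [-1.8, -1.3, -0.1, -1.2], [0.2, 1.2, -0.6, -1.3], [-5.2, 0.3, 0.2, 2.2],
           [-0.8, -0.1, 1.5, -0.1], [-2.3, 0.3, 0.8, 0.7], [0.2, 3.0, 3.6, -0.9],
           [1.7, -0.8, -0.0, 2.0], [2.8, 0.8, 1.8, -0.7]]
y_train = [1, 2, 0, 0, 0, 1, 0, 1, 2, 0, 2, 1, 0, 2]
x_test = [[-0.1, 1.4, 0.4, -1.0], [-1.3, 0.2, -1.3, -0.8], [-1.1, 1.5, -2.3, -2.5]]

print(solution(x_train, y_train, x_test))  # prints [1, 0, 1]
```

The scores are left unnormalized because dividing every class score by the same evidence term does not change which class is largest.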

The Naive Bayes Classifier should calculate the mean and variance of the features in x_train for each label in y_train (Step 1). Step 2 calculates the prior probability for each label, typically as the relative frequency of that label in y_train. The Gaussian Density Function (Step 3) and the prior probability estimates will be used to calculate a posterior probability and predicted label for each case in x_test (Step 4).
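
Written out, the computation described above takes the following form. The notation is introduced here for illustration and does not come from the problem statement: c is a class label, x_j is the j-th feature of a test case with d features, and \mu_{c,j} and \sigma_{c,j}^2 are the mean and variance of feature j within class c.

```latex
p(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{c,j}^{2}}}
                \exp\!\left(-\frac{(x_j - \mu_{c,j})^{2}}{2\sigma_{c,j}^{2}}\right)

P(c \mid x) = \frac{P(c)\,\prod_{j=1}^{d} p(x_j \mid c)}
                   {\sum_{c'} P(c')\,\prod_{j=1}^{d} p(x_j \mid c')}

\hat{c} = \arg\max_{c} \; P(c)\,\prod_{j=1}^{d} p(x_j \mid c)
```

The denominator of the posterior is the same for every class, so the prediction only needs the numerator. The product over features is where the "naive" assumption of conditional independence between features enters.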

This Pinterest OA problem asks you to complete a Gaussian Naive Bayes classifier from scratch: compute per-class mean and variance from the training set, derive class priors, evaluate Gaussian likelihoods for each test sample, and choose the label with the highest posterior probability. The key implementation points are class-wise aggregation, numerically stable variance handling, and predicting each test instance independently.
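
The problem statement does not say what numerically stable variance handling should look like. One common approach, sketched below as an assumption rather than the expected answer, is to clamp variances to a small floor and compare log posteriors instead of raw products. The function name predict_log_space, the var_floor parameter with its default of 1e-9, and the toy numbers are all hypothetical.

```python
import numpy as np

def predict_log_space(means, variances, priors, x_test, var_floor=1e-9):
    """Return, for each test case, the index of the class with the largest log posterior.

    means, variances: shape (n_classes, n_features); priors: shape (n_classes,).
    var_floor is an illustrative guard against zero variance (assumption).
    """
    x = np.asarray(x_test, dtype=float)[:, None, :]   # (n_test, 1, n_features)
    mu = np.asarray(means, dtype=float)[None, :, :]   # (1, n_classes, n_features)
    var = np.maximum(np.asarray(variances, dtype=float), var_floor)[None, :, :]
    # Sum of per-feature Gaussian log densities for every (test case, class) pair
    log_likelihood = -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=2)
    log_posterior = np.log(np.asarray(priors, dtype=float))[None, :] + log_likelihood
    return log_posterior.argmax(axis=1)

# Toy statistics (invented); class 1 has a zero-variance feature
toy_means = [[0.0, 0.0], [5.0, 5.0]]
toy_vars = [[1.0, 1.0], [1.0, 0.0]]
toy_priors = [0.5, 0.5]
print(predict_log_space(toy_means, toy_vars, toy_priors, [[0.1, -0.2], [4.9, 5.0]]))
```

Working in log space turns the product of densities into a sum, which avoids underflow when there are many features. The floor keeps the division and the logarithm defined when a feature is constant within a class. The returned values are row indices into means, so they equal the class labels only when the labels are 0, 1, 2, ... in that order.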
