Capital One OA Coding Interview: Taxi Driver Data Analysis

You are given datasets containing information about taxi drivers and their rides. Your task is to perform some basic data analysis and save the results to a CSV file.

The data is located in the following CSV files:

drivers.csv
rides_1.csv to rides_4.csv

drivers.csv contains:

driver_id (int): Unique driver identifier.
age (int): Driver’s age.
second_language (str): Driver’s second language. If a driver doesn’t have a second language, the value is "no".
rating (float): Driver’s average rating.

rides_i.csv contains:

ride_id (int): Unique ride identifier.
driver_id (int): Driver’s identifier.
passenger_id (int): Passenger’s identifier.
date (str): Date of the ride.
status (str): Status of the ride; one of ["Rejected by the driver", "Cancelled by the passenger", "Success"].

Your tasks are as follows:

Calculate the average driver rating.
Calculate the percentage of drivers with a second language.
Calculate the ride success rate.

Output requirements:

Save the results in a CSV file named analysis_results.csv.
The CSV file should have two columns: insight_type and value.
Each row corresponds to one of the tasks above, with the insight_type as specified and the calculated value.
All numeric values will be considered correct if they match the expected values up to two decimal places.

This is a straightforward pandas aggregation task. Read drivers.csv, concatenate rides_1.csv through rides_4.csv into a single rides table, and compute three metrics: the mean of the rating column, the percentage of drivers whose second_language is not "no", and the share of rides whose status equals "Success". Then write the results to analysis_results.csv with the required insight_type and value columns. The points to get right are stacking the four rides files row-wise rather than joining them, applying each filter to the correct table, and matching the output format exactly.
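The steps described here can be sketched roughly as follows. This is a minimal sketch, not a reference solution: the function name analyze, its parameters, and the three insight_type labels are hypothetical, since the exact labels are not reproduced in this write-up. It also assumes both the second-language share and the success rate are expected on a 0-100 scale; if the grader expects fractions, drop the multiplication by 100.

```python
from pathlib import Path

import pandas as pd


def analyze(data_dir=".", out_name="analysis_results.csv"):
    """Compute the three insights and write them to a two-column CSV."""
    data_dir = Path(data_dir)

    drivers = pd.read_csv(data_dir / "drivers.csv")

    # Stack rides_1.csv .. rides_4.csv row-wise into one table.
    rides = pd.concat(
        [pd.read_csv(data_dir / f"rides_{i}.csv") for i in range(1, 5)],
        ignore_index=True,
    )

    # Mean of the rating column across all drivers.
    avg_rating = drivers["rating"].mean()

    # The mean of a boolean Series is the fraction of True values.
    # Scaling to 0-100 is an assumption about the expected format.
    pct_second_language = (drivers["second_language"] != "no").mean() * 100

    # Share of rides whose status is exactly "Success" (same scale assumption).
    success_rate = (rides["status"] == "Success").mean() * 100

    results = pd.DataFrame(
        {
            # Hypothetical labels; replace with the ones the task specifies.
            "insight_type": [
                "average_rating",
                "second_language_percentage",
                "success_rate",
            ],
            "value": [
                float(avg_rating),
                float(pct_second_language),
                float(success_rate),
            ],
        }
    )

    # index=False keeps the file to exactly the two required columns.
    results.to_csv(data_dir / out_name, index=False)
    return results
```

Calling analyze() from the directory that holds the five input files writes analysis_results.csv next to them. The values are written at full precision, which should be compatible with a check that compares up to two decimal places.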
