Capital One OA Coding Interview: Taxi Driver Data Analysis

You are given datasets containing information about taxi drivers and their rides. Your task is to perform some basic data analysis and save the results to a CSV file.

The data is located in the following CSV files:

  • drivers.csv
  • rides_1.csv to rides_4.csv

drivers.csv contains:

  • driver_id (int): Unique driver identifier.
  • age (int): Driver’s age.
  • second_language (str): Driver’s second language. If a driver doesn’t have a second language, the value is "no".
  • rating (float): Driver’s average rating.

rides_i.csv contains:

  • ride_id (int): Unique ride identifier.
  • driver_id (int): Driver’s identifier.
  • passenger_id (int): Passenger’s identifier.
  • date (str): Date of the ride.
  • status (str): Status of the ride; one of ["Rejected by the driver", "Cancelled by the passenger", "Success"].
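
Because the four ride files share this schema, one way to work with them is to stack them row-wise into a single table. The sketch below assumes the files sit together in one directory; the function name load_rides and the data_dir parameter are illustrative and not part of the task statement.

```python
from pathlib import Path

import pandas as pd


def load_rides(data_dir="."):
    """Read rides_1.csv through rides_4.csv and stack them into one DataFrame."""
    data_dir = Path(data_dir)
    # File names come from the task statement; the directory is an assumption.
    paths = [data_dir / f"rides_{i}.csv" for i in range(1, 5)]
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the combined table a fresh 0..n-1 index
    # instead of repeating each file's own row numbers.
    return pd.concat(frames, ignore_index=True)
```

Calling load_rides("data") would then return every ride from all four files in one DataFrame, with the columns listed above.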

Your tasks are as follows:

  • Calculate the average driver rating.
  • Calculate the percentage of drivers with a second language.
  • Calculate the ride success rate.

Output requirements:

  • Save the results in a CSV file named analysis_results.csv.
  • The CSV file should have two columns: insight_type and value.
  • Each row corresponds to one of the tasks above, with the insight_type as specified and the calculated value.
  • All numeric values will be considered correct if they match the expected values up to two decimal places.

This is a straightforward pandas data aggregation task. Read drivers.csv, concatenate rides_1.csv through rides_4.csv into one rides table, and compute three metrics: the mean of the rating column, the percentage of drivers whose second_language is not "no", and the share of rides whose status equals "Success". Then write the results to analysis_results.csv with the required insight_type and value columns, one row per metric. The points to get right are combining the ride files (a row-wise concatenation, not a join on driver_id), applying the "no" filter, and formatting the output as specified; values only need to match to two decimal places.
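
A minimal sketch of the metric and output steps follows, assuming drivers and rides are already loaded as DataFrames with the columns described in the task. Several details here are assumptions rather than part of the statement: the function names, the three insight_type labels (the statement says they are "as specified" but does not list them here), and the choice to report the success rate on a 0-100 scale.

```python
import pandas as pd


def compute_insights(drivers, rides):
    """Return a two-column table with one row per requested metric."""
    avg_rating = drivers["rating"].mean()
    # The mean of a boolean Series is the fraction of True values.
    pct_second_language = (drivers["second_language"] != "no").mean() * 100
    # Assumption: success rate as a percentage; drop the * 100 for a fraction.
    success_rate = (rides["status"] == "Success").mean() * 100
    return pd.DataFrame(
        {
            # Hypothetical labels; replace with the ones the task specifies.
            "insight_type": [
                "average_rating",
                "second_language_percentage",
                "success_rate",
            ],
            "value": [avg_rating, pct_second_language, success_rate],
        }
    )


def save_insights(insights, path="analysis_results.csv"):
    """Round values to two decimals and write the CSV without the index."""
    rounded = insights.assign(value=insights["value"].round(2))
    rounded.to_csv(path, index=False)


# Tiny in-memory example standing in for the real files.
drivers = pd.DataFrame(
    {
        "driver_id": [1, 2, 3, 4],
        "age": [30, 41, 25, 52],
        "second_language": ["no", "es", "no", "fr"],
        "rating": [4.0, 5.0, 4.5, 4.5],
    }
)
rides = pd.DataFrame(
    {
        "ride_id": [1, 2, 3, 4],
        "driver_id": [1, 2, 3, 4],
        "passenger_id": [10, 11, 12, 13],
        "date": ["2024-01-01"] * 4,
        "status": [
            "Success",
            "Success",
            "Rejected by the driver",
            "Cancelled by the passenger",
        ],
    }
)
insights = compute_insights(drivers, rides)
```

With this sample data the three values are 4.5, 50.0, and 50.0. Passing index=False to to_csv matters because the default would write the DataFrame index as an extra, unnamed first column, which breaks the two-column requirement.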
