You are given datasets containing information about taxi drivers and their rides. Your task is to perform some basic data analysis and save the results to a CSV file.
The data is located in the following CSV files:
- drivers.csv
- rides_1.csv to rides_4.csv
drivers.csv contains:
- driver_id (int): Unique driver identifier.
- age (int): Driver’s age.
- second_language (str): Driver’s second language. If a driver doesn’t have a second language, the value is "no".
- rating (float): Driver’s average rating.
rides_i.csv contains:
- ride_id (int): Unique ride identifier.
- driver_id (int): Driver’s identifier.
- passenger_id (int): Passenger’s identifier.
- date (str): Date of the ride.
- status (str): Status of the ride; one of ["Rejected by the driver", "Cancelled by the passenger", "Success"].
Your tasks are as follows:
- Calculate the average driver rating.
- Calculate the percentage of drivers with a second language.
- Calculate the ride success rate.
Output requirements:
- Save the results in a CSV file named analysis_results.csv.
- The CSV file should have two columns: insight_type and value.
- Each row corresponds to one of the tasks above, with the insight_type as specified and the calculated value.
- All numeric values will be considered correct if they match the expected values up to two decimal places.
This is a straightforward pandas data aggregation task. Read drivers.csv, concatenate rides_1.csv through rides_4.csv into one rides table, and compute three metrics: the average driver rating (the mean of the rating column), the percentage of drivers whose second_language is not "no", and the share of rides whose status equals "Success". Then save the results to analysis_results.csv with the required insight_type and value columns. The points to get right are stacking the four ride files row-wise rather than joining them, comparing against the exact strings "no" and "Success", and writing the output in the specified two-column format.
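
The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution. The three insight_type labels are hypothetical placeholders, since the statement does not spell out the exact strings. It also assumes that the second-language figure is reported on a 0-100 scale and the success rate as a fraction between 0 and 1. The helper names load_rides, compute_insights, and main are illustrative only.

```python
from pathlib import Path

import pandas as pd

# Hypothetical labels: the task statement does not give the exact
# insight_type strings, so replace these with the required ones.
AVG_RATING = "average_driver_rating"
SECOND_LANGUAGE_PCT = "second_language_percentage"
SUCCESS_RATE = "ride_success_rate"


def load_rides(data_dir="."):
    """Stack rides_1.csv through rides_4.csv row-wise into one DataFrame."""
    paths = [Path(data_dir) / f"rides_{i}.csv" for i in range(1, 5)]
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)


def compute_insights(drivers, rides):
    """Return a DataFrame with the columns insight_type and value."""
    avg_rating = drivers["rating"].mean()

    # A boolean Series averages to the fraction of True values.
    # Assumption: the percentage is expected on a 0-100 scale.
    second_language_pct = (drivers["second_language"] != "no").mean() * 100

    # Assumption: the success rate is expected as a fraction of all rides;
    # multiply by 100 if a percentage is required instead.
    success_rate = (rides["status"] == "Success").mean()

    return pd.DataFrame(
        {
            "insight_type": [AVG_RATING, SECOND_LANGUAGE_PCT, SUCCESS_RATE],
            "value": [avg_rating, second_language_pct, success_rate],
        }
    )


def main(data_dir="."):
    """Read the input files and write analysis_results.csv."""
    drivers = pd.read_csv(Path(data_dir) / "drivers.csv")
    rides = load_rides(data_dir)
    # index=False keeps the output to exactly the two required columns.
    compute_insights(drivers, rides).to_csv("analysis_results.csv", index=False)
```

Calling main() from the directory that holds the input files would write analysis_results.csv to the current working directory. Keeping the metric computation in compute_insights, separate from file access, makes it easy to check the numbers on small hand-built DataFrames before running on the real data.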