You are given datasets containing information about taxi drivers and their rides. Your task is to perform some basic data analysis and save the results to a CSV file.
The data is located in the following CSV files:
- drivers.csv
- rides_1.csv to rides_4.csv
drivers.csv contains:
- driver_id (int): Unique driver identifier.
- age (int): Driver’s age.
- second_language (str): Driver’s second language. If a driver doesn’t have a second language, the value is "no".
- rating (float): Driver’s average rating.
rides_i.csv contains:
- ride_id (int): Unique ride identifier.
- driver_id (int): Driver’s identifier.
- passenger_id (int): Passenger’s identifier.
- date (str): Date of the ride.
- status (str): Status of the ride; one of ["Rejected by the driver", "Cancelled by the passenger", "Success"].
Your tasks are as follows:
- Calculate the average driver rating.
- Calculate the percentage of drivers with a second language.
- Calculate the ride success rate.
Output requirements:
- Save the results in a CSV file named analysis_results.csv.
- The CSV file should have two columns: insight_type and value.
- Each row corresponds to one of the tasks above, with the insight_type as specified and the calculated value.
- All numeric values will be considered correct if they match the expected values up to two decimal places.
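At its core, this is a typical pandas aggregation exercise. First read drivers.csv and concatenate rides_1.csv through rides_4.csv into a single rides table. Then compute three statistics: the mean rating in the drivers table, the share of drivers whose second_language is not "no", and the share of rides whose status equals "Success". Finally, write the results to analysis_results.csv with the two columns insight_type and value. The key points are concatenating the split files correctly, filtering on the string conditions, and keeping the values accurate to two decimal places.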
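The steps summarized above could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a reference solution. The function name analyze and its parameters are hypothetical. The three insight_type labels are placeholders, because the task text does not spell out the exact strings. Both rates are expressed on a 0 to 100 scale, which the task does not confirm either, so drop the multiplication by 100 if fractions are expected.

```python
from pathlib import Path

import pandas as pd


def analyze(data_dir=".", out_name="analysis_results.csv"):
    """Compute the three insights and write them to a CSV file.

    `analyze`, `data_dir` and `out_name` are hypothetical names used
    only for this illustration.
    """
    data_dir = Path(data_dir)

    # Driver attributes live in a single file.
    drivers = pd.read_csv(data_dir / "drivers.csv")

    # Rides are split across rides_1.csv .. rides_4.csv; stack them row-wise.
    rides = pd.concat(
        [pd.read_csv(data_dir / f"rides_{i}.csv") for i in range(1, 5)],
        ignore_index=True,
    )

    # The mean of a boolean Series is the fraction of True values.
    avg_rating = drivers["rating"].mean()
    # Assumption: rates are reported as percentages (0-100).
    pct_second_language = (drivers["second_language"] != "no").mean() * 100
    success_rate = (rides["status"] == "Success").mean() * 100

    results = pd.DataFrame(
        {
            # Assumption: placeholder labels, replace with the required ones.
            "insight_type": [
                "average_rating",
                "second_language_percentage",
                "success_rate",
            ],
            "value": [avg_rating, pct_second_language, success_rate],
        }
    )
    # Two decimal places are enough for the stated tolerance.
    results["value"] = results["value"].round(2)

    # index=False keeps the file to exactly the two required columns.
    results.to_csv(data_dir / out_name, index=False)
    return results
```

Calling analyze() from the directory that holds the five input files writes analysis_results.csv next to them and returns the same table as a DataFrame. One caveat with this sketch: a missing second_language value would compare as not equal to "no" and so be counted as having a second language, so it may be worth checking that column for missing values first.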