Capital One OA Interview Question Walkthrough: Taxi Driver Data Analysis


You are given datasets containing information about taxi drivers and their rides. Your task is to perform some basic data analysis and save the results to a CSV file.

The data is located in the following CSV files:

  • drivers.csv
  • rides_1.csv to rides_4.csv

drivers.csv contains:

  • driver_id (int): Unique driver identifier.
  • age (int): Driver’s age.
  • second_language (str): Driver’s second language. If a driver doesn’t have a second language, the value is "no".
  • rating (float): Driver’s average rating.

rides_i.csv contains:

  • ride_id (int): Unique ride identifier.
  • driver_id (int): Driver’s identifier.
  • passenger_id (int): Passenger’s identifier.
  • date (str): Date of the ride.
  • status (str): Status of the ride; one of ["Rejected by the driver", "Cancelled by the passenger", "Success"].
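A minimal sketch of the loading step follows. It uses only Python's standard csv module rather than pandas. The names read_rows, load_data and data_dir are hypothetical, and the sketch assumes that all five files sit in one directory and that every rides file repeats the same header row. csv.DictReader returns every value as a string, so numeric columns still need casting later. In pandas the same step is roughly pd.concat([pd.read_csv(f"rides_{i}.csv") for i in range(1, 5)], ignore_index=True).

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read one CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_data(data_dir="."):
    """Load drivers.csv and stack rides_1.csv .. rides_4.csv into one list."""
    data_dir = Path(data_dir)
    drivers = read_rows(data_dir / "drivers.csv")
    rides = []
    for i in range(1, 5):  # rides_1.csv through rides_4.csv, in order
        rides.extend(read_rows(data_dir / f"rides_{i}.csv"))
    return drivers, rides
```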

Your tasks are as follows:

  • Calculate the average driver rating.
  • Calculate the percentage of drivers with a second language.
  • Calculate the ride success rate.
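The three tasks can be sketched as small functions over rows shaped like csv.DictReader output (one dict per row, all values as strings). The function names are hypothetical. One assumption is worth flagging: the sketch reports the second-language share as a percentage on a 0 to 100 scale, following the wording "percentage", and the success rate as a 0 to 1 fraction. The task text does not pin down either scale, so rescale if the expected values differ.

```python
def average_rating(drivers):
    """Mean of the rating column; CSV values are strings, so cast to float."""
    ratings = [float(d["rating"]) for d in drivers]
    return sum(ratings) / len(ratings)


def second_language_pct(drivers):
    """Percentage (0-100) of drivers whose second_language is anything but "no"."""
    with_second = sum(1 for d in drivers if d["second_language"] != "no")
    return 100 * with_second / len(drivers)


def success_rate(rides):
    """Fraction (0-1) of all rides whose status is exactly "Success"."""
    successes = sum(1 for r in rides if r["status"] == "Success")
    return successes / len(rides)
```

The status check is an exact string comparison. "Rejected by the driver" and "Cancelled by the passenger" both count as unsuccessful rides in the denominator.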

Output requirements:

  • Save the results in a CSV file named analysis_results.csv.
  • The CSV file should have two columns: insight_type and value.
  • Each row corresponds to one of the tasks above, with the insight_type as specified and the calculated value.
  • All numeric values will be considered correct if they match the expected values up to two decimal places.
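The output step can be sketched as follows. write_results is a hypothetical helper, and the insight_type strings in the usage line below are placeholders, because the exact labels are not listed in this text. Rounding with round(value, 2) is one assumption about how the two-decimal tolerance is checked. Writing the values at full precision should match as well.

```python
import csv


def write_results(results, path="analysis_results.csv"):
    """Write {insight_type: value} pairs as a two-column CSV, rounded to 2 dp."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["insight_type", "value"])  # required header
        for insight_type, value in results.items():
            writer.writerow([insight_type, round(value, 2)])
```

Usage, with placeholder labels and made-up numbers: write_results({"average_driver_rating": 4.25, "second_language_percentage": 50.0, "ride_success_rate": 0.6}).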

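At its core this is a typical pandas aggregation exercise. First read drivers.csv and concatenate rides_1.csv through rides_4.csv into one complete rides table. Then compute three metrics: the mean rating in the drivers table, the share of drivers whose second_language is not "no", and the ride success rate, i.e. the fraction of rides in the combined table whose status equals "Success". Finally, write the results to analysis_results.csv with the two required columns, insight_type and value. The main things to get right are combining the split ride files correctly, filtering on exact string values, and keeping two decimal places as the task requires.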
