Airbnb Interview Question: Missing-Value Imputation Strategy for a Fraud Booking Model

We’ve provided a labeled training set for a fraudulent booking model.

Each record is a historical Airbnb booking and its corresponding characteristics (or features). These features are:

  • price: dollar amount paid for the reservation
  • nights: number of nights
  • market_avg_price_per_night: average price per night paid in the market
  • past_delta_checkin: days between past reservations by the guest
  • listing_market: market of the listing
  • host_past_nights: number of prior nights hosted by the host
  • ds: date-stamp of the reservation
  • label: whether the reservation is fake (1) or not (0)
  • pred_score: risk score from a model trained on this data set

Question:

Explore the data set and propose a good imputation strategy for missing values.

This Airbnb data-handling question asks candidates to inspect the missing-data pattern and design an imputation strategy that respects feature type, grouping, and leakage risk. A strong answer typically separates numeric and categorical fields, fills numeric gaps with group-aware statistics such as market-level or time-aware medians where appropriate, adds missingness indicators, and treats absence as potentially informative for fraud detection (for example, a missing past_delta_checkin may simply mean the guest has no earlier reservations to compare). The key is to preserve signal while avoiding unrealistic global fill values and future-data leakage: fill statistics should be computed only from bookings dated before the ones being scored.
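One way to make this concrete is the minimal pandas sketch below. It is an illustration under stated assumptions, not a reference answer: the toy rows, the cutoff date, the UNKNOWN category, and the helper names fit_imputer and apply_imputer are all hypothetical, and the real data set may call for different choices. It learns per-market medians on an earlier time window only, records a missingness flag for every imputed column before filling, and falls back to a global median when the market is missing or was never seen in training. The label and pred_score columns are deliberately left out of the imputation inputs, since both derive from the target.

```python
import numpy as np
import pandas as pd

NUMERIC_COLS = ["price", "nights", "market_avg_price_per_night",
                "past_delta_checkin", "host_past_nights"]


def fit_imputer(train: pd.DataFrame) -> dict:
    """Learn fill values from the training window only (no future rows)."""
    return {
        # Per-market medians; rows with a missing market are dropped by groupby.
        "market_medians": train.groupby("listing_market")[NUMERIC_COLS].median(),
        # Global medians, used when the market is missing or unseen.
        "global_medians": train[NUMERIC_COLS].median(),
    }


def apply_imputer(df: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Add missingness flags, then fill using previously learned values."""
    out = df.copy()
    # Record missingness first, so the model can still use it as a signal.
    for col in NUMERIC_COLS + ["listing_market"]:
        out[f"{col}_missing"] = out[col].isna().astype(int)
    # Categorical field: keep absence as its own explicit category.
    out["listing_market"] = out["listing_market"].fillna("UNKNOWN")
    for col in NUMERIC_COLS:
        # Look up each row's market median (NaN if the market is unknown).
        market_fill = out["listing_market"].map(params["market_medians"][col])
        out[col] = (out[col]
                    .fillna(market_fill)
                    .fillna(params["global_medians"][col]))
    return out


# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    "price": [100.0, 120.0, 300.0, np.nan, np.nan, np.nan],
    "nights": [2, 3, np.nan, 1, 2, 4],
    "market_avg_price_per_night": [50.0, 50.0, 120.0, 120.0, 50.0, np.nan],
    "past_delta_checkin": [np.nan, 30.0, 10.0, np.nan, 5.0, 7.0],
    "host_past_nights": [10.0, 0.0, np.nan, 40.0, 3.0, 8.0],
    "listing_market": ["Paris", "Paris", "Tokyo", "Tokyo", "Paris", None],
    "ds": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-01-09",
                          "2023-01-12", "2023-02-01", "2023-02-03"]),
    "label": [0, 0, 1, 0, 1, 0],
})

# Time-aware split: statistics come only from bookings before the cutoff.
cutoff = pd.Timestamp("2023-01-15")
train = toy[toy["ds"] < cutoff]
holdout = toy[toy["ds"] >= cutoff]

params = fit_imputer(train)
train_imp = apply_imputer(train, params)
holdout_imp = apply_imputer(holdout, params)

# Share of missing values per column, a useful first look at the pattern.
missing_share = toy.drop(columns=["label"]).isna().mean()
```

Medians are used here because price-like fields tend to be skewed, but that is a modelling choice to validate rather than a given. Before settling on any fill rule, it is also worth comparing the fraud rate among rows where each missingness flag is 1 against rows where it is 0, since a large gap suggests the absence itself carries signal.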
