Recently I worked on a data challenge from Uber. The goal was to predict whether a driver will take a first trip after signing up. Here I show some of the visualizations I came up with to explore the dataset.
I was given a list of signup events. Each signup records the signup date, signup OS (Android, iOS, etc.), city name, vehicle make, vehicle year, first trip date (if a first trip happened), and so on.
To get insight into whether a driver will take a first trip, I first split the signups into three subgroups by city. For starters, I plotted whether a driver took a first trip against the signup date and the year the vehicle was made. The result is summarized in the figure above. The major takeaways from the figure are:
-- Drivers from Wrouver have newer vehicles than those in other cities.
-- Many drivers with old vehicles (year <= 2000) did not take a first trip.
-- Otherwise, whether a driver takes a first trip does not depend strongly on city or signup date.
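To make the comparison behind these takeaways concrete, here is a minimal Python sketch (the notebook itself is in R). It computes the share of signups that led to a first trip within each group. The field names, the placeholder cities city_b and city_c, and all the records are made up for illustration; they are not taken from the dataset and do not reproduce the findings above.

```python
from collections import defaultdict

# Toy records: field names and values are assumptions, not the real data.
# first_trip_date is None when the driver never took a first trip.
signups = [
    {"city": "Wrouver", "vehicle_year": 2015, "first_trip_date": "2016-01-10"},
    {"city": "Wrouver", "vehicle_year": 2012, "first_trip_date": None},
    {"city": "city_b", "vehicle_year": 1998, "first_trip_date": None},
    {"city": "city_b", "vehicle_year": 2008, "first_trip_date": "2016-01-20"},
    {"city": "city_c", "vehicle_year": 1999, "first_trip_date": None},
    {"city": "city_c", "vehicle_year": 2000, "first_trip_date": "2016-01-25"},
]


def first_trip_rate(records, key):
    """Return {group: share of signups with a first trip}, grouped by key(record)."""
    totals = defaultdict(int)
    trips = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        if record["first_trip_date"] is not None:
            trips[group] += 1
    return {group: trips[group] / totals[group] for group in totals}


def vehicle_age_bucket(record):
    """Split vehicles at the year-2000 threshold used in the takeaways."""
    return "year <= 2000" if record["vehicle_year"] <= 2000 else "year > 2000"


rate_by_age = first_trip_rate(signups, vehicle_age_bucket)
rate_by_city = first_trip_rate(signups, lambda record: record["city"])

for group, rate in sorted(rate_by_age.items()):
    print(f"{group}: {rate:.2f}")
```

Passing the grouping as a function keeps the same helper usable for city, signup OS, or any other column.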
The dataset provides the dates of a number of events, including 'signup', 'background check', 'add vehicle information', and 'first trip'. To get a sense of the timing of these events, I made the figure above. The colors of the boxplots distinguish drivers who took a first trip from those who did not.
What I learned from the time series is that drivers who completed the background check and filled out their vehicle information soon after signing up are more likely to take a first trip. Of course, those who respond more slowly (blue box-and-whiskers) may still take a first trip at a later time; I recommend further study to confirm this. This is an important insight because it suggests that if Uber follows up with a potential driver, he or she may be more likely to take action.
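The comparison in the boxplots can be sketched numerically as the delay between signup and background check, summarized separately for drivers who did and did not take a first trip. This is only a Python illustration under assumptions: the field names and all dates below are invented, and the median is used as a simple stand-in for the full box-and-whiskers summary.

```python
from datetime import date
from statistics import median

# Toy records: field names and dates are assumptions, not the real data.
# A missing event is stored as None.
signups = [
    {"signup": date(2016, 1, 1), "bgc": date(2016, 1, 1), "first_trip": date(2016, 1, 5)},
    {"signup": date(2016, 1, 2), "bgc": date(2016, 1, 4), "first_trip": date(2016, 1, 9)},
    {"signup": date(2016, 1, 3), "bgc": date(2016, 1, 13), "first_trip": None},
    {"signup": date(2016, 1, 4), "bgc": date(2016, 1, 24), "first_trip": None},
    {"signup": date(2016, 1, 5), "bgc": None, "first_trip": None},
]


def median_delay(records, took_trip):
    """Median days from signup to background check for one outcome group.

    Records without a background check date are skipped.
    Returns None if the group has no usable records.
    """
    delays = [
        (r["bgc"] - r["signup"]).days
        for r in records
        if (r["first_trip"] is not None) == took_trip and r["bgc"] is not None
    ]
    return median(delays) if delays else None


median_with_trip = median_delay(signups, took_trip=True)
median_without_trip = median_delay(signups, took_trip=False)

print("median delay, took a first trip:", median_with_trip)
print("median delay, no first trip:", median_without_trip)
```

Skipping signups with no background check date is a choice made for this sketch; in a real analysis those drivers deserve their own treatment, since never completing the check is itself informative.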
The complete R notebook of this work can be found here. If you have any questions or concerns, you can reach out to me on my homepage.