The Deal with Data

Since I decided to insert myself into the conversation surrounding #ubergate, I’ve had some interesting discussions. As a carless urban dweller in a city with minimal transit infrastructure, I am especially sensitive to the choices available to me; as an advisor to several startups that aim to use data to improve society, I am particularly concerned about using it well; and as a hacker who plans to examine real transit data in the near future, I cannot ignore this topic.

Data is an interesting beast. It gets a lot of credit these days and is often held up as the best way for us to make more money, improve our government, and maybe even solve big community problems like homelessness. Often, it does deliver on those promises.

However, data is not somehow objective in a way that other information is not. Which information we gather, how we categorize and aggregate it, and why we interpret it the way we do all rest on human reasoning, which is often quite subjective, sometimes dangerously so. Just like lawyers and journalists, data scientists must approach information with a healthy dose of skepticism and an awareness of their own assumptions if they are to discover the truth.

Hence my morbid fascination with this #uberdata blog post from 2.5 years ago. In it, the author handily inserts a whopping assumption into user data: Riders who take Uber one way between 10 and 2, and the other way between 2 and 6 from the same location, have just had sex.

Yes. You read that right. They call this “Rides of Glory,” in an attempt to brand it as an empowering, grown-up version of the college “Walk of Shame” (which, FTR, in common usage refers primarily to women, who are told they should be ashamed of having sex in a way the men they sleep with are not. But that’s another story.)

Now, I could be wrong. Back in 2012, the majority of Uber users could have been college students whose life experiences mirrored the stories Uber’s data team told in their minds. But that’s just it: The fact that Uber’s data team looked at this data, and “Rides of Glory” was the story they chose to tell, says a lot about who is on their team and what kind of culture the company embraces.

More to the point, it also exposes biases that are as common in data science as they are in law enforcement and political speeches. For example, the analysis appears to ignore what alternatives riders actually have at those hours. Unlike NYC, which has a 24/7 subway system and ample taxi service in a densely populated metro area, Boston’s T stops running at around 1:00 am, and spread-out Seattle has no subway. It’s no mystery why ridership in these cities between 2 and 6 would be higher than in NYC, and it has nothing to do with whether Bostonians are hornier than New Yorkers.

Data is powerful. When used by people who understand the context, it can be an incredible tool for discovering system-wide efficiencies, making the case for informed changes to established practices, and exposing problems that we might otherwise ignore. I believe that open data, when examined with perspective from subject matter experts and people with a range of biases, can help us build a better world.

But data is no more immune to manipulation than witness testimony or any other form of storytelling. I want to see it, and the conclusions it leads us to, in the hands of critical thinkers who are exposed to diverse ways of looking at the same information. In that area, Uber clearly has a long way to go.
