Big Bad Detective Data

“Yes, Watson, there are good reasons to suspect that there has been a substitution of lodgers.”

“But for what possible end?”

“Ah! There lies our problem.”

– Sherlock Holmes & Watson, in “The Adventure of the Red Circle,” Sir Arthur Conan Doyle

As a kid, I worshipped Sherlock Holmes. What set him apart from other detectives was more than his fine sense of smell, attention to detail, and encyclopedic knowledge of the world around him: It was his deep understanding of people.

Any old hack can string together a series of clues and make up a storyline of best fit. The human mind is wired to seek patterns even where there are none. It’s a lot harder to hold space for the blank parts and ask the right follow-on questions. Only a skilled observer of the human spirit–one with an open mind and the emotional capacity to challenge his own assumptions–will understand why we do what we do.

Last night, during a Town Hall discussion, Internet ethics professor Irina Raicu talked extensively about big data–who is collecting it (private companies and government entities, largely unregulated), whose data is collected (people with smartphones, Internet-driven lifestyles, and cash to spend), and what decisions those collectors are allowed to make with it (about our creditworthiness, the cost of our health insurance, even how long we might be sentenced for a crime).

But the most dangerous part of big data has nothing to do with the fact that it is being collected. After all, the capacity to collect and observe data has led us to major breakthroughs in science and health, and it can do the same for many other social problems. What concerns me is the assumption that data is somehow neutral, and that the predictions and conclusions we base on it are therefore perfect and fair.

What I have found instead is that most of us are blind to our own assumptions about others, and that we become even more blind when we can back up our case with numbers. As I wrote last month about Uber’s controversial “Rides of Glory” post, it’s very easy to come to flawed conclusions about people when we assume that they make decisions for the same reasons we do.

A silly post by Uber may damage our collective data literacy, but it’s not the worst thing that can happen. Unless they start releasing information about each individual rider, the raw data they collected about ridership at certain hours can be useful for optimizing business performance. If they were now obligated to release that data, sans personally identifiable information (PII), it could be used to disrupt Uber’s move toward monopoly and hold the company accountable for its surge pricing.

However, there are cases in which poor analysis has serious consequences for individuals and society alike.

Take, for example, the story of Umar Farouk Abdulmutallab, aka “the underwear bomber.” He had been denied a visa to the United States twice, in face-to-face interviews with U.S. Consuls who knew something was up. But the second time, a supervisor, looking solely at the electronic data in his application, which indicated that his father was a prominent banker, overturned the decision. That decision could very well have cost lives.

As we talk about big data, I try to remember that there is really no such thing. There are big databases, but all they contain are millions of individual data points. Behind each of those data points is a host of subjective judgments that went into the tool or algorithm used to collect it. More importantly, as we sit down to analyze that data and make decisions based on it that impact people’s lives, we bring our own assumptions to the table as well.

Unfortunately, none of us is as brilliant an analyst as the fictional Sherlock Holmes. If we want to understand some higher truth inside the numbers, we have to invite critical thinkers with diverse perspectives into the discussion.

We need more diversity inside governments, companies, and organizations that make decisions based on data.

We need more of that data to be open so that we can scrutinize it and challenge those assumptions.

And, quite frankly, we need more targeted interventions along the lines of Hack to End Homelessness, where we connect well-meaning data analysts with the subject matter experts who have the empathy and capacity for nuanced interpretation to make sure that analysis gets done right.

In other words, if we are going to solve the major mysteries of our time, we need more than the clues themselves. We need to turn ourselves into assumption-challenging machines–less spreadsheet, more Sherlock Holmes.

