Some highlights from an HBR article: The Hidden Biases in Big Data
These days the business and management science worlds are focused on how large datasets can decode consumers’ behavior patterns … enabling marketers to laser-target high-potential prospects with finely honed messages, offers, and “attention”.
“Big data” … becomes problematic when it adheres to “data fundamentalism” … the notion that correlation always indicates causation, and that massive data sets and predictive analytics always reflect objective truth … that “with enough data, the numbers speak for themselves.”
Big data has hidden biases in both collection methods and analysis …
For example, there is “signal bias” … some sub-populations may be over-represented and some under-represented … e.g. during Hurricane Sandy, tweet counts would suggest that the biggest problems were in Manhattan not the Jersey Shore.
That is because shore residents didn’t have power to recharge their cell phones and had more important things to do than tweet.
In other words, there was a “signal problem”: Data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities.
If you rely on big data’s numbers to speak for themselves, you risk misunderstanding the results and in turn misallocating important resources.
Imagine the substantial problems if FEMA had relied solely upon tweets about Sandy to allocate disaster relief aid.
With every big data set … ask which people are excluded. Which places are less visible?
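One rough way to put the "who is excluded?" question into practice is to compare each area's share of the signal with its share of the underlying population. The sketch below is only an illustration, not something from the article: the function names, the 0.5 threshold, and the region labels and counts are all made-up assumptions. A ratio well below 1 hints that an area is under-represented in the data.

```python
def representation_ratios(signal_counts, population):
    """Share of signal divided by share of population, per region.

    A ratio near 1 means the region is represented roughly in proportion
    to its size; a ratio well below 1 suggests a weak or missing signal.
    """
    total_signal = sum(signal_counts.values())
    total_pop = sum(population.values())
    if total_signal == 0 or total_pop == 0:
        raise ValueError("need non-zero signal and population totals")

    ratios = {}
    for region, pop in population.items():
        # Regions with no recorded signal count as zero, not as missing.
        signal_share = signal_counts.get(region, 0) / total_signal
        pop_share = pop / total_pop
        ratios[region] = signal_share / pop_share
    return ratios


def under_represented(ratios, threshold=0.5):
    """Regions whose ratio falls below an (arbitrary, assumed) threshold."""
    return sorted(region for region, r in ratios.items() if r < threshold)


# Hypothetical toy numbers, invented purely for illustration.
tweets = {"region_a": 9000, "region_b": 1000}
residents = {"region_a": 50000, "region_b": 50000}

ratios = representation_ratios(tweets, residents)
flagged = under_represented(ratios)
print(ratios)
print(flagged)
```

A check like this only says where the signal is thin. It cannot say why, which is the part that needs the qualitative follow-up.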
= = = = =
How to address these weaknesses in big data science?
By complementing data sources with rigorous qualitative research.
Data scientists should take a page from social scientists … who ask where the data they’re working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation.
Longer term … bring together big data approaches with small data studies — computational social science with traditional qualitative methods.
By combining methods such as ethnography with analytics … you can add depth to the data you collect … getting a much richer sense of the world when you ask people the “why” and the “how”, not just the “how many” … moving from the focus on merely “big” data towards something more three-dimensional: data with depth.