Big Data: breadcrumbs, revolutions and rebels
To understand Big Data you first need to understand how small it is. Its immensity is the sum of a lot of very small pieces of data - data that we’re leaving behind us as we go about our daily lives. Our ‘data exhaust’ (or what we like to call breadcrumbs) is left every time we interact with technology.
Whenever you pay for a latte, enter a car park, swipe your Oyster card or perform a web search, you create a data record of your interactions. You’re even leaving data breadcrumbs when you walk to the shop (probably without being aware of it) as your mobile phone connects to different cell masts whilst you travel around.
This granular data, a by-product of our daily lives, is what makes Big Data different. It used to be difficult and expensive to capture lots of data, and the information that was captured tended to be limited to solving a specific problem. Now data is captured from an endless array of sources, by default, regardless of how it might be used.
A (tech) Revolution
The ability to generate, capture and store Big Data has been enabled by a technological revolution. When we think of Big Data, this is what first springs to mind: great big rooms of whirring servers with flashing lights. What’s enabled this is a dramatic fall in costs. The cost of storing data has fallen from $105,000 per gigabyte in 1983 to just $0.05 in 2013, and processing costs have fallen even further, from $15 million per gigaFLOP (a clever way to say processing power) in 1984 to $0.12 in 2013. This has meant not only that data processing has become cheaper but also that technology has become commonplace.
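To put those figures side by side, here is a quick back-of-the-envelope sketch in Python. It uses only the four prices quoted above; the function name `fold_decrease` is ours, chosen purely for illustration.

```python
# Prices quoted in the text above, in US dollars.
storage_1983, storage_2013 = 105_000, 0.05      # cost per gigabyte of storage
compute_1984, compute_2013 = 15_000_000, 0.12   # cost per gigaFLOP of processing


def fold_decrease(old_cost, new_cost):
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


storage_fold = fold_decrease(storage_1983, storage_2013)
compute_fold = fold_decrease(compute_1984, compute_2013)

print(f"Storage is roughly {storage_fold:,.0f} times cheaper")
print(f"Processing is roughly {compute_fold:,.0f} times cheaper")
```

On these numbers storage is about 2.1 million times cheaper and processing about 125 million times cheaper, which is why processing is described as having fallen "even further".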
And let’s not forget the internet. The connectivity of all this cheap technology allows for the real-time capture and storage of vast quantities of data. The internet unshackles us from having to own and maintain the big “super computers” we’d need to process Big Data, and cloud computing allows us to rent processing power as and when we need it. Which is great, especially as we are generating gigantic amounts of data. According to Harvard Magazine, in the last two years alone more data has been created and stored than in the entirety of prior human history. That’s a pretty big amount!
Enter the Data Rebels
We’ve got the data and the technical means to capture, store and analyse it, but that alone does not make Big Data. The final, and most important, part of the Big Data equation is analysis.
The starting point for analysis is a question, one that can only be answered by analysing a lot of data. This analysis is conducted by a new breed of maths geek, the Data Scientist (or data rebel for the cooler geek). Data Scientists build complex algorithms to help organisations gain insights from large volumes of messy data. Working for businesses, universities, governments or just for the love of it, Data Scientists utilise new analysis tools to crunch these massive datasets.
Their analysis has created insights that have helped companies make more money, changed government policy, improved retention and even saved lives.
The Los Angeles Police Department have been using Big Data to trial a crime prediction system called PredPol. The idea is that it deploys police officers to areas of the city where crimes are likely to be committed. The system analyses years’ worth of historic crime data to predict where crimes are likely to occur, by type, down to a 500 by 500 foot area. Impressively, the system predicted twice as much crime as crime analysts and reduced crime by 13% in trial areas. PredPol is now being trialled by Kent Police in the UK.
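To give a feel for what "down to a 500 by 500 foot area" means in practice, here is a deliberately simple Python sketch that sorts past incidents into 500-foot grid squares and counts them. This is not how PredPol works; the article doesn't describe its model, so plain counting is our stand-in. The function names and the coordinates are made up for illustration.

```python
from collections import Counter

CELL_FEET = 500  # grid square size mentioned in the text


def cell_for(x_feet, y_feet, cell=CELL_FEET):
    """Map a location (feet from an arbitrary origin) to its grid square."""
    return (int(x_feet // cell), int(y_feet // cell))


def hotspot_cells(incidents, top_n=3):
    """Count incidents per grid square and return the busiest squares."""
    counts = Counter(cell_for(x, y) for x, y in incidents)
    return counts.most_common(top_n)


# Made-up coordinates for illustration only, not real crime data.
incidents = [
    (120, 80), (450, 499), (510, 20), (130, 90),
    (1200, 1300), (499, 10), (600, 100),
]

top = hotspot_cells(incidents, top_n=2)
for square, count in top:
    print(f"Grid square {square}: {count} past incidents")
```

A real system would also need to account for crime type and timing, which the text mentions but this sketch ignores.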
Harvard Medical School have also been researching uses for Big Data and have discovered that monitoring social media could identify outbreaks of infectious diseases long before they could be detected through formal channels. By analysing tweets following the Haitian earthquake of 2010, it was possible to pinpoint outbreaks of cholera more than two weeks before they were recorded by the authorities.
The researchers then used the HealthMap application to map the outbreak. This system combines data from tens of thousands of sources, including social media, websites and formal health information providers. Updated hourly, it gives users a global view of disease outbreaks.
Big Data also has its place in politics. The success of Barack Obama’s 2012 re-election campaign owed a lot to the application of Big Data. Political campaigns have long had access to large datasets detailing voters’ addresses, telephone numbers, demographics and voting intentions. But this data wasn’t particularly useful when trying to contact hard-to-reach groups, like young people. By combining data from Obama’s massive social media following with traditional demographics, the campaign was able to target the right messages at the right voters. Not only to get them to vote, but more importantly to get them to encourage their friends to do the same.
If your interest in Big Data has increased recently, you’re not the only one. This graph shows the increase in Google searches for ‘Big Data’. You may also be interested to know that there has been a movement to open up and democratise the Big Data that governments and other public bodies produce. In the UK this data has been made available to the public online.