Some things are so big that they have implications for everyone's life, whether we want them to or not. Big Data is one of those mega trends that will impact everyone in one way or another. The name (which, by the way, I don't like) might sound a bit techie or boring but, believe me, it is not. With this post I want to explain what's behind this mega buzzword and outline why it will impact everyone.
The basic idea behind the phrase 'Big Data' is that everything we do in our lives leaves (or soon will leave) a digital trace (or data), which we (and others) can use and analyze. The advances in capturing and analyzing big data allow us to decode human DNA in minutes, find cures for cancer, accurately predict human behavior, foil terrorist attacks, pinpoint marketing efforts, prevent diseases and so much more. And like most things, it can be used for good or evil, but more on that later.
Basically, big data refers to our ability to collect and analyze the vast amounts of data we are now generating in the world. The ability to harness the ever-expanding amounts of data is completely transforming our ability to understand the world and everything within it. You might ask: So what is new here? Haven’t companies and organizations captured and analyzed data for a long time? Yes, but there are two things that are changing at the moment and are making the phenomenon of ‘big data’ real:
- The rate at which we are generating new data is frightening - I call this the ‘datafication’ of our world.
- Our ability to analyze large and complex forms of data has been transformed in recent years.
The Complete Datafication of Our World
All activities (human or otherwise) will soon leave a digital trace (which can be a scary thought):
- We increasingly leave digital records of our conversations: Emails are stored in corporate systems, our social media updates are filed and phone conversations are digitized and stored.
- More and more of our activities are digitally recorded: Most things we do in our digitized world leave a data trail. For example, our browser logs what we search for and which websites we visit, and websites log how we click through them, as well as what and when we buy, share or like something. When we read digital books or listen to digital music, the devices collect (and share) data on what we are reading and listening to and how often we do so. And when we make payments using credit or payment cards, the transactions are logged.
- Most photos and videos are now digitally captured and stored. Just think of the millions of hours of CCTV footage captured every day. In addition, we take more videos on our smart phones and digital cameras, leading to around 100 hours of video being uploaded to YouTube every minute and something like 200,000 photos added to Facebook every 60 seconds.
- We generate data using the ever-growing number of smart devices and sensors: Our smart phones track where we are and how fast we are moving, there are sensors in our oceans to track temperatures and currents, there are sensors in our cars that monitor our driving, and there are sensors on packaging and pallets that track goods as they are shipped along supply chains. Smart watches, Google Glass and pedometers collect data. For example, I wear an Up band that tells me how many steps I have taken, the calories I have burnt each day as well as how well I have slept each night, etc. Many devices are now internet-enabled so that they self-generate and share data. Smart TVs and set-top boxes, for example, are able to track what you are watching, for how long, and even detect how many people sit in front of the TV.
I am sure you are getting the point. The volume of data is growing at a frightening rate. Google’s executive chairman Eric Schmidt puts it succinctly: “From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating.”
So yes, we are generating unimaginable amounts of data. The other thing that has changed is that we are now able to analyze more complex types of data, such as digital records of phone conversations, video and photo images. In the world of ‘Big Data’ we talk about the 4 Vs that characterize big data:
- Volume – the vast amounts of data generated every second
- Velocity – the speed at which new data is generated and moves around (credit card fraud detection is a good example where millions of transactions are checked for unusual patterns in almost real time)
- Variety – the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings)
- Veracity – the messiness of the data (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech)
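To make the Velocity example a little more tangible, here is a minimal sketch in Python of checking a stream of transactions for unusual amounts as they arrive. It is only an illustration under my own assumptions, not how any card company actually does it: the names RunningStats and flag_unusual, the threshold of three standard deviations and the warm-up of five transactions are all made up for this example. Real fraud systems look at far more than the amount.

```python
import math


class RunningStats:
    """Running mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def flag_unusual(amounts, threshold=3.0, warmup=5):
    """Yield (index, amount) for amounts far from what has been seen so far.

    threshold and warmup are illustrative assumptions, not industry values.
    """
    stats = RunningStats()
    for index, amount in enumerate(amounts):
        sd = stats.stddev()
        is_outlier = (
            stats.count >= warmup
            and sd > 0
            and abs(amount - stats.mean) > threshold * sd
        )
        if is_outlier:
            yield index, amount
        else:
            # Only learn from amounts that looked normal.
            stats.update(amount)


# Hypothetical card transactions; one of them is clearly out of pattern.
sample_amounts = [20, 25, 22, 19, 24, 21, 23, 950, 22, 20]
print(list(flag_unusual(sample_amounts)))
```

The point of the sketch is that each transaction is judged the moment it arrives, using only a few running numbers rather than the full history, which is what makes checking in almost real time feasible.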
So, we have a lot more data than ever before, in more complex formats, that is often fast-moving and of varying quality – why would that change the world? The difference is that we now have tools that allow us to analyze vast amounts of data by breaking the task of processing very large data sets down into smaller tasks that are run in parallel on a large cluster of computers. Here are some real-life examples of how big data is used today:
- The FBI is combining data from social media, CCTV cameras, phone calls and texts to track down criminals and predict the next terrorist attack.
- Supermarkets are combining their loyalty card data with social media information to detect and leverage changing buying patterns. For example, retailers can predict that a woman is pregnant simply based on her changing buying patterns. This allows them to target pregnant women with promotions for baby-related goods.
- Facebook is using face recognition tools to compare the photos you have uploaded with those of others to find potential friends of yours (see my post on how Facebook is exploiting your private information using big data tools).
- Politicians are using social media analytics to determine where they have to campaign the hardest to win the next election.
- Video analytics and sensor data from baseball or football games are used to improve the performance of players and teams. For example, you can now buy a baseball with over 200 sensors in it that will give you detailed feedback on how to improve your game.
- Artists like Lady Gaga are using data on our listening preferences and sequences to determine the most popular playlist for their live gigs.
- Google’s self-driving car is analyzing a gigantic amount of data from sensors and cameras in real time to stay on the road safely.
- The GPS information on where our phones are and how fast they are moving is now used to provide live traffic updates.
- Companies are using sentiment analysis of Facebook and Twitter posts to determine and predict sales volume and brand equity.
- A hospital unit that looks after premature and sick babies is generating a live stream of every heartbeat. It then analyzes the data to identify patterns. Based on the analysis, the system can now detect infections 24 hours before the baby would show any visible symptoms, which allows early intervention and treatment.
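For readers who want to see what "breaking the task down into smaller tasks that are run in parallel" looks like in practice, here is a minimal sketch in Python. It counts words by splitting the text into chunks, counting each chunk separately and then merging the partial results. This is my own toy illustration: the function names are made up, and a pool of threads on one machine merely stands in for the large cluster of computers that real big data tools use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Break a large list into smaller lists of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_words(lines):
    """The small task: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, chunk_size=2, workers=4):
    """Process the chunks concurrently, then merge the partial counts."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds the counts together
    return total


# Hypothetical input; in real life this would be billions of lines.
sample_lines = [
    "big data is big",
    "data moves fast",
    "big clusters split work",
    "work is shared",
]
print(parallel_word_count(sample_lines).most_common(3))
```

No single worker ever needs to see the whole data set. Each one handles its own small piece, and only the compact partial results are combined at the end, which is why the same approach scales from four lines on a laptop to enormous data sets on many machines.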
Finally, no discussion about Big Data would be complete without mentioning the increasing concerns about privacy. Many concerns have been expressed about how retailers, credit card companies, search engine providers and email or social media companies use our private information. However, the privacy concerns around big data started to explode with the revelations by Edward Snowden on how the U.S. National Security Agency (NSA) collects and analyzes big data, including the phone records and social media activities of millions of Americans. But because this is another massive issue in its own right, I will address it in a future post.
As always, please let me know your thoughts on the topic. Do you find it frightening or exciting? Do you see business opportunities or ‘Big Brother’?
By Bernard Marr