I can remember the first time people started to talk about “Big Data”.
It may be a hackneyed phrase today, but some years ago it took the world of BI and analytics by storm. What then ensued was a mad frenzy to come up with a good definition that encapsulated what was meant by the term. Somehow the expression was meant to symbolize more than data that was merely big. So, cue the creative people and the letter V. Folks started with the now much-overused volume, velocity and variety, but then others set about adding to the list, coming up with a whole host of other v-words, including veracity, visualization, value, variability and so on.
To be fair, it’s a good thing to describe a technical term in a simple manner so that everyone understands it, no matter how much alliteration is used, but in my opinion, the whole thing set off in the wrong direction at the very start. Many put the focus on just the technical side of data, whether it be its size, structure or how often it is updated. This came with the advent of new technologies such as Hadoop or NoSQL systems, where new technical advances meant that data was processed faster (“velocity”), more of it could be stored (“volume”) and structured or unstructured versions could be captured and analyzed (“variety”).
As a result, people started to believe that you could only refer to a Big Data use case if all the described v-words were met, or at least the initial three. And that's where the shortcomings come to light. What about situations where people store and analyze petabytes of purely structured data? Or where they analyze vast amounts of unstructured data but only run nightly bulk-load jobs? Are these not cases of Big Data, too?
In my opinion, the Big Data discussion focused too much on the technical side of things. It was only later, when additional v-words were introduced into the mix, that a broader discussion could take place that included non-technical aspects such as "value." And yet, what on earth does value have in common with data volumes? Does more data equal more value? And where is the connection between velocity and visualization? In fact, is there any real correlation between all these dimensions at all? One that conjures up the perfect Big Data scenario? I doubt it, and always have.
I believe that the market meant something totally different; it was talking about "digitalization." The exciting use cases were different because they were brand-new: things you had not heard of before. We quickly saw the emergence of new technology trends: mobile devices, geolocation and traffic information generated by millions of GPS-tracking devices, products with embedded sensors measuring all kinds of interesting things, logistics and production chains where everything could suddenly be tracked accurately. All these new approaches created the need for totally new applications in the world of data. Police departments started to predict the probability of crimes, companies began to offer real-time road traffic monitoring, and new value-add services such as predictive maintenance came about. In short, completely new data-driven business models were born.
Today, we live in an age where unstructured e-mails and photos are stored, processed and analyzed, and as a result there are vast data lakes holding enormous amounts of bits and bytes, with mobile devices and sensors guaranteeing an almost seamless, uninterrupted flow of data that is captured and processed. But while this is true, ask yourself a simple question: whenever you read an interesting data use case, is it not true that it has to do with the new digitalized era? You'll soon agree that it does. It is for that reason that I hope we will stop talking about Big Data using the Vs, for no matter how interesting they may sound, they are uncorrelated and too technical. Instead, let's find a far more suitable and simpler definition for what we have been talking about over the last few years. My personal vote is for D and digitalization.