Making sense of big data – a case for simple maths

There is no denying that we are in the midst of the age of big data. Data volumes are growing rapidly to sizes many of us struggle to comprehend. Big data has arrived, and it is only going to get bigger as organizations find ways to collect more data about their customers. With new technologies and products arriving in the market, new data is being created, and needed, constantly and at an unrelenting pace.

From personal relationships to global networks

A few decades ago, businesses may only have known their customers’ names and maybe their address. Businesses acted locally and customer service focused on personal relationships where deals were closed with a handshake.

In the years since, the internet has emerged as a global web connecting millions of businesses to billions of consumers. It has brought not only transactions but also a constant stream of information and interaction, along with the growth of social networks, and the data underpinning all of this has exploded to a massive scale.

Organizations, no matter their size, have had to invest in technology solutions to manage their new data assets, and many have realized the importance of using their customer, product and market data to gain new insights that could deliver competitive advantages.

How big is big data?

As we move and operate in a digital world where every possible snippet of information is readily available at our fingertips, we often see comments on big data with authors throwing numbers around seemingly haphazardly: millions and billions of rows of data, megabytes, gigabytes, terabytes and petabytes.

But can we actually comprehend the sheer size of these figures? Especially in relation to one another? Are they meaningless without some context?

I personally struggle to imagine 378 million rows of data, let alone 5 billion. So I thought it would be helpful to clarify how big these numbers are and put them into simple terms.

In the following examples, I compare 1 row of data to commonly used units of measurement relating to time, length and weight to make big data more understandable and relatable.

If 1 row of data were …

1 second

Imagine waking up tomorrow with a brilliant idea that could change the world. Your idea would require some research, funding and development and of course the final product would need to be brought to market to have an effect on our planet. How much time would you need?

Well, if 1 row of data were 1 second, then 1 million rows of data would give you about 11½ days.

Not enough? Try 1 billion rows of data and you get almost 32 years!

Surely that will be enough to change the world?
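For readers who want to check the arithmetic, here is a minimal sketch of the conversion. The function name `rows_as_time` and the 365.25-day average year are my own assumptions for illustration, not part of any tool or standard.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days


def rows_as_time(rows):
    """Treat each row as one second and return (days, years)."""
    days = rows / SECONDS_PER_DAY
    years = days / DAYS_PER_YEAR
    return days, years


for rows in (1_000_000, 1_000_000_000):
    days, years = rows_as_time(rows)
    print(f"{rows:,} rows -> {days:,.1f} days ({years:.1f} years)")
```

The jump from days to decades is the whole point: a billion is a thousand times a million, so the same factor of a thousand applies to the time.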

1 centimeter (0.4 inches)

You're an aviation engineer and want to revolutionize air travel by extending the reach of your airplanes as far as possible.

If 1 row of data were 1 cm, then 1 million rows of data would let your plane travel for 10km (6 mi). Pathetic, isn't it?

Take 1 billion rows of data and you'll get 10,000km (6,214 mi). That is roughly the distance from London, UK to Lima, Peru.

Let's be adventurous; let's take 5 billion rows of data: 50,000km (31,069 mi). That's quite something. Your plane could fly from London to Tokyo (Japan), from Tokyo to Houston (Texas, US), from Houston to Sydney (Australia), from Sydney to Rio de Janeiro (Brazil) and then finish the trip in Buenos Aires (Argentina) with a cold beer and a platter of empanadas, feeling very pleased with your engineering accomplishments.

That's a win for big data for sure.
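The distance figures can be reproduced with a few lines of code. This is only a sketch: the function name `rows_as_distance` is made up for this post, and the only outside fact it relies on is that one mile is 1.609344 km.

```python
CM_PER_KM = 100_000
KM_PER_MILE = 1.609344  # international mile


def rows_as_distance(rows):
    """Treat each row as one centimeter and return (kilometers, miles)."""
    km = rows / CM_PER_KM
    miles = km / KM_PER_MILE
    return km, miles


for rows in (1_000_000, 1_000_000_000, 5_000_000_000):
    km, miles = rows_as_distance(rows)
    print(f"{rows:,} rows -> {km:,.0f} km ({miles:,.0f} mi)")
```

Swapping in your own row counts is a quick way to see how far a particular dataset would take the plane.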

100 grams (3.53 oz)

There are quite a few heavy objects in this world and mankind has created incredible structures from stone and steel over the centuries, some of which still make us speechless when we contemplate how they could even be possible...

If 1 row of data were 100g, then 2 million rows of data would amount to 200 metric tons, which is roughly the weight of the Statue of Liberty.

230 million rows of data would be equivalent to the Titanic. But what about billions of rows of data?

3.5 billion rows of data get you the Empire State Building. Yes, the whole building.
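The weight comparison follows the same pattern. The sketch below only converts row counts into metric tons at 100g per row; the function name `rows_as_tonnes` is hypothetical, and the code makes no claim about what the landmarks actually weigh.

```python
GRAMS_PER_ROW = 100
GRAMS_PER_TONNE = 1_000_000  # 1 metric ton = 1,000 kg


def rows_as_tonnes(rows):
    """Treat each row as 100 grams and return the total in metric tons."""
    return rows * GRAMS_PER_ROW / GRAMS_PER_TONNE


for rows in (2_000_000, 230_000_000, 3_500_000_000):
    print(f"{rows:,} rows -> {rows_as_tonnes(rows):,.0f} metric tons")
```

Put differently, every 10,000 rows add one metric ton.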

It's easy to get a bit lost in the millions and billions when talking about data, simply because these magnitudes are so far beyond what we can picture.

By drawing on the comparisons above, I hope to have given you an easy way to put data volumes into perspective and to help you in your next conversations and decisions about dataset size, data storage, retrieval and load.
