Definition of Hadoop

What does it mean and how is it used?

What is Hadoop?

Hadoop is an open-source framework written in Java for storing and processing large data sets across clusters of computers, using off-the-shelf hardware. Hadoop is often used to describe the ecosystem of different modules and has almost become synonymous with the term ‘Big Data.’

Why use Hadoop?

Hadoop saves time by giving you a single platform for analytics, as you can use it to pull valuable insights from data in any source. It supports a variety of use cases such as data warehousing, fraud detection, and marketing campaign analysis. It’s relatively cheap to set up, and it’s easy to make multiple copies of your data.

On the downside, real-time analytics can be slow, especially as your data grows: all those copies will start to slow down your systems. And with GDPR in force you should also look closely at security. Hadoop’s authentication and encryption features are not switched on by default, so a cluster holding sensitive data needs careful configuration to keep it out of the hands of cyber criminals.

Did you know?

Doug Cutting, one of the two creators of Hadoop, named it after his son’s stuffed toy elephant. 

Hadoop is based on a research paper Google released in late 2004 describing MapReduce, the programming model it was using internally at the time to power its search engine.
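The MapReduce model from that paper can be sketched in a few lines. This is a simplified, single-machine illustration of the data flow (map, shuffle, reduce) using the classic word-count example; real Hadoop jobs are written against the Hadoop Java API and run distributed across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big clusters", "big data analytics"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1, 'analytics': 1}
```

Because each map call and each reduce call is independent, Hadoop can run them in parallel on many machines and simply merge the results, which is what makes the model scale to very large data sets.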

Latest Hadoop Insights

Hadoop - Use Case

Turning a Hadoop-based data lake into a system that can cope with quickly growing data volumes and increasing analytics requirements can be challenging. Read on to explore how a Hadoop data lake can be leveraged to steer your business.

Read our use case on Hadoop >

Analyzing Big Data with Hadoop and Exasol

Hadoop can store massive amounts of data on cost-effective, commodity hardware. But while it excels at storage, it struggles with real-time data analysis. Read on to learn how the limitations of Hadoop can be addressed so that it fits a growing organization’s analytics needs.

Read more on Hadoop and Exasol >

Big Data Analytics: Apache Spark or Exasol – why not both?

Spark can complement Hadoop, and it can even partially replace some components of the Hadoop stack. Find out how Exasol can maximize Apache Spark’s performance to provide a superior solution for your business needs.

Read about Apache Spark and Exasol >

Interested in learning more?

Whether you’re looking for more information about our fast, in-memory database or want to discover our latest insights, case studies, videos and blogs, we’re here to help guide you into the future of data.