What are GPUs and why do data scientists love them?

Move over, CPUs: GPUs have arrived in modern enterprises, and data scientists are eager to use them for their modeling and deep learning applications.

Why is this happening, and what advantages do GPUs bring to data science applications? Read on and find out…

What are GPUs?

GPUs, or graphics processing units, have been used for decades in the gaming industry, and the term gained wider currency when Sony used it in reference to its PlayStation console. GPUs have been essential for the fast rendering and processing of computer games, revolutionizing the experience for gamers as graphics became ever more detailed, nuanced and realistic.

While GPUs were designed to render graphics through rapid mathematical calculations, it is this same high-performance processing that makes them appealing for data science: it enables AI to learn from massive amounts of image and sound data in deep learning processes.

To make this a reality, GPUs power neural networks trained at scale, so that end users can enjoy image-, video- and voice-based applications, as well as the recommendation engines so many of us use, whether to find a good restaurant or our new favorite sneakers.

How do GPUs make an impact in data science?

All of us are familiar with the need for good processing power to get our work done. That applies to our laptops and desktop computers as well as larger infrastructure such as servers, switches and of course the network we all rely on.

The term CPU, central processing unit, is commonplace and describes the main processor within a computer, the ‘brain’ of the machine that executes instructions and programs.

In data science, Python libraries have become increasingly efficient at utilizing the available CPU power. However, when you want to run deep learning applications on hundreds of millions or even billions of records, CPUs will not be sufficient.

Enter: GPUs with their powerful parallel processing architecture, which allows organizations to run, for example, forecasting models across millions of possible product combinations for their retail stores to inform, plan, and optimize their warehousing operations.

GPUs, and the power they bring to data science, open up new opportunities for data scientists, analytics departments and the organization as a whole.

CPUs process sequentially, while GPUs process thousands of operations in parallel. This is why even a large cluster of CPUs cannot achieve the same performance as the right architecture of GPUs for training deep learning algorithms.
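
You can see this difference on your own machine. Below is a minimal sketch, assuming TensorFlow 2.x and a visible GPU, that times the same large matrix multiplication, the core operation of deep learning, on a CPU and on a GPU; the matrix size and repeat count are arbitrary choices:

  # Minimal sketch: time one large matrix multiplication on CPU and on GPU.
  # Assumes TensorFlow 2.x is installed and a GPU is visible to it.
  import time
  import tensorflow as tf

  def time_matmul(device_name, size=4000, repeats=10):
      with tf.device(device_name):
          a = tf.random.uniform((size, size))
          b = tf.random.uniform((size, size))
          tf.linalg.matmul(a, b)  # warm-up run, excluded from the timing
          start = time.time()
          for _ in range(repeats):
              result = tf.linalg.matmul(a, b)
          result.numpy()  # force execution to finish before stopping the clock
      return (time.time() - start) / repeats

  print("CPU:", time_matmul("/CPU:0"), "seconds per multiplication")
  if tf.config.list_physical_devices("GPU"):
      print("GPU:", time_matmul("/GPU:0"), "seconds per multiplication")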

GPUs and Exasol

Now imagine this optimized parallel architecture of GPUs combined with the massively parallel processing built into the Exasol database. Your database software and hardware are now perfectly aligned for the AI tasks you want to complete, each benefiting from and making optimal use of the other.

To test the performance of Exasol running on GPUs, we trained a model in TensorFlow, using a dataset of fine food reviews from Amazon. The dataset contains 500,000+ reviews spanning more than 10 years. Our model is designed to predict the score for each product from the text of the review, in other words to analyze its sentiment. We can then compare the predicted score with the actual score, which is available in the data but not used in training.

The dataset contains categorical, numerical and text data, making it a nice challenge for our model, which builds on a pre-trained TensorFlow model called the Universal Sentence Encoder. This complexity makes for an interesting GPU use case: not only do the different data types require specific encoding strategies, but the large amount of text also calls for a large model, which is a good test for our GPUs.
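
The exact architecture is described in the guide linked below; as a rough illustration only, a model along these lines could be assembled in Keras as follows. The column names, layer sizes and the extra numerical input here are our own assumptions, not the original code:

  # Illustrative sketch, not the original Exasol code: embed review text with
  # the pre-trained Universal Sentence Encoder from TensorFlow Hub and combine
  # it with a (hypothetical) numerical feature to predict the review score.
  import tensorflow as tf
  import tensorflow_hub as hub

  USE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"

  text_in = tf.keras.Input(shape=(), dtype=tf.string, name="review_text")
  num_in = tf.keras.Input(shape=(1,), name="helpfulness")  # placeholder numeric feature

  embedding = hub.KerasLayer(USE_URL, trainable=False)(text_in)  # 512-dim text embedding
  merged = tf.keras.layers.Concatenate()([embedding, num_in])
  hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
  score = tf.keras.layers.Dense(1, name="predicted_score")(hidden)  # regression on the 1-5 score

  model = tf.keras.Model(inputs=[text_in, num_in], outputs=score)
  model.compile(optimizer="adam", loss="mse", metrics=["mae"])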

If you want to learn how to train a TensorFlow model in UDFs on GPUs, check out this guide on GitHub.
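
For orientation, the skeleton of such a UDF looks roughly like the sketch below. This is our own simplified outline with placeholder script and column names; the actual training code lives in the guide:

  -- Simplified outline of an Exasol Python UDF (names are placeholders).
  -- A SET script receives a group of rows and can emit results back to SQL.
  CREATE OR REPLACE PYTHON3 SET SCRIPT demo.train_sentiment(
      review_text VARCHAR(2000000), score DOUBLE)
  EMITS (epochs INT, final_loss DOUBLE) AS

  def run(ctx):
      texts, scores = [], []
      while True:               # iterate over all rows passed to the UDF
          texts.append(ctx.review_text)
          scores.append(ctx.score)
          if not ctx.next():
              break
      # ... build and train the TensorFlow model here; in a GPU-enabled
      # script language container, TensorFlow will pick up the GPU itself ...
      ctx.emit(10, 0.0)         # emit training metadata back to the query
  /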

The results

For our tests we used the following setups on Google Cloud Platform (region: Iowa):

  • 1x NVIDIA Tesla K80
  • 1x NVIDIA Tesla V100
  • 1x CPU with 8 cores
  • 1x CPU with 32 cores

When comparing the performance of GPUs and CPUs for training our model, the K80 completed each epoch 9.9x faster (72 seconds versus 710 seconds) than the CPU with 8 cores and 2.9x faster (72 seconds versus 210 seconds) than the CPU with 32 cores.

The V100, the most advanced GPU available in the cloud at the time of our tests, completed each epoch 28.4x faster (25 seconds versus 710 seconds) than the 8-core CPU and 8.4x faster (25 seconds versus 210 seconds) than the 32-core CPU.

These results speak for themselves and present real opportunities to our customers and users to move their data science applications directly into the database.

What are the opportunities for GPUs in data science and analytics?

GPUs are instrumental for data scientists who develop, train and refine their models on large data volumes. They offer a more cost-effective option for loading and manipulating data at this scale than CPUs, delivering the dual benefit of reduced infrastructure expenses and improved performance.

Given the demand for data scientists in the market and the value organizations should place on their skills, GPUs provide a great opportunity: they enable data scientists to spend more time focusing on value-added tasks and to experience fewer frustrations stemming from slow-performing systems and tools.

GPUs provide these benefits wherever an organization has data – in the cloud, on premises or in a hybrid model.

At Exasol we love making people’s day at work better and helping them have more fun in their work with data. So we’ve been working hard behind the scenes to bring the power of GPUs to Exasol databases everywhere. This feature will become available in the near future, so stay tuned for more information.

Eva Murray, Technology Evangelist, Exasol
