
Edge analytics with Exasol using data virtualization – part one

31 Mar 2016

Data virtualization and its relationship with edge analytics

The wind farm challenge

I have avoided referring to the Internet of Things in the title of this article, as IoT has already become a very blurry and broad concept. Nevertheless, I would like to use a typical example, or rather an application area, that is frequently discussed in the context of IoT, because it involves some important factors that are becoming increasingly relevant. The domain in question is renewable energy and its sensor data.

Wind turbines use sensor data to monitor weather conditions as well as their own operating levels, so that they can generate the maximum amount of energy while minimizing the risk of damage and wear and tear. The volume of data generated locally by each wind turbine can be enormous. Companies can use these vast amounts of data to establish predictive maintenance, which in turn helps to minimize outages and maintenance costs.

So, how do you analyze this vast amount of data?

The naïve approach would be to copy all the data generated by the wind turbines to a central database server and analyze it there. While this may seem like a perfect solution to some, I wholeheartedly disagree. Wind turbines and other devices are often connected to the rest of the world via low-bandwidth links (for example, UMTS-connected wind turbines in the North Sea). Yet the sensors in such wind turbines can generate gigantic amounts of data within just seconds.

Transferring the data generated within just a few seconds can in fact take minutes. Put simply, it is technically impossible to transfer all the data generated by dozens or even hundreds of wind turbines: the data is produced faster than the link can transmit it, so the backlog only keeps growing.
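To make the imbalance concrete, here is a back-of-the-envelope sketch in Python. The rates are purely hypothetical (50 Mbit/s of sensor output against a 2 Mbit/s uplink) and are chosen only to illustrate the arithmetic; they are not measurements from any real wind farm.

```python
# Back-of-the-envelope arithmetic for a bandwidth-limited edge site.
# All rates are hypothetical and expressed in megabits (Mbit) per second.


def transfer_seconds(gen_mbit_per_s: float, link_mbit_per_s: float, window_s: float) -> float:
    """Time needed to upload the data produced during `window_s` seconds."""
    produced = gen_mbit_per_s * window_s
    return produced / link_mbit_per_s


def backlog_mbit(gen_mbit_per_s: float, link_mbit_per_s: float, elapsed_s: float) -> float:
    """Data still waiting to be sent after `elapsed_s` seconds of continuous
    generation while uploading at full link speed (never negative)."""
    return max(0.0, (gen_mbit_per_s - link_mbit_per_s) * elapsed_s)


if __name__ == "__main__":
    GEN = 50.0   # assumed sensor output of one turbine, Mbit/s
    LINK = 2.0   # assumed uplink capacity, Mbit/s
    print(f"10 s of data needs {transfer_seconds(GEN, LINK, 10.0):.0f} s to upload")
    print(f"backlog after 60 s: {backlog_mbit(GEN, LINK, 60.0):.0f} Mbit")
```

With these assumed figures, ten seconds of sensor data takes 250 seconds (more than four minutes) to upload, and after one minute 2,880 Mbit are still queued. Because generation outpaces the link, the queue never drains.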

Edge analytics

Admittedly, the imbalance between generated data and available bandwidth may be especially pronounced for wind turbines located in the North Sea, but the same imbalance exists in many other application scenarios and domains: sensors can generate extremely large amounts of data within just seconds, and network bandwidth is not sufficient to transfer all of it.

This raises an obvious question: do we actually need all that raw data in a central “data lake”? In most cases, the answer is no, because we are primarily interested in the following:

  • Detecting outliers
  • Detecting trends over time
  • Analyzing and archiving different kinds of aggregations

This is where the concept of “edge analytics” comes into play. What if we could analyze the data directly at its source, i.e. inside the wind turbines themselves? Then data transfer would become trivial; we would only need to transfer information on outliers, trends and aggregations.
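A minimal sketch of what such edge-side processing could look like, written for Python 3.9+. The function name `summarize`, the `EdgeSummary` record, the z-score outlier rule with a threshold of 3, and the use of a least-squares slope as the "trend" are all illustrative assumptions, not a description of any particular product. The point is that only the small summary record, rather than every raw reading, would need to cross the low-bandwidth link.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EdgeSummary:
    """Compact result sent upstream instead of the raw readings (hypothetical)."""
    count: int
    minimum: float
    maximum: float
    average: float
    trend_per_sample: float            # least-squares slope per reading
    outliers: list[tuple[int, float]]  # (position, value) pairs


def summarize(readings: list[float], z_threshold: float = 3.0) -> EdgeSummary:
    """Reduce a window of raw sensor readings to aggregates, a trend and outliers."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")

    avg = mean(readings)
    sd = pstdev(readings)

    # Outliers: readings more than `z_threshold` standard deviations from the mean.
    # If all readings are identical (sd == 0), nothing is flagged.
    outliers = [
        (i, x) for i, x in enumerate(readings)
        if sd > 0 and abs(x - avg) / sd > z_threshold
    ]

    # Trend: slope of the least-squares line through (position, reading).
    pos_mean = (n - 1) / 2
    num = sum((i - pos_mean) * (x - avg) for i, x in enumerate(readings))
    den = sum((i - pos_mean) ** 2 for i in range(n))
    slope = num / den

    return EdgeSummary(n, min(readings), max(readings), avg, slope, outliers)
```

For example, `summarize([10.0] * 19 + [30.0])` flags only the final reading, `(19, 30.0)`, as an outlier, while a steadily rising series such as `[0.0, 1.0, ..., 9.0]` yields a trend of 1.0 per sample and no outliers.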

How does data virtualization come into play?

Now let’s move on to data virtualization and its relationship with edge analytics, using our wind turbines as the example. Wikipedia defines it as follows:

Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.

That is exactly what Exasol’s data virtualization framework, called “virtual schemas”, does. You can connect any data source to Exasol as a “virtual schema”. A user querying the central database does not notice that the data of a virtual schema is still stored in the original database, in other words at the edge location (e.g. inside the wind farm’s database). Instead, it looks like a normal schema whose data is stored in the database the user is interacting with. If the user runs a query that references both a virtual schema and a physical schema, Exasol sends part of the query to the connected data source and combines the data it receives with the locally processed results.

Put more technically, Exasol pushes as many operators as possible down to the edge location. Although the first release of the virtual schema feature might have some functional limitations, there will always be a feasible workaround to guarantee optimal performance.
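The idea of operator pushdown can be illustrated with a toy model in plain Python. This is not Exasol’s virtual schema API: the table contents, the `TURBINES` lookup and all function names are hypothetical. The same query (average power per turbine above a minimum wind speed, joined with a local table of site names) is answered twice: once by shipping every raw row to the central system, and once by letting the edge evaluate the filter and aggregation so that only one row per turbine travels over the network.

```python
from collections import defaultdict

# Hypothetical raw rows stored at the edge: (turbine_id, wind_speed_m_s, power_kw).
EDGE_ROWS = [
    (1 + i % 2, float(3 + i % 10), 100.0 * (3 + i % 10)) for i in range(1000)
]

# Hypothetical table stored centrally: turbine_id -> site name.
TURBINES = {1: "North-A", 2: "North-B"}


def avg_power_by_turbine(rows, min_speed):
    """Filter (wind_speed >= min_speed) and aggregate (average power per turbine)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for turbine_id, speed, power in rows:
        if speed >= min_speed:
            sums[turbine_id] += power
            counts[turbine_id] += 1
    return [(tid, sums[tid] / counts[tid]) for tid in sorted(sums)]


def join_with_sites(aggregated):
    """Central step: join the aggregated edge data with the local turbine table."""
    return {TURBINES[tid]: avg for tid, avg in aggregated}


def run_without_pushdown(min_speed):
    shipped = list(EDGE_ROWS)  # the edge sends every raw row over the network
    result = join_with_sites(avg_power_by_turbine(shipped, min_speed))
    return result, len(shipped)


def run_with_pushdown(min_speed):
    shipped = avg_power_by_turbine(EDGE_ROWS, min_speed)  # evaluated at the edge
    result = join_with_sites(shipped)
    return result, len(shipped)
```

With `min_speed=5.0`, both variants return the same answer, `{'North-A': 800.0, 'North-B': 900.0}`, but the first ships 1,000 rows while the pushed-down variant ships only 2. Only the join with the local table has to happen centrally.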

In the second part of this blog post, we will drill down into the details and answer the question of why Exasol is a perfect fit for this challenge.

In the meantime, why not get started with Exasol today?
