So, how does the concept of virtual schemas relate to edge analytics?
In the first part of this blog post, we advocated the need for “edge analytics” in the context of renewable energy: data is generated in vast amounts at the network edge, yet there are insufficient means to transfer it to a centralized location. We also introduced the concept of “virtual schemas”, the new data virtualization technology from Exasol.
With virtual schemas, the data stays where it was generated: inside a remote database at the edge location within the wind farm, virtually connected to the central database. When users run analytic queries, as little data as possible is transferred between the remote edge databases and the central database. Users can then run their analytics on the virtually connected data sources as if everything were stored within one database.
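To make the idea of transferring as little data as possible more concrete, here is a minimal sketch in Python. It is not Exasol's implementation: it uses in-memory SQLite databases as stand-ins for edge databases, and the table `turbine_readings`, its columns, and the function names are assumptions made up for illustration. Each edge computes a partial aggregate locally, so only one small row per edge crosses the network instead of every raw reading.

```python
import sqlite3


def make_edge_db(readings):
    """Create an in-memory database standing in for one edge location."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, chosen only for this illustration.
    conn.execute("CREATE TABLE turbine_readings (turbine_id TEXT, power_kw REAL)")
    conn.executemany("INSERT INTO turbine_readings VALUES (?, ?)", readings)
    return conn


def pushed_down_average(edge_dbs):
    """Average power across all edges, moving one (sum, count) row per edge."""
    total, count = 0.0, 0
    for conn in edge_dbs:
        # The aggregation runs inside the edge database; only the result leaves it.
        edge_sum, edge_count = conn.execute(
            "SELECT COALESCE(SUM(power_kw), 0.0), COUNT(*) FROM turbine_readings"
        ).fetchone()
        total += edge_sum
        count += edge_count
    return total / count if count else None


edges = [
    make_edge_db([("t1", 1500.0), ("t2", 1700.0)]),
    make_edge_db([("t3", 2000.0)]),
]
fleet_average = pushed_down_average(edges)
```

The central side combines sums and counts rather than averaging the per-edge averages, which keeps the result correct when edges hold different numbers of readings.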
Would there be an advantage in using an Exasol database at the edge location instead of using a “standard” relational database?
Absolutely! Standard relational databases will suffer from poor performance, and using them at the edge would result in query processing times of minutes or even hours. Using Exasol high-performance databases at the edges and centrally, connected via virtual schemas, would instead result in dramatically faster response times when querying vast amounts of distributed data. (Of course, network latency still limits the overall response time.)
In contrast to other solutions, this would allow interactive analytics on large amounts of data stored around the globe using Exasol’s high-performance in-memory technology.
Now that we have discussed architectures and the use of technologies, let’s turn our attention to why Exasol, and data virtualization in particular, is the perfect choice for a challenge such as analyzing the data generated by large renewable energy farms.
Why is it better to base edge analytics on an open extensible virtualization framework such as virtual schemas?
First, it is robust:
Even if an edge database is not available, the remaining components can still be used for analytics.
Second, it is product- and vendor-agnostic and therefore future-proof and easy to update and maintain:
When building such a federated infrastructure from scratch, we recommend opting for the latest products provided by a single vendor, but this might not be feasible in many cases.
Open data virtualization provided by Exasol gives you both these advantages:
- Exasol technology not only offers seamless integration and optimal performance levels, but you can also integrate it with any other database, whether relational or NoSQL
- Exasol offers an open source framework, which means that technology that is not already supported out of the box can be connected by implementing the missing virtual schema adapter
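The two properties above, robustness and openness to arbitrary sources, can be sketched together in a few lines of Python. This is an illustration only, not the virtual schema adapter API: the `Adapter` type, the function `federated_average`, and the source names are all hypothetical. Each source is wrapped behind the same small interface, and a source that cannot be reached is skipped and reported instead of failing the whole query.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical adapter type: a callable returning a partial (sum, count)
# for one data source, whatever technology sits behind it.
Adapter = Callable[[], Tuple[float, int]]


def federated_average(
    adapters: Dict[str, Adapter],
) -> Tuple[Optional[float], List[str]]:
    """Combine partial aggregates, skipping sources that are unreachable."""
    total, count = 0.0, 0
    skipped: List[str] = []
    for name, fetch in adapters.items():
        try:
            source_sum, source_count = fetch()
        except ConnectionError:
            # An unavailable edge does not block the remaining sources.
            skipped.append(name)
            continue
        total += source_sum
        count += source_count
    average = total / count if count else None
    return average, skipped


def offline() -> Tuple[float, int]:
    """Stand-in for an edge database that is currently unreachable."""
    raise ConnectionError("edge database unreachable")


result, skipped = federated_average({
    "wind_farm_a": lambda: (3200.0, 2),   # e.g. a relational edge database
    "wind_farm_b": offline,               # currently down
    "nosql_store": lambda: (2000.0, 1),   # e.g. a non-relational source
})
```

Returning the list of skipped sources alongside the result lets the caller see that the answer is based on partial data.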
Why is an Exasol database a perfect choice for analytics at the edge?
Although many of our customers operate large cluster systems with TBs of main memory, this does not mean that Exasol always requires large amounts of resources. Indeed, the opposite is true: Exasol has proven to be extremely resource efficient. You can easily process 100GB of data at interactive speeds using 10GB of main memory.
In the field of IoT and edge analytics, that means that a micro server with 16-64GB of memory could be perfectly suited to running analytics on up to 1TB of data!
Furthermore, one other ground-breaking property of Exasol is that the database is self-tuning; there’s no need to manage and tweak hundreds of edge databases. They just work.
And last but not least: even when running on very limited hardware, Exasol offers a comprehensive set of analytic functionality at the edge locations, ranging from SQL to in-database analytics with R or Java!
While you may not be active in the renewable energy market, this use case has hopefully demonstrated how powerful Exasol’s upcoming “virtual schema” feature is.
Having understood the use case described above, you can perhaps see how it might be relevant to your own infrastructure: one that consists of various data marts, data sources and systems that constantly generate data, and that has to meet an even bigger set of requirements and keep up with increased user expectations.