What you will learn from this blog:
- A comprehensive list of all of our Open Source offerings and support levels
- This first blog covers our offerings for database drivers and interfaces, cloud, data science, analytics and machine learning, business intelligence, data virtualization and access restriction, distributed computing, administration and monitoring, and data migration
- The next two blogs will cover our capabilities with regard to software development and software testing
Staying in touch with our users
Practically all of our software that integrates Exasol with other ecosystems is Open Source. This way we can stay in close contact with our user community and use their contributions to improve our product. People can contribute by writing feature requests, issue tickets or even pull requests for code they wrote to extend one of our offerings.
At the time I am writing this blog post, Exasol has 41 public repositories on GitHub. The vast majority of them contain Open Source software, while the other ones are tutorials. So, before we dive into the offerings, what does support look like for you?
Two levels of support
Our Open Source offerings have two different levels of support:
- Fully supported including the possibility to contact our support desk. This is the level for most of our projects
- Not officially supported
While the first is self-explanatory, the second applies to projects that enthusiasts here at Exasol started, for example to get Exasol integrated with their favorite framework or tool. We still try to provide as much help as possible, and if we see customer interest spiking in one of those areas, we will consider raising the support level.
By default all our Open Source offerings are licensed under the MIT license, a very permissive free software license.
In cases where we integrate into an existing Open Source project, we usually choose the same license as the parent project. So, what are our offerings that you can benefit from?
Our Open Source offerings – what can you work with?
Now, if you are not a frequent GitHub user, I would like to take this opportunity to give you a little overview of what we have available today. Since we are always expanding our Open Source offerings, I would also like to encourage you to check back on GitHub now and then to see what’s new.
The following sections will give you a short overview of all of our offerings. In the two blogs that follow this post we will look specifically into software development and software testing capabilities. Please follow the links in the description if you want to learn more.
Database drivers and interfaces
Python Database Driver
If you are a Python programmer and want the best possible performance when accessing Exasol, pyexasol is the driver you should use. It was originally created by our friends at Badoo, and we are now developing it together.
This driver supports scaling via parallel processes on multi-core machines.
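As a rough sketch of what parallel access can look like, the snippet below splits one table scan into per-worker queries and runs each in its own process. The connection details, the table and column names, and the helpers `shard_queries`, `fetch_shard` and `count_rows_in_parallel` are placeholders I made up for illustration; the only driver calls used are `pyexasol.connect`, `export_to_pandas` and `close`. It assumes a numeric key column and that pyexasol and pandas are installed.

```python
from multiprocessing import Pool


def shard_queries(table, key_column, workers):
    """Build one SELECT per worker; each picks rows by MOD(key, workers)."""
    return [
        f"SELECT * FROM {table} WHERE MOD({key_column}, {workers}) = {i}"
        for i in range(workers)
    ]


def fetch_shard(args):
    """Run one shard query in its own process and return the row count."""
    # Imported lazily so the pure helpers work without the driver installed.
    import pyexasol

    dsn, user, password, query = args
    connection = pyexasol.connect(dsn=dsn, user=user, password=password)
    try:
        frame = connection.export_to_pandas(query)
        return len(frame)
    finally:
        connection.close()


def count_rows_in_parallel(dsn, user, password, table, key_column, workers=4):
    """Fan the shard queries out over a process pool and sum the counts."""
    jobs = [
        (dsn, user, password, query)
        for query in shard_queries(table, key_column, workers)
    ]
    with Pool(processes=workers) as pool:
        return sum(pool.map(fetch_shard, jobs))
```

On platforms that spawn worker processes, call `count_rows_in_parallel` from inside an `if __name__ == "__main__":` guard.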
R programmers find the driver for the programming language of their choice in the r-exasol project. You get a DBI-compliant database interface. Additionally, you can deploy and execute R code dynamically, thanks to Exasol’s support for R in User Defined Functions (UDFs).
Exasol provides a large number of features that are not accessible through a regular database driver — the most obvious ones being administrative tasks. Thanks to the websocket-api you can remotely control your Exasol installations. This comes in handy when you want to automate your Exasol setups.
Of course, the API also supports data access, so it is an ideal basis for custom-made database drivers.
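To give an idea of what a custom driver has to produce, here is a minimal sketch that only builds the JSON messages a client would send over the websocket. The field names (`command`, `protocolVersion`, `sqlText`) reflect my reading of the protocol description in the websocket-api repository and should be checked against the protocol version you target; the helper names are made up, and the password exchange that follows the login request is left out.

```python
import json


def login_request(protocol_version=1):
    """First message of a session: announce the protocol version (assumed field names)."""
    return json.dumps({"command": "login", "protocolVersion": protocol_version})


def execute_request(sql_text):
    """Ask the server to run a single SQL statement (assumed field names)."""
    return json.dumps({"command": "execute", "sqlText": sql_text})
```

A real client would send these strings through a websocket library of its choice and parse the JSON responses.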
EXAoperation XMLRPC API
Are you planning to automate the administration of your Exasol cluster?
The exaoperation-xmlrpc repository contains Python script examples showcasing how to automate the administration using the XMLRPC API instead of EXAoperation’s web interface. You will also find a complete description of all XMLRPC methods there.
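If you want to try this from scratch, Python's standard `xmlrpc.client` module is enough to open a connection. The sketch below is an assumption-laden starting point: the `/cluster1` path, the host and the credentials are placeholders, and the helpers `exaoperation_url` and `connect` are my own names, not part of the repository.

```python
import ssl
import xmlrpc.client
from urllib.parse import quote


def exaoperation_url(host, user, password, cluster="cluster1"):
    """Build the endpoint URL; the '/cluster1' path is an assumed default."""
    # Percent-encode the credentials so characters like '@' do not break the URL.
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{cluster}"


def connect(host, user, password, verify_tls=True):
    """Create the proxy object; no network traffic happens until a method is called."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return xmlrpc.client.ServerProxy(
        exaoperation_url(host, user, password), context=context
    )
```

The methods you can call on the returned proxy are the ones documented in the repository.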
Cloud Storage ETL UDFs
This project offers a number of User Defined Functions (UDFs) that facilitate reading data from and writing data to storage in the cloud. With the cloud-storage-etl-udfs you can read data in an ever-growing list of formats like Apache Avro, Apache ORC and Apache Parquet.
Additionally, you can import Avro formatted data from Apache Kafka clusters.
If you prefer setting up your Exasol installation on AWS using Terraform, then terraform-aws-exasol is the thing for you. This repository contains a Terraform module, documentation and an example that help you set up a cluster quickly.
Data Science, Analytics and Machine Learning
Data Science Examples
The data-science-examples repository contains a collection of examples and tutorials for Data Science and Machine Learning with Exasol. In those examples and tutorials you learn how to explore and prepare your data and build, train and deploy your model with and within Exasol.
Here you will find full tutorials that take you through the complete process, from setting up to using Exasol in data science scenarios. Additionally, small examples show you important points but leave the setup to you.
Tired of working with dry accounting data? Why not analyze sports for a change?
In the sports-analytics repository you can find examples of how to analyze data from sports events, which is both educational and entertaining.
Power BI – Exasol Connector
While our connector for Microsoft Power BI is certified and bundled with Power BI Desktop, in the powerbi-exasol repository you can find the very latest version even before a new bundle is released.
Data Virtualization and Access Restriction
Exasol’s virtual-schemas allow you to project external data sources into your Exasol database as if they were views inside that database.
The nicest part is that querying those external sources is done via regular Exasol SQL commands.
At the time of writing we already support the following data sources: Athena, Aurora, BigQuery, DB2, Exasol, Hive, Impala, MySQL, Oracle, PostgreSQL, Redshift, SAP HANA, SQL Server, Sybase ASE, Teradata, Generic JDBC.
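To make this more concrete, the following sketch assembles the kind of SQL statement that creates a Virtual Schema. Treat it as an illustration under assumptions: the adapter script name `ADAPTER.JDBC_ADAPTER`, the property names `SQL_DIALECT`, `CONNECTION_NAME` and `SCHEMA_NAME`, and all values are my placeholders, so check the virtual-schemas documentation for the exact properties your dialect needs. The helper `create_virtual_schema_sql` is hypothetical.

```python
def create_virtual_schema_sql(name, dialect, connection_name, remote_schema,
                              adapter="ADAPTER.JDBC_ADAPTER"):
    """Assemble a CREATE VIRTUAL SCHEMA statement (property names are assumptions)."""
    return (
        f"CREATE VIRTUAL SCHEMA {name}\n"
        f"USING {adapter}\n"
        f"WITH\n"
        f"  SQL_DIALECT = '{dialect}'\n"
        f"  CONNECTION_NAME = '{connection_name}'\n"
        f"  SCHEMA_NAME = '{remote_schema}'"
    )


# Placeholder values: a PostgreSQL source reachable through a named connection.
statement = create_virtual_schema_sql("PG_SALES", "POSTGRESQL", "PG_CONNECTION", "public")
```

Once such a schema exists, its tables are queried like any others, for example with `SELECT * FROM PG_SALES.ORDERS`, where `ORDERS` is again a made-up table name.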
Learn more about Virtual Schemas in these blog posts:
Redis Virtual Schema
Redis is an in-memory key-value store. The redis-virtual-schema is a very compact showcase that demonstrates how to access a data source like Redis through a Virtual Schema Adapter written in Python. Additionally, it demonstrates the concept of mapping data structures from a NoSQL database.
Row Level Security
Our row-level-security implementation is based on virtual-schemas. It introduces an additional access control layer on top of Exasol’s built-in authorization mechanisms. With row-level security you can decide on a per-dataset level who is allowed to see what.
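The idea behind row-level filtering can be shown with a few lines of plain Python. This is only a conceptual sketch, not how the project is implemented: the real filtering happens inside the database, and the column name `tenant`, the sample rows and the helper `visible_rows` are invented for illustration.

```python
def visible_rows(rows, current_user, owner_column="tenant"):
    """Return only the rows whose owner column matches the current user."""
    return [row for row in rows if row.get(owner_column) == current_user]


# Invented sample data: each row records which user may see it.
orders = [
    {"order_id": 1, "tenant": "ALICE"},
    {"order_id": 2, "tenant": "BOB"},
    {"order_id": 3, "tenant": "ALICE"},
]

alice_orders = visible_rows(orders, "ALICE")
```

Both users run the same query against the same table, yet each one only gets back the rows assigned to them.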
Spark Exasol Connector
Apache Spark is an engine for distributed data processing. Among other interfaces there is Spark SQL which processes data in distributed collections called “datasets”. “Data frames” organize those datasets in named columns, much like tables do in a relational database.
Our spark-exasol-connector is a library that lets you create Spark data frames from Exasol queries and save data frames as Exasol tables.
Hadoop ETL UDFs
Exasol’s hadoop-etl-udfs are the main way to transfer data between Exasol and Hadoop. The SQL syntax for calling the UDFs is similar to that of Exasol’s native IMPORT and EXPORT commands, but with added UDF parameters for specifying the various necessary and optional Hadoop properties.
Confluent Kafka + Exasol
Administration and Monitoring
Monitor your Exasol cluster using the nagios-monitoring container. You can either use this container as a stand-alone monitoring solution, or extract the Nagios configuration and use it in your existing Nagios installation.
Check out the helper scripts collected in the exa-toolbox. It contains useful nuggets ranging from JSON processing and geospatial computations to views that make system data easier to interpret, plus other small utilities.
If you prefer GUI applications over command line tools and work with BucketFS regularly, you might be interested in the bucketfs-explorer. This little GUI client lets you browse buckets and their contents as well as upload, download and delete objects in those buckets.
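If you would rather script such uploads, BucketFS can also be reached over plain HTTP. The sketch below only prepares an upload request with Python's standard library. It assumes that writes use HTTP PUT with basic authentication as user `w` and the bucket's write password; host, port, bucket and file names are placeholders, and the helper `bucketfs_upload_request` is hypothetical.

```python
import base64
import urllib.request


def bucketfs_upload_request(host, port, bucket, path, data, write_password):
    """Prepare (but do not send) a PUT request that uploads bytes to a bucket."""
    url = f"http://{host}:{port}/{bucket}/{path}"
    # Assumption: write access uses the fixed user name 'w' plus the write password.
    token = base64.b64encode(f"w:{write_password}".encode()).decode()
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would perform the actual upload.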
The database-migration repository contains a large collection of SQL scripts that help you to move your data from a 3rd-party database into Exasol. The main focus is on cases where you move away from the old database.
Exasol’s team creates and maintains a large number of Open Source Software projects that help integrate our analytics platform with various ecosystems, ranging from on-premises solutions to setups hosted in a public cloud.
We like to stay in close contact with our customers and the Open Source Community to continuously improve our offerings. So, if you feel up to contributing, please share your ideas, bug reports or code commits on GitHub.
Over the next few weeks, the second and third blogs in this series will explore our Open Source offerings relating to software testing and development. Keep an eye on the blog to find out more.