
Open Source: what are our offerings and how can you use them?

What you will learn from this blog:

  • A comprehensive list of all of our Open Source offerings and support levels
  • This first blog covers our offerings for database drivers and interfaces, cloud, data science, analytics and machine learning, business intelligence, data virtualization and access restriction, distributed computing, data streaming, administration and monitoring, and data migration
  • The next two blogs will cover our capabilities with regard to software development and software testing

Staying in touch with our users

Practically all of our software that integrates Exasol with other ecosystems is Open Source. This way we can stay in close contact with our user community and use their contributions to improve our product. People can contribute by writing feature requests, issue tickets or even pull requests for code they wrote to extend one of our offerings.

At the time I am writing this blog post, Exasol has 41 public repositories on GitHub. The vast majority of them contain Open Source software, while the other ones are tutorials. So, before we dive into the offerings, what does support look like for you?

Two levels of support

Our Open Source offerings have two different levels of support:

  1. Fully supported including the possibility to contact our support desk. This is the level for most of our projects
  2. Not officially supported

While the first is self-explanatory, the second applies to projects that enthusiasts here at Exasol started, for example to get Exasol integrated with their favorite framework or tool. We still try to provide as much help as possible, and if we see customer interest spiking in one of those areas we will consider raising the support level.

Licenses

By default all our Open Source offerings are licensed under the MIT license, a very permissive free software license.

In cases where we integrate into an existing Open Source project, we usually choose the same license as the parent project. So, what are our offerings that you can benefit from?

Our Open Source offerings – what can you work with?

If you are not a frequent GitHub user, I would like to take this opportunity to give you a short overview of what we have available today. Since we are always expanding our Open Source offerings, I would also like to encourage you to check back on GitHub now and then to see what’s new.

The following sections will give you a short overview of all of our offerings. In the two blogs that follow this post we will look specifically into software development and software testing capabilities. Please follow the links in the description if you want to learn more.

Database drivers and interfaces

Python Database Driver

If you are a Python programmer and want the best possible performance when accessing Exasol, pyexasol is the driver you should use. The driver was originally created by our friends at Badoo, and we are now developing it together.

This driver supports scaling via parallel processes on multi-core machines.
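
To give a rough idea of what working in parallel processes can look like, here is a minimal sketch that splits one query into disjoint shards and lets each worker process open its own pyexasol connection. The DSN, credentials, table and key column are placeholders, and sharding by `MOD(key, n)` is just one possible way to partition the work (it assumes a non-negative, non-NULL integer key). It is not necessarily how the driver's own parallel features are meant to be used.

```python
from multiprocessing import Pool

# Placeholder connection settings (assumptions); replace with your own.
DSN = "exasol-host:8563"
USER = "sys"
PASSWORD = "secret"


def shard_query(table: str, key: str, shards: int) -> list[str]:
    """Build one SELECT per shard so that the shards do not overlap."""
    return [
        f"SELECT * FROM {table} WHERE MOD({key}, {shards}) = {i}"
        for i in range(shards)
    ]


def fetch_shard(sql: str) -> int:
    """Run one shard query in a worker process and return its row count."""
    import pyexasol  # imported lazily so this module loads without pyexasol

    conn = pyexasol.connect(dsn=DSN, user=USER, password=PASSWORD)
    try:
        return len(conn.execute(sql).fetchall())
    finally:
        conn.close()


def count_rows_parallel(table: str, key: str, workers: int = 4) -> int:
    """Fetch all shards in parallel and sum the row counts."""
    with Pool(workers) as pool:
        return sum(pool.map(fetch_shard, shard_query(table, key, workers)))
```

Calling `count_rows_parallel("SALES", "ID")` from a script guarded by `if __name__ == "__main__":` would then spread the work over four processes.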

R Interface

R programmers find the driver for the programming language of their choice in the r-exasol project. You get a DBI-compliant database interface. Additionally, you can deploy and execute R code dynamically, thanks to Exasol’s R support for User Defined Functions (UDFs).

Websocket API

Exasol provides a large number of features that are not accessible through a regular database driver — the most obvious ones being administrative tasks. Thanks to the websocket-api you can remotely control your Exasol installations. This comes in handy when you want to automate your Exasol setups.

Of course the API supports data access too, so it is an ideal basis for custom-made database drivers.
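
The API exchanges JSON messages over a WebSocket connection. As a small sketch, the helpers below build two such request messages with Python's standard library only. The field names (`command`, `protocolVersion`, `sqlText`) are assumptions based on the protocol description in the websocket-api repository, so please check it for the authoritative format. The transport and the rest of the login handshake are deliberately left out.

```python
import json


def login_request(protocol_version: int = 1) -> str:
    """Build the first message of the login handshake (field names assumed)."""
    return json.dumps({"command": "login", "protocolVersion": protocol_version})


def execute_request(sql: str) -> str:
    """Build a request that asks the server to run one SQL statement."""
    return json.dumps({"command": "execute", "sqlText": sql})
```

A custom driver would send these strings over an open WebSocket connection and parse the JSON responses it receives.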

EXAoperation XMLRPC API

Are you planning to automate the administration of your Exasol cluster?

The exaoperation-xmlrpc repository contains Python script examples showcasing how to automate the administration using the XMLRPC API instead of EXAoperation’s web interface. You will also find a complete description of all XMLRPC methods there.
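
If you have not used XML-RPC from Python before, the following sketch shows the general shape using only the standard library. The URL layout (credentials in the URL and a `/cluster1` path) is an assumption here; the repository's examples are the reference for the exact endpoint and for the available methods.

```python
import xmlrpc.client
from urllib.parse import quote


def cluster_url(host: str, user: str, password: str, cluster: str = "cluster1") -> str:
    """Build the XML-RPC endpoint URL with URL-encoded credentials."""
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}/{cluster}"


def connect(host: str, user: str, password: str) -> xmlrpc.client.ServerProxy:
    """Create a proxy object; nothing is sent until a method is called."""
    return xmlrpc.client.ServerProxy(cluster_url(host, user, password))


# Every attribute access on the proxy becomes a remote call, for example
# connect("10.0.0.1", "admin", "secret").getNodeList()
# (the method name is an assumption; see the repository's method description).
```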

Cloud

Cloud Storage ETL UDFs

This project offers a collection of User Defined Functions (UDFs) that facilitate reading data from and writing data to storage in the cloud. With the cloud-storage-etl-udfs you can read data in an ever-growing list of formats, such as Apache Avro, Apache ORC and Apache Parquet.

Additionally, you can import Avro formatted data from Apache Kafka clusters.

Terraform Module

If you prefer setting up your Exasol installation on AWS using Terraform, then terraform-aws-exasol is the thing for you. This repository contains a Terraform module, documentation and examples that help you set up a cluster quickly.

Data Science, Analytics and Machine Learning

Data Science Examples

The data-science-examples repository contains a collection of examples and tutorials for Data Science and Machine Learning with Exasol. In those examples and tutorials you learn how to explore and prepare your data and build, train and deploy your model with and within Exasol.

Here you will find full tutorials that take you through the complete process, from setup to using Exasol in data science scenarios. Additionally, smaller examples illustrate individual points but leave the setup to you.

Sports Analytics

Tired of working with dry accounting data? Why not analyze sports for a change?

In the sports-analytics repository you can find examples of how to analyze data from sports events, which is both educational and entertaining.

Business Intelligence

Power BI – Exasol Connector

While our connector for Microsoft Power BI is certified and bundled with the Power BI Desktop, in the powerbi-exasol repository you can find the very latest version even before a new bundle is released.

Data Virtualization and Access Restriction

Virtual Schemas

Exasol’s virtual-schemas allow you to project external data sources into your Exasol database as if they were views inside that database.

The nicest part is that querying those external sources is done via regular Exasol SQL commands.

At the time of writing we already support the following data sources: Athena, Aurora, BigQuery, DB2, Exasol, Hive, Impala, MySQL, Oracle, PostgreSQL, Redshift, SAP HANA, SQL Server, Sybase ASE, Teradata and generic JDBC.
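
To make the idea a bit more concrete, here is a small sketch that renders the kind of SQL involved. The adapter script name, the connection name and the property names are illustrative assumptions; each dialect's documentation in the repository lists the properties it actually expects.

```python
def create_virtual_schema_sql(name: str, adapter: str, properties: dict[str, str]) -> str:
    """Render a CREATE VIRTUAL SCHEMA statement from adapter properties."""
    rendered = " ".join(
        "{} = '{}'".format(key, value.replace("'", "''"))  # escape single quotes
        for key, value in properties.items()
    )
    return f"CREATE VIRTUAL SCHEMA {name} USING {adapter} WITH {rendered}"


# Hypothetical names for the adapter script, connection and remote schema.
ddl = create_virtual_schema_sql(
    "HIVE_VS",
    "ADAPTER.JDBC_ADAPTER",
    {"CONNECTION_NAME": "HIVE_CONN", "SCHEMA_NAME": "default"},
)

# Once the virtual schema exists, remote tables are queried like local ones.
query = "SELECT COUNT(*) FROM HIVE_VS.SOME_TABLE"
```

Both statements could then be sent to Exasol through any SQL client or driver.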


Redis Virtual Schema

Redis is an in-memory key-value store. The redis-virtual-schema is a very compact showcase that demonstrates how to access a data source like Redis through a Virtual Schema Adapter written in Python. Additionally, it demonstrates the concept of mapping data structures from a NoSQL database.

Row Level Security

Our row-level-security implementation is based on virtual-schemas. It introduces an additional access control layer on top of Exasol’s built-in authorization mechanisms. With row-level security you can decide for each individual row who is allowed to see it.
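
The underlying idea can be illustrated in a few lines of plain Python. This is only a conceptual sketch, not the project's implementation: the `tenant` column and the rule "a user sees the rows assigned to them" are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    order_id: int
    amount: float
    tenant: str  # hypothetical column naming the user who may see the row


def visible_rows(rows: list[Row], current_user: str) -> list[Row]:
    """Return only the rows the current user is allowed to see."""
    return [row for row in rows if row.tenant == current_user]


orders = [Row(1, 10.0, "ALICE"), Row(2, 20.0, "BOB"), Row(3, 5.5, "ALICE")]
alice_view = visible_rows(orders, "ALICE")
```

Here `alice_view` contains orders 1 and 3 only. In the real project this filtering happens inside the database, so users cannot bypass it.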

Distributed Computing

Spark Exasol Connector

Apache Spark is an engine for distributed data processing. Among other interfaces there is Spark SQL, which processes data in distributed collections called “datasets”. “Data frames” organize those datasets in named columns, much like tables do in a relational database.

Our spark-exasol-connector is a library that lets you create Spark data frames from Exasol queries and save data frames as Exasol tables.

Hadoop ETL UDFs

Exasol’s hadoop-etl-udfs are the main way to transfer data between Exasol and Hadoop. The SQL syntax for calling the UDFs is similar to that of Exasol’s native IMPORT and EXPORT commands, but with added UDF parameters for specifying the various necessary and optional Hadoop properties.

Data Streaming

Confluent Kafka + Exasol

If you want to use Exasol as either data source or sink for your Confluent Kafka Connect cluster, check out kafka-connect-jdbc-exasol.

Administration and Monitoring 

Nagios Monitoring

Monitor your Exasol cluster using the nagios-monitoring container. You can either use this container as a stand-alone monitoring solution, or extract the Nagios configuration and use it in your existing Nagios installation.

Exasol Toolbox

Check out the helper scripts collected in the exa-toolbox. It contains useful nuggets ranging from JSON processing and geospatial computations to views that make system data easier to interpret, plus other small utilities.

BucketFS-Explorer

If you prefer GUI applications over command line tools and work with BucketFS regularly, you might be interested in the bucketfs-explorer. This little GUI client lets you browse buckets and their contents as well as upload, download and delete objects in those buckets.

Data Migration

Database Migration

The database-migration repository contains a large collection of SQL scripts that help you move your data from a third-party database into Exasol. The main focus is on cases where you move away from the old database.

We already support many different sources: CSV, DB2, Exasol, MySQL, Netezza, Oracle, PostgreSQL, Redshift, S3, SAP HANA, SQL Server, Teradata, Vectorwise, Vertica and Google BigQuery.

What’s next?

Exasol’s team creates and maintains a large number of Open Source Software projects that help integrate our analytics platform with various ecosystems, ranging from on-premises solutions to setups hosted in a public cloud.

We like to stay in close contact with our customers and the Open Source Community to continuously improve our offerings. So, if you feel up to contributing, please share your ideas, bug reports or code commits on GitHub.

Over the next few weeks, the second and third blogs in this series will explore our Open Source offerings relating to software testing and development. Keep an eye on the blog to find out more.

Sebastian Bär