Technical Exasol

Aurora Virtualis — using AWS Aurora with Exasol’s Virtual Schema

24 Jul 2019 | Share

Aurora Virtualis — using AWS Aurora with Exasol's Virtual Schema

In today’s article I’m going to explain how to set up a low-cost playground for using Amazon’s AWS Aurora with Exasol’s Virtual Schema. The goal is to demonstrate how to connect to Aurora.

Aurora is client-compatible with:

This tutorial looks at the cheapest installation possible. You’ll certainly choose some different options for a production setup.

Fair warning: the setup isn’t easy due to the fact that establishing a secure connection requires Aurora users to jump through some hoops.

Setting up a low-cost Aurora installation

Aurora is part of the Relational Database Services (RDS) solutions of Amazon’s AWS offer. You’ll get a managed installation that you can create with a reasonable amount of effort.

As Virtual Schemas currently don’t support MariaDB/MySQL, we’re going to install the PostgreSQL variant of Aurora.

To save money and reduce complexity, we’re going to:

These aren’t the choices you’d make for a production environment, but they’re more than enough for the sake of this tutorial.

Step-by-step Aurora setup

Many of AWS’ services are region-based. That means they are hosted in a specific area as opposed to services that are distributed world-wide by default. In this article region based links — like the ones to the AWS console  — use the Frankfurt region. Choose a region closest to you for your trial for best network performance.

To start, log into AWS.

Setting up the virtual network

I’m assuming that you have the community edition of Exasol set up on your computer and connect to Aurora via the internet. Another option would be to install Exasol pay-as-you-go on AWS. You could then install it on the same virtual network as Aurora.

  1. Open the VPC console
  2. Create the VPC
  3. Make sure DNS resolution and DNS hostnames options are enabled
  4. Create two subnets in different availability zones
  5. Create an internet gateway
  6. Select the newly created internet gateway
  7. Under Actions, click Attach to VPC
  8. In the dialog box Attach to VPC:
    1. Select the VPC you created earlier
    2. Click Attach

Setting up the Aurora infrastructure

Aurora is a managed service, but the underlying infrastructure shines through. Aurora uses EC2 under the hood to do the computing much like other RDS services. Some of the settings will look familiar if you ever set up an EC2 instance like the instance type and the need to define the VPC where you want install your cluster.

  1. Open the RDS console
  2. Click Create database
  3. In the dialog box Create database:
    1. Choose Amazon Aurora
    2. Select PostgreSQL-compatible
    3. Click Next
    4. Aurora PostgreSQL should be displayed as DB engine. Note that you can only use the Provisioned capacity type at the time of writing. Serverless would require MariaDB/MySQL.
    5. Choose db.r4.large as this is the smallest available instance type
    6. Switch off Multi-AZ deployment
    7. Set a DB instance identifier such as aurora-tutorial
    8. Enter master as the master username
    9. Enter a secure password for the master user
    10. Click Next
    11. Select the VPC you created earlier
    12. Let RDS create the subnets for you
    13. Make the instance publicly accessible
    14. Let RDS create a security group for you
    15. Name your database instance aurora-db-instance
    16. Call your initial Aurora database aurora_tutorial_database
    17. Disable IAM DB authentication
    18. Disable Encryption of data at rest
    19. Reduce the backup retention period to one day
    20. Disable Copy tags to snapshots
    21. Disable Enhanced monitoring
    22. Switch off Performance Insights
    23. Click Create database

This will take a while — the perfect opportunity to grab a tea or coffee.

Tweaking the security group

Note that RDS will limit the source IP for inbound traffic to your current IP address while it creates the security group for you. You might want to widen it to the address range of your organization later.

Connecting the database with an SQL client

Aurora instance hostname

Aurora instance hostname

We’re now going to test our newly installed Aurora database. Start your preferred SQL client and follow these instructions to get a connection.

Before you can connect, find the hostname of the instance you want to connect to. To achieve this:

  1. Open the RDS console
  2. Select your database instance in the list
  3. Switch to the tab Connectivity & security
  4. Copy the hostname under Endpoint

Get the Aurora version number as a first smoke test of your connection:

SELECT AURORA_VERSION();

Establishing a secure connection

Next we should talk a little bit about Transport Layer Security (TLS). If that is new to you I suggest reading up on TLS first before you continue. A starting point is the description section in the Wikipedia article. As it’s a very complex topic, covering all important aspects is way beyond the scope of this tutorial.

I also recommend reading ‘s article that explains the difference between a Java keystore and a truststore.

At the moment, Aurora has at least one certificate in the certificate validation chain with 1024-bit key length. This is considered too low to be acceptable by some modern Linux distributions. On my Fedora 29 machine, for example, you’ll find the following:  /etc/crypto-policies/back-ends/java.config:

jdk.certpath.disabledAlgorithms=MD2, MD5, DSA, RSA keySize < 2048

Generally speaking, this configuration is a good idea since it improves system security. Unfortunately it prevents establishing a connection with Aurora as the server certificate can’t be verified. I had to reduce the value to < 1024 in order to get TLS working.

On Exasol 6.1.1, where I tested this tutorial, key size 1024 was still allowed.

Important properties

You need to set the property to ssl=true to switch SSL on, including full certificate chain checking. To use the default Java certificate store, set the property to  sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory. Both settings are essential to successfully establish the secured connection in a way that also works with virtual schemas.

Authentication and certificates

I now briefly explain the concept of a authentication trust chain built on certificates. I can only scratch the surface in this article. If you’re new to this topic, I strongly recommend reading up on secure network communication via TLS.

TLS uses a cryptographic certificate to authenticate the server your client is talking to. Since you can’t store all certificates of all machines in the world on your computer, certificate chaining was invented. The idea is that you only need to store certificates of root certification agencies (CAs) that are considered trusted on your computer. Operating systems, browsers and Java all come with a — mostly overlapping — set of root CA certificates preinstalled. Next a chain of certificates is established. This helps tracing a certificate your machine doesn’t know all the way down to a root CA. And this establishes a trusted authentication.

If the service provider derives the host certificate of the server you’re talking to from a root CA your computer already knows, you’re fine.

If not, you have to install the missing certificates. Amazon provides a bundle of certificates for their RDS service for download. And this is where the installation becomes inconvenient. Successful authentication against Aurora required the installation of these extra certificates during my trials. For a cluster you’d have to install them on every data node.

Installing the certificates

This script installs the bundle on an Exasol node — provided you have permissions to log into the node as root.

#!/bin/bash
cd /tmp
wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
openssl x509 -outform der -in rds-combined-ca-bundle.pem -out rds-combined-ca-bundle.der
keytool -keystore /etc/pki/ca-trust/extracted/java/cacerts -import -file rds-combined-ca-bundle.der -alias awsrds

updatedb
CONTAINER_TRUSTSTORE=$(locate --regex 'ScriptLanguages.*/java/cacerts')
keytool -keystore "$CONTAINER_TRUSTSTORE" -import -file rds-combined-ca-bundle.der -alias awsrds

The script first switches to the /tmp directory, then downloads the drivers from the internet. It then converts the certificates from PEM to DER format in order to be able to install them in the Java truststores (central and BucketFS).

If your cluster doesn’t have an internet connection, you have to download the certificate bundle to your machine, upload it to the cluster node via SSH and then install it in the truststore.

Setting up the Virtual Schema

  1. Download the latest PostgreSQL JDBC driver for JDBC 4.2
  2. Deploy the driver
    1. Install the JDBC driver via EXAOperation
      When you deploy the JDBC driver, it’s essential to do it in ExaOperation too. While Exasol comes with a preinstalled PostgreSQL driver, you need version 42.2.6 or later for setting up the TLS connection as I’ve explained.

      PostgreSQL JDBC driver 42.2.6

      PostgreSQL JDBC driver 42.2.6

    2. Copy the driver
    3. to BucketFS service bfsdefault under /jars
  3. Create a named connection
    CREATE OR REPLACE CONNECTION AURORA_CONNECTION
    TO 'jdbc:postgresql://<cluster id>.<region>.rds.amazonaws.com/<database>?ssl=true&sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory'
    USER '<user>'
    IDENTIFIED BY '<password>'

    The property ssl=true switches SSL on, including full certificate chain checking. To use the default Java certificate store, set the property to sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory.

    Remember to adapt the parameters for user, cluster, region and database to match your setup.

  4. Deploy the adapter
    1. Copy the adapter JAR to BucketFS service bfsdefault under /jars
    2. Register the adapter
      CREATE SCHEMA ADAPTER;
      
      CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.AURORA_ADAPTER AS
        %scriptclass com.exasol.adapter.RequestDispatcher;
        %jar /buckets/bfsdefault/jars/virtualschema-jdbc-adapter-dist-1.19.1.jar;
        %jar /buckets/bfsdefault/jars/postgresql-42.2.6.jar;
      /
      ;
  5. Create the Virtual Schema
    CREATE VIRTUAL SCHEMA AURORA
        USING ADAPTER.AURORA_ADAPTER
        WITH
        SQL_DIALECT = 'POSTGRESQL'
        CONNECTION_NAME = 'AURORA_CONNECTION'
        SCHEMA_NAME = 'public'
        EXCLUDED_CAPABILITIES='JOIN,JOIN_TYPE_INNER,JOIN_TYPE_LEFT_OUTER,JOIN_TYPE_RIGHT_OUTER,JOIN_TYPE_FULL_OUTER,JOIN_CONDITION_EQUI';

    The last line with the excluded capabilities is only necessary for Exasol versions prior to up to and including 6.1.4. You can drop it for newer releases.

You now have a TLS-secured JDBC connection from the Exasol Virtual Schema to the Aurora cluster.

Conclusion

AWS Aurora is a managed relational database that you can access via a PostgreSQL JDBC connection. This lets us create a Virtual Schema on top of an Aurora database. The setup is complex because authentication only works if you install the certificate bundle Amazon provides on each Exasol cluster node. And if you manage to wrap your head around this, you’re rewarded with a secure connection between Exasol and Aurora.

10 trends impacting data analytics

Now that we’re well and truly in the age of data, what’s coming next? 

FREE WHITE PAPER