The three methods of predictive analytics
Descriptive analytics has long been used in business intelligence to gain information and knowledge from historical data. Predictive analytics goes one step further and uses this historical data to make predictions about the future. These methods can be roughly divided into three groups: statistical models, classical machine learning methods, and deep learning. Deep learning methods, or neural networks, offer great potential because they are able to learn from very large data sets.
But big data sets also have a big disadvantage: the more data you have, the more data has to be moved from your database to your predictive models. So why not move your predictive models to your data instead? This blog post explains how to execute TensorFlow inside our data analytics platform, directly on your data.
Let’s start with Docker
Our data analytics platform gives you the possibility to extend the programming functionality by adding new script language Docker containers. Such Docker containers should be created under Linux, as working with Docker for Windows isn’t my idea of fun.
Create Ubuntu VM
If there’s no Linux server or client available to run Docker, it’s possible to create a small Linux VM with VMware Player. Ubuntu is a good choice – and better still, it’s available for free.
The Ubuntu Server ISO is available via the following link:
While creating a new virtual machine, you just have to select the downloaded ISO to use Ubuntu as the OS.
During the installation of Ubuntu you have to define the user and password which you will later use to log in to the VM.
Additional tools you might find useful
The following additional tools are recommended for working with a Linux Server:
PuTTY – PuTTY is a slim SSH client, used to connect to and work with the command line of the Ubuntu VM. And that’s way more enjoyable than using the small VMware window.
WinSCP – WinSCP is an open-source SFTP file transfer client. It’s the easiest way to exchange files with the Ubuntu VM without thinking about shared folders or curl commands.
To get both tools working, you also have to install an SSH server on your Ubuntu VM:
[code language="bash"] sudo apt-get install openssh-server [/code]
How to install Docker
In the next step, you have to install Docker on the Ubuntu VM. This is needed to create and handle the script language Docker containers of our data analytics platform.
The following guide explains how to install Docker on Ubuntu:
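As a rough sketch, Docker’s convenience script is the quickest route on a throwaway VM (check the official guide for the current, repository-based installation steps – the commands below are an assumption about a typical Ubuntu setup):

```shell
# Download and run Docker's convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: allow your user to run docker without sudo
# (takes effect after logging out and back in)
sudo usermod -aG docker $USER
```
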
Next, install Git
Our platform also offers a GitHub repository with various solutions. Among them you can find scripts which fully automatically create new script language containers. To do this, you need to clone the Exasol GitHub repository. Here’s how:
[code language="bash"]
# Install Git
sudo apt-get install git

# Clone the Exasol script-languages repository
git clone https://github.com/exasol/script-languages
[/code]
Create a Python 3 container
The Exasol GitHub repository “script-languages” contains a guide and all the scripts you need to create Docker language containers for different flavors. A flavor is a specific combination of programming languages and packages. If you need multiple programming languages, you don’t have to create a container for each language; you just define your own flavor with all the needed information.
The flavor “python3-ds-EXASOL-6.1.0” is suitable for Exasol version 6.1 and already contains Python 3 with all the TensorFlow packages you need. Further packages can be added by editing the flavor’s base file.
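Since the flavor base file is a Dockerfile, adding a package comes down to adding an install step. A hypothetical example (the package name and the exact file path are assumptions – check the flavor directory in the cloned repository):

```dockerfile
# Hypothetical addition to the flavor base file, e.g. in the
# python3-ds-EXASOL-6.1.0 flavor directory of the script-languages repo:
RUN pip3 install pandas
```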
After all adaptions, the Docker container can be built and exported like so:
[code language="bash"]
./build --flavor=python3-ds-EXASOL-6.1.0
./export --flavor=python3-ds-EXASOL-6.1.0
[/code]
As a result you’ll receive a standalone archive “python3-ds-EXASOL-6.1.0.tar.gz”, which needs to be copied to BucketFS, the internal file system of your Exasol database. The BucketFS Explorer is the easiest way to manage different buckets and copy files.
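If you prefer the command line, BucketFS also accepts uploads over plain HTTP. A sketch (host name, port, bucket name and the bucket’s write password are placeholders – the BucketFS port is configured per BucketFS service on your system):

```shell
# Upload the container archive to a bucket via HTTP PUT
# (w is the BucketFS write user; replace password, host, port and bucket)
curl -X PUT -T python3-ds-EXASOL-6.1.0.tar.gz \
  http://w:write_password@exasol-host:2580/bucketname/python3-ds-EXASOL-6.1.0.tar.gz
```
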
How to test TensorFlow once it’s been implemented
TensorFlow is now implemented on your Exasol data analytics platform.
So now it’s time to test whether everything is running fine.
First you have to register the new TensorFlow container as a new script language “PYTHON3”:
[code language="SQL"]
ALTER SESSION SET SCRIPT_LANGUAGES = 'PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=localzmq+protobuf:///bucketfsname/bucketname/python3-ds-EXASOL-6.1.0?lang=python#buckets/bucketfsname/bucketname/python3-ds-EXASOL-6.1.0/exaudf/exaudfclient_py';
[/code]
For testing TensorFlow, a small UDF can be used which just reads a parameter and returns it, executing the typical “Hello World” example for TensorFlow.
[code language="SQL"]
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT sandbox.test_tensorflow (p_test VARCHAR(100))
RETURNS VARCHAR(100) AS
import tensorflow as tf

def run(ctx):
    # Wrap the parameter in a TensorFlow constant, evaluate it in a
    # session and return the decoded byte string
    v_const = tf.constant(str(ctx.p_test))
    v_sess = tf.Session()
    v_return = v_sess.run(v_const)
    return v_return.decode()
;
[/code]
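Outside the database, the body of this UDF boils down to an echo: tf.constant wraps the input string, Session.run evaluates it to a byte string, and decode() turns it back into text. A runnable sketch of that round trip without TensorFlow (the Ctx class is a stand-in for the context object Exasol passes to run()):

```python
class Ctx:
    """Stand-in for the context object Exasol passes to run()."""
    def __init__(self, p_test):
        self.p_test = p_test

def run(ctx):
    # In the real UDF the string takes a detour through
    # tf.constant() -> Session.run() -> bytes.decode();
    # the net effect is that the input comes back unchanged.
    payload = str(ctx.p_test).encode()  # what Session.run() would yield
    return payload.decode()

print(run(Ctx("Hello bla TensorFlow!")))  # Hello bla TensorFlow!
```
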
The created UDF can easily be used inside any SELECT statement.
[code language="SQL"] select sandbox.test_tensorflow('Hello bla TensorFlow!') from dual; [/code]
The result is correct: the query returns the input string unchanged.
And the result?
By extending the functionality of such a UDF, you’re now able to execute your trained TensorFlow models directly on your data. The predictive variables are provided through the UDF parameters.
Inside the UDF you can load the existing model, which also has to be available on BucketFS, and run your predictive analysis. When the result of this prediction comes back, you can process it further in your data flow.
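The overall pattern can be sketched in plain, runnable Python. In a real UDF, the stand-in predict() below would be replaced by a TensorFlow model loaded from BucketFS (e.g. from under /buckets/bucketfsname/bucketname/); the linear model, its weights and the parameter names are assumptions for illustration only:

```python
class Ctx:
    """Stand-in for the UDF context; p_x1/p_x2 mimic UDF parameters."""
    def __init__(self, p_x1, p_x2):
        self.p_x1 = p_x1
        self.p_x2 = p_x2

# Stand-in for a trained TensorFlow model loaded from BucketFS;
# the weights are assumed values for illustration.
W1, W2, BIAS = 0.5, 2.0, 1.0

def predict(x1, x2):
    return W1 * x1 + W2 * x2 + BIAS

def run(ctx):
    # The predictive variables arrive as UDF parameters;
    # the return value flows back into the surrounding SELECT.
    return predict(ctx.p_x1, ctx.p_x2)

print(run(Ctx(2.0, 3.0)))  # 0.5*2.0 + 2.0*3.0 + 1.0 = 8.0
```
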