Uploaded image for project: 'Solution Center'
  1. Solution Center
  2. SOL-148

Pentaho Data Integration - PDI, former Kettle

    Details

    • Type: How To
    • Status: Obsolete
    • Affects Version/s: EXASOL 6.0.0, Exasol 6.1.0, EXASolution 5.0.0
    • Fix Version/s: None
    • Component/s: 3rd Party Tools
    • Labels:
      None
    • Environment:
      Windows, Unix
    • Solution:
      Hide

      Note: This solution is no longer maintained. For the latest information, please refer to our documentation:

      https://docs.exasol.com/connect_exasol/data_integration_etl/pentaho_di.htm

      Overview

      Pentaho Data Integration (PDI) delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.

      Installation

      Pentaho offers a free community edition under a GPL license.

      Within the file section at http://sourceforge.net/projects/pentaho/ reside folders for all components. Choose 'Data Integration' and download the zip-file with your desired version.

      To install PDI simple copy the contents of the downloaded zip-file to a directory of your choice. No further action is needed.

      JDBC connection to EXASolution

      PDI has a "Exasol 4" option natively in the list of available connection types.
      If you get an error concerning the missing driver 'com.exasol.jdbc.EXADriver' copy
      the file "exajdbc.jar" from your EXASOL JDBC or EXAplus installation to
      '/data-integration/libext/JDBC' (4.4) or into the '/data-integration/lib' (5.x or later) - folder.

      Don't get confused by the name "Exasol 4", even Exasol6 drivers will do the job.

      Best practices

      1 - Logging

      Do not use EXASolution as target for pentaho's logging information. Pentaho will perform single-row insert/commit pairs using one concurrent connection per running transformation.
      Instead, send logs to a (CSV) file or a dedicated logging database. You can then access that log information using the IMPORT statement.

      2 - Encrypting Passwords

      While using Kettle, passwords are required in a number of places:

      • database connections in jobs or transformations
      • jobs and steps with connections to a server (FTP, HTTP, ...)
      • configuration files, e.g. kettle.properties

      All passwords can be entered in plain text, but it is good practice to encrypt them using the following command.

      In the Kettle-Home directory you can find encr.bat on windows or encr.sh on linux/unix.

      Example:

      home> cd data-integration
      data-integration> ./encr.sh -kettle "My Password"
      Encrypted 2be98afc86aea8bc49b18bd63c99dbdde
      data-integration>
      

      Be aware to include the prefix 'Encrypted ' (the word plus a blank) with the result to indicate the obfuscated nature of the password.

      3 - Using the "kettle.properties" file

      It's a best practice to put all your variables for your jobs and transformations in the "kettle.properties" for easy reuse or changing them once in a central position for different environments (Develpment, Test, Production).

      Be aware the kettle.properties file is user specific and for Windows it resides in a folder like 'c:\Users\<<user>>\.kettle\'.

      Known Problems

      • Pentaho version 5.4.0.1-130 seems to suddenly imply case-sensitive field names. However, it happily loads and executes old transformations without displaying any errors. But you will get null values for every mis-spelled field name!
      Show
      Note: This solution is no longer maintained. For the latest information, please refer to our documentation: https://docs.exasol.com/connect_exasol/data_integration_etl/pentaho_di.htm Overview Pentaho Data Integration (PDI) delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools. Installation Pentaho offers a free community edition under a GPL license. Within the file section at http://sourceforge.net/projects/pentaho/ reside folders for all components. Choose 'Data Integration' and download the zip-file with your desired version. To install PDI simple copy the contents of the downloaded zip-file to a directory of your choice. No further action is needed. JDBC connection to EXASolution PDI has a "Exasol 4" option natively in the list of available connection types. If you get an error concerning the missing driver 'com.exasol.jdbc.EXADriver' copy the file "exajdbc.jar" from your EXASOL JDBC or EXAplus installation to '/data-integration/libext/JDBC' (4.4) or into the '/data-integration/lib' (5.x or later) - folder. Don't get confused by the name "Exasol 4", even Exasol6 drivers will do the job. Best practices 1 - Logging Do not use EXASolution as target for pentaho's logging information. Pentaho will perform single-row insert/commit pairs using one concurrent connection per running transformation. Instead, send logs to a (CSV) file or a dedicated logging database. You can then access that log information using the IMPORT statement. 2 - Encrypting Passwords While using Kettle, passwords are required in a number of places: database connections in jobs or transformations jobs and steps with connections to a server (FTP, HTTP, ...) configuration files, e.g. kettle.properties All passwords can be entered in plain text, but it is good practice to encrypt them using the following command. In the Kettle-Home directory you can find encr.bat on windows or encr.sh on linux/unix. Example: home> cd data-integration data-integration> ./encr.sh -kettle "My Password" Encrypted 2be98afc86aea8bc49b18bd63c99dbdde data-integration> Be aware to include the prefix 'Encrypted ' (the word plus a blank) with the result to indicate the obfuscated nature of the password. 3 - Using the "kettle.properties" file It's a best practice to put all your variables for your jobs and transformations in the "kettle.properties" for easy reuse or changing them once in a central position for different environments (Develpment, Test, Production). Be aware the kettle.properties file is user specific and for Windows it resides in a folder like 'c:\Users\<<user>>\.kettle\'. Known Problems Pentaho version 5.4.0.1-130 seems to suddenly imply case-sensitive field names . However, it happily loads and executes old transformations without displaying any errors. But you will get null values for every mis-spelled field name!
    • Category 1:
      3rd Party Tools - ETL
    • Category 2:
      ETL / Data Integration

      Attachments

        Issue Links

          Activity

            People

            • Assignee:
              CaptainEXA Captain EXASOL
              Reporter:
              CaptainEXA Captain EXASOL
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: