Uploaded image for project: 'Solution Center'
  1. Solution Center
  2. SOL-443

How to create and use User Defined Functions (UDF) ?

    XMLWordPrintable

    Details

    • Type: How To
    • Status: Published
    • Affects Version/s: EXASolution 5.0, EXASOL 6.0.0, Exasol 6.1.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Solution:
      Hide

      This is a basic tutorial on User Defined Functions. For details regarding the programming languages see the EXAOL manual, section 2.11.3 More technical insight is also given in the white paper "Big Data Science - the future of analytics" (link).

      UDFs vs. control scripts

      Control scripts User Defined Functions (UDFs)
      for control jobs for operations on data sets
      written in Lua written in SQL, Lua, Java, Python, R
      executed via SQL statement: 'EXECUTE SCRIPT ...' executed within SQL 'SELECT' statement, e.g. 'SELECT udf(col1, col2) FROM tbl1;'
      supports executing of SQL statements & SQL error handling supports loading of Java / Python / R packages
      runs iteratively & single threaded on one node runs massively parallel

      Creating a UDF

      Parallelism

      Tables are stored in EXASOL row-wise distributed across a cluster's nodes. An UDF environment runs "sandboxed" on each node, computing a local result set integrated in EXASOL's SQL pipeline (see white paper (link): "A peek under the hood"). The local result sets are sent to the node handling the client connection and merged to the global result set to be sent to the client.

      Scalar UDFs

      A scalar UDF computes on one input row per UDF instance, and returns one output row. It is automatically executed in massively parallel in the EXASOL cluster.

      Aggregate UDFs

      An aggregate UDF consumes multiple input tuples, i.e. the whole columns of the table or the group if a GROUP BY clause is defined in the SELECT statement. One UDF instance returns one single value.

      The number of UDF instances to be started is handled by the GROUP BY clause. Without a GROUP BY clause one instance is started which runs on one node, not in parallel. A GROUP BY clause that results in multiple groups causes multiple UDF instances, which are then computed massively parallel in EXASOL.

      Analytical UDFs

      These consume as input values multiple data tuples (rows), and they also return a result set containing multiple rows. Parallelism is handled by the GROUP BY (see above).

      In the following example, a running sum is computed. Notice that the simple algorithm iterates through the entire input set within one instance (repeat loop).

      Adding language specific libraries / packages

      EXASOL 5 or before: Libraries are made available via EXAOperation: Software -> UDF Libraries :


      EXASOL 6 or later: See detailed information about EXABucket FS in SOL-503 and in particular the subsection Expanding script languages using BucketFS in the EXASOL user manual.

      Show
      This is a basic tutorial on User Defined Functions. For details regarding the programming languages see the EXAOL manual, section 2.11.3 More technical insight is also given in the white paper "Big Data Science - the future of analytics" (link). UDFs vs. control scripts Control scripts User Defined Functions (UDFs) for control jobs for operations on data sets written in Lua written in SQL, Lua, Java, Python, R executed via SQL statement: 'EXECUTE SCRIPT ...' executed within SQL 'SELECT' statement, e.g. 'SELECT udf(col1, col2) FROM tbl1;' supports executing of SQL statements & SQL error handling supports loading of Java / Python / R packages runs iteratively & single threaded on one node runs massively parallel Creating a UDF Parallelism Tables are stored in EXASOL row-wise distributed across a cluster's nodes. An UDF environment runs "sandboxed" on each node, computing a local result set integrated in EXASOL's SQL pipeline (see white paper (link): "A peek under the hood" ). The local result sets are sent to the node handling the client connection and merged to the global result set to be sent to the client. Scalar UDFs A scalar UDF computes on one input row per UDF instance, and returns one output row. It is automatically executed in massively parallel in the EXASOL cluster. Aggregate UDFs An aggregate UDF consumes multiple input tuples, i.e. the whole columns of the table or the group if a GROUP BY clause is defined in the SELECT statement. One UDF instance returns one single value. The number of UDF instances to be started is handled by the GROUP BY clause. Without a GROUP BY clause one instance is started which runs on one node, not in parallel. A GROUP BY clause that results in multiple groups causes multiple UDF instances, which are then computed massively parallel in EXASOL. Analytical UDFs These consume as input values multiple data tuples (rows), and they also return a result set containing multiple rows. Parallelism is handled by the GROUP BY (see above). In the following example, a running sum is computed. Notice that the simple algorithm iterates through the entire input set within one instance (repeat loop). Adding language specific libraries / packages EXASOL 5 or before : Libraries are made available via EXAOperation: Software -> UDF Libraries : EXASOL 6 or later : See detailed information about EXABucket FS in SOL-503 and in particular the subsection Expanding script languages using BucketFS in the EXASOL user manual.
    • Category 1:
      UDFs and In-Database Analytics
    • Category 2:
      Database Design

      Attachments

        Issue Links

        1. adding_libs.png
          adding_libs.png
          70 kB
        2. first_udf.png
          first_udf.png
          33 kB
        3. udf_aggregate.png
          udf_aggregate.png
          9 kB
        4. udf_analytic.png
          udf_analytic.png
          9 kB
        5. udf_parallelism.png
          udf_parallelism.png
          50 kB
        6. udf_scalar.png
          udf_scalar.png
          10 kB

          Activity

            People

            • Assignee:
              CaptainEXA Captain EXASOL
              Reporter:
              CaptainEXA Captain EXASOL
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated: