[SOL-130] Multi-path SQL: Tables vs. Views vs. Subselects Created: 13.08.2014  Updated: 08.06.2020  Resolved: 08.06.2020

Status: Obsolete
Project: Solution Center
Component/s: EXASolution
Affects Version/s: Exasol 6.2.x
Fix Version/s: None

Type: How To
Reporter: Captain EXASOL Assignee: Captain EXASOL
Labels: None

Solution:

Note: This solution is no longer maintained. For the latest information, please refer to our Knowledge Base:

https://community.exasol.com/t5/database-features/multi-path-sql-tables-vs-views-vs-subselects/ta-p/770

--------------------------------------------------------------------------------------------------------------------------------------------------------------

Background

Conventional Method: (Temporary) Tables

Often, an analysis will be complex enough that it is very hard or cumbersome to express in a single SQL statement. The usual way to work around this is to create temporary tables with intermediate results and then combine those in a 'final' select statement:

CREATE TABLE german_cities AS
(
       SELECT cities.* FROM cities, countries
       WHERE
              cities.country_id=countries.country_id
              AND countries.name='Germany'
);
 
SELECT b.name, c.name, count(*)
FROM
       customer_moves a,
       german_cities b, german_cities c
WHERE
        a."from" = b.city_id AND
        a."to" = c.city_id
GROUP BY b.name, c.name;

However, this has some side-effects:

  • The user/application has to take care of namespace-congestion, as every session will have to create their own (uniquely named) tables.
  • Filters that are to be applied to the final select will have to be (manually) integrated into the creation of the temporary tables, or they might get unnecessary big.
  • Whenever the base data changes, the (temporary) tables need to be updated or the analysis might give outdated results.

Please note that EXASolution does not know the concept of temporary tables. As this is an in-memory database and we really stick to the ACID rules, every table can be interpreted as temporary until it is committed.

How to implement multi-path SQL 

So if you implement multi-path SQL using intermediate tables, we suggest you turn off auto-commit for this and drop the tables when the analysis is complete. After that, it is 'safe' to commit your changes (if any).

Better way: views

The better solution, in this case, is to replace the temporary tables with views, giving the following side-effects

  • EXASolution's optimizer can see the big picture in the final select. It can choose to rearrange join orders, maybe even cross-reference conditions that originally were local in different views.
  • The same holds true to additional conditions that are applied in the final select: They can usually be propagated down to low levels, reducing the amount of data flowing through the query graph very early.
  • There is no outdated data. The views always access the currently valid (committed) fact data.
CREATE VIEW german_cities AS
(
       SELECT cities.* FROM cities, countries
       WHERE
              cities.country_id=countries.country_id
              AND countries.name='Germany'
);
 
SELECT b.name, c.name, count(*)
FROM
       customer_moves a,
       german_cities b, german_cities c
WHERE
        a."from" = b.city_id AND
        a."to" = c.city_id
GROUP BY b.name, c.name;

But in the case where the multi-path query is generated on the fly by some application, you still have the overhead of views being created and possible namespace clashes.

All-in-one: Named subselects

In these cases, named subselects (also called Common Table Expressions or CTE) might be the thing you've been looking for.

Basically, they work like views, but they are not database objects, but parts of your query. No need to create them, drop them, commit them: No overhead, but still all the flexibility you wish for:

WITH german_cities AS
(
       SELECT cities.* FROM cities, countries
       WHERE
              cities.country_id=countries.country_id
              AND countries.name='Germany'
)
SELECT b.name, c.name, count(*)
FROM
       customer_moves a,
       german_cities b, german_cities c
WHERE
        a."from" = b.city_id AND
        a."to" = c.city_id
GROUP BY b.name, c.name;
Category 1: SQL
Generated at Mon May 17 03:03:53 CEST 2021 using Jira 7.13.18#713018-sha1:e1230154f8ff8cc9272975bf568fc732e806fd68.