Uploaded image for project: 'Solution Center'
  1. Solution Center
  2. SOL-136

Data distribution

    XMLWordPrintable

    Details

    • Type: Explanation
    • Status: Published
    • Affects Version/s: EXASolution 4.2.0, EXASolution 5.0, EXASOL 6.0.0, Exasol 6.1.0
    • Fix Version/s: None
    • Component/s: EXASolution
    • Labels:
      None
    • Explanation:
      Hide

      Definition of the data distribution

      The following command defines the data distribution by setting some columns as key and the table distribution and distributes the rows accordingly.

      ALTER TABLE ... DISTRIBUTE BY
      

      If the chosen distribution key would lead to a considerable data unbalance, the database rejects such a distribution key. Of course data inserted later could unbalance the table.

      You can remove the set distribution attributes with the help of

      ALTER TABLE <name> DROP DISTRIBUTION KEYS.
      

      You can also set a distribution key also directly at creation time of a table within the corresponding DDL-Statement. Inserted data will be automatically distributed according to the attributes set, thus there is no need to redistribute data later.

      CREATE TABLE test1 (     c_id INT,     c_name varchar(40),     c_birthday date,     DISTRIBUTE BY c_id );  
      

      Best Practices

      • Choose only one single distribution key unless all joins to the tables use the identical multi-column join key (e.g. combined primary / foreign key).
      • At best choose primary keys or columns that don't contain many duplicate values for distribution key since they will probably be the best (i.e. most restrictive) join conditions in joins.
      • If the tables are used in other queries you should choose a distribution key that is used in other joins as well. Even if those joins cannot be executed locally because the joined table might not be partitioned, the joins are a little faster with one table partitioned by part of the join condition.
      • Don't choose any char / varchar columns for distribution key if possible.

      Propagation of a distribution key

      At executing of the CREATE TABLE AS SELECT command, distribution attributes will be derived from the orginal table if possible. The distribution of the new table will be set automatically.

      Data Reorganization

      After adding some nodes to your cluster, you can resume the work at once. These new nodes however has no data. Data distribution won't be adjusted automatically to shorten the downtime. You can choose then the suitable time slot and start data reorganization also according to the object priority.

      The command REORGANIZE adjustes the data distribution taking into account new nodes. It ensures that the data is distributed according the set key. You can reorganize single tables, whole schemas or the complete database.

      You need to start this command only after a cluster enlargment, otherwise the distribution will be maintained automatically.

      REORGANIZE (TABLE| SCHEMA| DATABASE)
      

      Please note the following
      *the reorganization will be performed in any case table-wise. If you reorganize a schema or your complete database, each table will be committed automatically.

      • reorganize sets a write-lock on the corresponding table (see SOL-135 for more details on EXASolution's transaction management)
      • reorganize re-creates all the indices
      • reorganize uses a large amount of RAM for re-distributing of data and re-creating the indexes. Is the RAM not sufficient, data will be swapped on the disk. Please ensure, that your persistent data volume has enough free space.
      Show
      Definition of the data distribution The following command defines the data distribution by setting some columns as key and the table distribution and distributes the rows accordingly. ALTER TABLE ... DISTRIBUTE BY If the chosen distribution key would lead to a considerable data unbalance, the database rejects such a distribution key. Of course data inserted later could unbalance the table. You can remove the set distribution attributes with the help of ALTER TABLE < name > DROP DISTRIBUTION KEYS . You can also set a distribution key also directly at creation time of a table within the corresponding DDL-Statement. Inserted data will be automatically distributed according to the attributes set, thus there is no need to redistribute data later. CREATE TABLE test1 ( c_id INT , c_name varchar (40), c_birthday date , DISTRIBUTE BY c_id ); Best Practices Choose only one single distribution key unless all joins to the tables use the identical multi-column join key (e.g. combined primary / foreign key). At best choose primary keys or columns that don't contain many duplicate values for distribution key since they will probably be the best (i.e. most restrictive) join conditions in joins. If the tables are used in other queries you should choose a distribution key that is used in other joins as well. Even if those joins cannot be executed locally because the joined table might not be partitioned, the joins are a little faster with one table partitioned by part of the join condition. Don't choose any char / varchar columns for distribution key if possible. Propagation of a distribution key At executing of the CREATE TABLE AS SELECT command, distribution attributes will be derived from the orginal table if possible. The distribution of the new table will be set automatically. Data Reorganization After adding some nodes to your cluster, you can resume the work at once. These new nodes however has no data. Data distribution won't be adjusted automatically to shorten the downtime. You can choose then the suitable time slot and start data reorganization also according to the object priority. The command REORGANIZE adjustes the data distribution taking into account new nodes. It ensures that the data is distributed according the set key. You can reorganize single tables, whole schemas or the complete database. You need to start this command only after a cluster enlargment, otherwise the distribution will be maintained automatically. REORGANIZE ( TABLE | SCHEMA | DATABASE ) Please note the following *the reorganization will be performed in any case table-wise. If you reorganize a schema or your complete database, each table will be committed automatically. reorganize sets a write-lock on the corresponding table (see SOL-135 for more details on EXASolution's transaction management) reorganize re-creates all the indices reorganize uses a large amount of RAM for re-distributing of data and re-creating the indexes. Is the RAM not sufficient, data will be swapped on the disk. Please ensure, that your persistent data volume has enough free space.
    • Category 1:
      Database Administration - Data Organization

      Attachments

        Issue Links

          Activity

            People

            • Assignee:
              CaptainEXA Captain EXASOL
              Reporter:
              CaptainEXA Captain EXASOL
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: