Details

    • Type: Explanation
    • Status: Published
    • Affects Version/s: EXASolution 4.2.7, EXASOL 6.0.0, Exasol 6.1, Exasol 6.2.x
    • Fix Version/s: None
    • Component/s: EXASolution
    • Labels:
      None
    • Explanation:
      Hide

      Question

      Scenario: Merge data from a (small) source table into a (big) target table.

      The source contains only a few rows (below 10.000, mostly about 1000).
      The target contains some million records.

      Solution1:
      delerte from target where id in (select id from source);
      insert into target select * from source;

      Solution 2:
      merge into target t using source s on (t.id = s.id)
      when matched then update ...
      when not matched then insert;

      Questions:
      Is the merge command more effectiv than a delete plus insert?
      Is there a need to delete duplicate rows from the source in advance?

      Answer

      Testing with a variety of source rowsets against a target with about 6 mio. rows
      showed a slighty time advance using the merge command.

      Overall less internal steps are performed in the merge compared to delete/insert. Furthermore the subselect in the delete command will be materialized in a temporäry table and then replicated over all knodes, because the number or rows is below the replication border. This causes additional load in the network.

      The suggestion is therefore: use the merge command

      Cleaning the source from duplicates is mandatory since the on-condition within the merge-statement needs a unique source rowset.
      Otherwise the exception "Unable to get a stable set of rows in the source tables"
      will be thrown.

      Show
      Question Scenario: Merge data from a (small) source table into a (big) target table. The source contains only a few rows (below 10.000, mostly about 1000). The target contains some million records. Solution1: delerte from target where id in (select id from source); insert into target select * from source; Solution 2: merge into target t using source s on (t.id = s.id) when matched then update ... when not matched then insert; Questions: Is the merge command more effectiv than a delete plus insert? Is there a need to delete duplicate rows from the source in advance? Answer Testing with a variety of source rowsets against a target with about 6 mio. rows showed a slighty time advance using the merge command. Overall less internal steps are performed in the merge compared to delete/insert. Furthermore the subselect in the delete command will be materialized in a temporäry table and then replicated over all knodes, because the number or rows is below the replication border. This causes additional load in the network. The suggestion is therefore: use the merge command Cleaning the source from duplicates is mandatory since the on-condition within the merge-statement needs a unique source rowset. Otherwise the exception "Unable to get a stable set of rows in the source tables" will be thrown.
    • Category 1:
      SQL

      Attachments

        Activity

          People

          • Assignee:
            CaptainEXA Captain EXASOL
            Reporter:
            CaptainEXA Captain EXASOL
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated: