[EXASOL-2423] Exasol on AWS 6.1.0-8: Enhanced Failover Capabilities Created: 27.01.2019  Updated: 24.01.2020  Resolved: 01.02.2019

Status: Resolved
Project: EXASOL Roadmap
Component/s: None
Fix Version/s: Exasol 6.1.0

Type: New Feature Priority: Major
Reporter: Captain EXASOL Assignee: Captain EXASOL
Resolution: Fixed Votes: 0
Labels: None

Issue Links:


Before Exasol 6.1 on AWS, a spare node had to be running at all times to provide automatic failover in case of a node failure.

Starting with 6.1 on AWS, failover capabilities are also available if:

  • no spare/standby node is configured
  • a spare node was configured but suspended (not running)

If an active node fails and a running spare node is available, the spare node automatically replaces the failed node as usual (this was already the standard behaviour before Exasol 6.1).

New behaviour with Exasol 6.1:

If an active node fails and no running spare node can be detected, the cloud failover plugin restarts the failed node, that is, the virtual machine is stopped and started again. In most cases, the instance is migrated to a new underlying host computer when it is started.

If the node still fails after this restart, the system checks whether a cold spare node is available. A cold spare node (also referred to here as a cold standby node) is a spare node that was configured and then suspended, so it incurs no costs for a running VM instance (storage costs still apply). If a cold standby node is found, it is started and the standard failover mechanism is triggered.
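The escalation order described above can be sketched as a small decision function. This is only an illustration of the documented sequence, not the failover plugin's actual code: the names `Action` and `next_failover_action` and the three boolean inputs are hypothetical, and the final "no automatic recovery" outcome is an assumption, since the text does not say what happens when the restart fails and no cold spare node exists.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the steps described in this issue."""
    SWAP_IN_HOT_SPARE = "running spare node replaces the failed node"
    RESTART_FAILED_NODE = "stop and start the failed VM instance"
    START_COLD_SPARE = "start the suspended spare node, then fail over"
    NO_AUTOMATIC_RECOVERY = "no further automatic step (assumption)"


def next_failover_action(hot_spare_running: bool,
                         restart_attempted: bool,
                         cold_spare_configured: bool) -> Action:
    """Return the next step after an active node has failed."""
    # Standard behaviour, also before 6.1: a running spare node takes over.
    if hot_spare_running:
        return Action.SWAP_IN_HOT_SPARE
    # New in 6.1: without a running spare node, restart the failed node first.
    if not restart_attempted:
        return Action.RESTART_FAILED_NODE
    # The node still fails after the restart: fall back to a cold spare node.
    if cold_spare_configured:
        return Action.START_COLD_SPARE
    # Not covered by the text above; labelled as an assumption.
    return Action.NO_AUTOMATIC_RECOVERY
```

For example, `next_failover_action(False, True, True)` returns `Action.START_COLD_SPARE`: no spare node is running, the restart has already been tried, and a suspended spare node is configured.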

Prerequisites for cold standby nodes and node reprovisioning:

  • EXAoperation (including the failover plugin) needs direct access to the relevant AWS services (i.e. not through a proxy).
  • A cold standby node can only be activated if the remaining cluster has quorum when a node fails. That means a cluster consisting of the Management Node and at least 3 Data Nodes is required to use the cold standby node functionality with one configured cold standby node.
  • If no standby node is configured, at least the Management Node and 2 Data Nodes are required to use the reprovisioning functionality (restart of the failed node) described above.
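The prerequisites above can be written down as a simple pre-check. The sketch below only encodes the minimum sizes exactly as stated in this issue; it does not derive them from a quorum formula, and it does not decide whether the cold standby node itself counts towards the 3 Data Nodes. The function names and parameters are hypothetical and are not part of EXAoperation.

```python
def min_data_nodes(cold_standby_configured: bool) -> int:
    """Minimum number of Data Nodes as stated in the prerequisites."""
    # 3 Data Nodes with one configured cold standby node, otherwise 2.
    return 3 if cold_standby_configured else 2


def meets_prerequisites(has_management_node: bool,
                        data_nodes: int,
                        cold_standby_configured: bool,
                        direct_aws_access: bool) -> bool:
    """Check the documented prerequisites for the 6.1 failover behaviour."""
    # EXAoperation needs direct (non-proxy) access to the AWS services.
    if not direct_aws_access:
        return False
    # The Management Node is part of both stated minimum configurations.
    if not has_management_node:
        return False
    return data_nodes >= min_data_nodes(cold_standby_configured)
```

For example, a cluster with the Management Node, 2 Data Nodes and no standby node passes the check, while the same cluster with one configured cold standby node does not.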

Advantages/Disadvantages of a hot spare node

A running hot spare node provides the shortest service interruption in case of a node failure. On the other hand, it causes the highest costs, because the VM instance is running all the time.

