Business continuity – Beyond buzz
You probably aren’t fully aware of how much your daily work and your company’s business success have come to depend on a functioning IT infrastructure. In today’s digital world, system outages can lead to dramatic financial losses or damage your reputation. You surely remember outages such as the time Facebook was offline for a couple of hours. A couple of hours doesn’t sound harmful, right? But it’s an eternity in digital time. Now take that a step further and imagine your business going offline for a day – or even more!
That’s the reason why the term “business continuity” is heard more and more often and has gained increasing attention in IT departments across industries and sectors. Despite this spotlight, most of the systems underpinning that continuity are not operated redundantly, and there is a lot of work to do in that area.
Prevention over cure: Dual data center strategy
Although most IT departments may be able to cope with “local” problems, they struggle when the scope becomes more global and can’t be solved within a single data center. If your whole data center is affected by a network outage because of construction work and becomes completely inoperable, neither your high availability SAN nor the hot standby servers within your data center can help you – you need a disaster recovery and business continuity strategy to address such “disasters”.
Typically you would consider implementing a secondary data center to ensure business continuity in such situations (sometimes also referred to as a twin data center or dual data center).
If the primary data center goes down, the secondary data center must be able to instantly take over operations. This presupposes that all your business-critical data is available at both sites – always.
Synchronous or asynchronous replication?
Two operational replication modes for your data are generally available for dual data center setups: synchronous and asynchronous.
In asynchronous implementations, the primary system, which communicates directly with the client, confirms data modifications before they have been persisted at the secondary site.
Such systems typically follow a lazy replication approach, accepting that the two sites are not always in sync.
As a consequence, when the secondary data center takes over operations after the primary data center goes down, some data may be missing or may differ from the state of the inaccessible system at the primary data center.
Depending on your processes, it might be acceptable to continue business operations using the systems in the secondary data center, knowing that there is some data inconsistency that cannot be resolved until you can access the systems in the primary data center again.
In such cases the data inconsistency problem continues to get worse as you continue business operations using the incomplete and inconsistent data (e.g. products may be delivered twice or never, invoices might be paid twice).
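The lag described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not how any particular product implements replication: the class `AsyncReplicatedStore` and its methods are hypothetical names, and both sites are modelled as plain in-memory dictionaries.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy model of a primary site that replicates lazily to a secondary site."""

    def __init__(self):
        self.primary = {}        # data persisted at the primary site
        self.secondary = {}      # data persisted at the secondary site
        self.pending = deque()   # acknowledged writes not yet shipped

    def write(self, key, value):
        # The primary persists locally and confirms immediately;
        # replication to the secondary happens later.
        self.primary[key] = value
        self.pending.append((key, value))
        return "ack"

    def replicate(self, max_items):
        # Ship up to max_items queued writes to the secondary site.
        for _ in range(min(max_items, len(self.pending))):
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def fail_over(self):
        # The primary becomes unreachable: queued writes are stranded there,
        # and the secondary continues with whatever it has received so far.
        lost = dict(self.pending)
        return self.secondary, lost


store = AsyncReplicatedStore()
store.write("order-1", "shipped")
store.write("order-2", "shipped")
store.replicate(max_items=1)       # only the first write reaches the secondary
store.write("invoice-7", "paid")

surviving, lost = store.fail_over()
print(sorted(surviving))   # ['order-1']
print(sorted(lost))        # ['invoice-7', 'order-2']
```

Every write in this example was confirmed to the client, yet after the failover the secondary site only knows about one of the three. A second shipment of `order-2` or a second payment of `invoice-7` is exactly the kind of follow-up error described above.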
Obviously, in most businesses, these kinds of problems are unacceptable. Data modification needs to be executed synchronously on both sites to ensure absolute data consistency – always. So why should one use asynchronous replication instead of synchronous replication in the face of such risks?
The answer is simple: synchronous replication always affects the performance of write operations and puts higher demands on the connectivity between the two data centers. The impact might be quite low for bulk load operations and higher under heavy transactional (read/write) load.
So the trade-off here is write performance and network requirements between both sites versus data consistency:
| ||Asynchronous approach||Synchronous approach|
|Read performance||no impact||no impact|
|Write performance||no impact||Low impact on load performance. Higher impact on transactional performance (many single transactions). Impact depends on network link capabilities.|
|Network link requirements||High latency, relatively low bandwidth, sufficient in case of low insert load||Medium-to-low latency, medium-to-high throughput|
|Data consistency in case of interruption||High risk of inconsistency – may not be possible to solve without connection to primary site||Full consistency and ACID compliance|
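Why the write impact differs so much between load operations and many single transactions can be seen with a back-of-the-envelope calculation. The sketch below assumes a deliberately simple model: every synchronous commit waits for one full network round trip to the secondary site, asynchronous commits wait for none, and bandwidth limits are ignored. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def total_write_time_ms(commits, local_commit_ms, round_trip_ms, synchronous):
    """Estimate the total time for a number of sequential commits.

    Simplifying assumption: each synchronous commit waits for one full
    round trip to the secondary site; asynchronous commits wait for none.
    """
    per_commit = local_commit_ms + (round_trip_ms if synchronous else 0.0)
    return commits * per_commit


def overhead_percent(sync_ms, async_ms):
    """Extra time of the synchronous mode relative to the asynchronous mode."""
    return (sync_ms - async_ms) / async_ms * 100.0


RTT_MS = 5.0  # hypothetical round trip between the two data centers

# Bulk load: all rows in a single commit that takes 2000 ms locally.
bulk_async = total_write_time_ms(1, 2000.0, RTT_MS, synchronous=False)
bulk_sync = total_write_time_ms(1, 2000.0, RTT_MS, synchronous=True)

# Transactional load: 100,000 single-row commits of 2 ms each.
txn_async = total_write_time_ms(100_000, 2.0, RTT_MS, synchronous=False)
txn_sync = total_write_time_ms(100_000, 2.0, RTT_MS, synchronous=True)

bulk_overhead = overhead_percent(bulk_sync, bulk_async)
txn_overhead = overhead_percent(txn_sync, txn_async)

print(f"bulk load overhead:     {bulk_overhead:.2f}%")   # 0.25%
print(f"transactional overhead: {txn_overhead:.2f}%")    # 250.00%
```

In this model the round trip is paid once per commit, so a single large load barely notices it, while a stream of small transactions pays it 100,000 times. A lower-latency link shrinks the second figure directly, which is why the network link matters so much for the synchronous approach.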
EXASOL’s approach to efficient synchronous replication
Although synchronous replication places higher requirements on the network link between the two data centers and may affect write performance, it’s the only valid choice for real high availability and disaster recovery.
That’s the reason why EXASOL decided to go for synchronous replication. EXASOL is an in-memory database that runs on a cluster of standard servers and is used by many companies for their critical data-driven business operations.
In contrast to many other database systems, EXASOL comes with a dedicated cluster storage subsystem that efficiently handles data redundancy and high availability and is optimized for low latency and maximum throughput.
In a dual data center configuration, EXASOL’s storage layer replicates the master data segments from the primary site synchronously to the EXASOL storage subsystem at the secondary site.
Think of this configuration as one database cluster that is simply stretched across two data centers. The actual database engine components on the secondary site’s nodes do not have to be up and running all the time. As soon as the primary site fails, the secondary site can take over operations quickly.
Keeping business flowing
This combination of robustness, efficiency and simplicity for replicating data across two data centers is unique in the market and solves many existing problems around how to set up a robust and well-performing dual data center operation for your analytics database. This is a clear reason why we’ve seen successful real-world implementations; even a data center in the sensitive banking sector relies on EXASOL’s synchronous replication strategy based on this “stretched cluster” paradigm.
If you want to learn more about EXASOL, download our whitepaper or try out EXASOL.