The computation is usually based on raw data volume (the volume is comparable to the size your data would have if stored as CSV files). This value alone is sufficient to get a pretty sound estimation based on default parameters.
The EXASOL database always compresses data; we assume a moderate compression factor of 2.5. It typically performs well with database RAM (DB RAM) of 10% of the raw data volume.
- RAW data volume: 2500GB
- Compression factor: 2.5
- DB RAM estimation: 10% (of raw data)
DB Disk Space
The EXASOL database stores 3 types of data, which have to be summed up to get the overall data volume to be stored:
- Compressed data (Tables, MEM_OBJECT_SIZE)
- Indexes (AUXILIARY_SIZE)
- Statistical and auditing data (STATISTICS_SIZE)
Indexes are automatically created and maintained by the EXASOL database (see SOL-6 for more details). The index volume depends heavily on the chosen data model and queries and can range from 2% to over 100% of compressed data. A typical EXASOL system will have an index volume of about 15% of compressed data.
Statistical data itself is pretty small. If you switch auditing on, each login and each query is stored in the corresponding auditing tables. In this case, use a higher value for statistical data than the default 5%, or consider archiving historical auditing data offline and truncating the auditing tables on a regular basis.
The data is normally stored redundantly in the cluster to ensure that a full set of data is still available in the cluster after a node outage. Redundancy 2 means that there are 2 copies of each committed block in the cluster. Please note that redundant copies are only used in the case of server failures.
To avoid issues with insufficient disk space, we add some headroom for the following cases (without redundancy):
- If intermediate results of some queries don't fit into DB RAM, they are swapped out to a temporary volume.
- In addition, the persistent volume can be fragmented to some degree.
- Indexes: 15% of compressed data
- Statistical and auditing data: 5% of compressed data
- Redundancy: 2
- Headroom for temp and fragmentation: 60% of compressed data (no redundancy)
Together with the basic settings, we get the following numbers:
|Compressed data (net)||Overall data volume (net)||DB Disk Space|
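The disk space arithmetic described above can be sketched in Python. This is only an illustration under stated assumptions, not an official EXASOL tool: the function name `db_disk_space_gb` and its parameter names are hypothetical, and the way the pieces are combined (redundancy applied to the overall data volume, headroom added once without redundancy) is our reading of the description above.

```python
def db_disk_space_gb(raw_gb, compression=2.5, index_pct=15, stats_pct=5,
                     redundancy=2, headroom_pct=60):
    """Return (compressed, overall, disk) in GB.

    Percentages are relative to compressed data, as in the parameter list.
    Assumption: redundancy multiplies the overall data volume, while the
    temp/fragmentation headroom is counted once (no redundancy).
    """
    compressed = raw_gb / compression
    overall = compressed * (100 + index_pct + stats_pct) / 100
    disk = overall * redundancy + compressed * headroom_pct / 100
    return compressed, overall, disk


compressed, overall, disk = db_disk_space_gb(2500)
print(f"compressed={compressed:.0f}GB overall={overall:.0f}GB disk={disk:.0f}GB")
```

With the example parameters (2500GB raw data), this sketch yields 1000GB compressed data, 1200GB overall data volume and 3000GB DB disk space.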
As previously mentioned, the EXASOL database typically performs well with DB RAM of 10% of raw data. To ensure that index maintenance does not affect the overall performance, the indexes plus some headroom should fit into DB RAM. The index scale factor below defines the minimum size of DB RAM. This is especially important if you have a high index volume.
An additional factor is temporary data volume (e.g. intermediate query results). If your system creates large intermediate results, please consider adding some headroom for such temporary data to the previously calculated DB RAM. Please refer to the TEMP_DB_RAM values in the statistical system tables.
DB RAM is estimated according to the following formula:
MAX(
"Compressed Data" * "DB RAM Estimation %",
"Index Size" * "Index Scale Factor"
) + "Compressed Data" * "Temp DB RAM headroom %"
- Index scale factor: 1.3
- Temp DB RAM headroom: 0.0
MAX(1000GB*20%, 150GB*1.3) + 1000GB*0% = 200GB
|Compressed data (net)||Overall data volume (net)||DB RAM Estimation|
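The DB RAM formula can be written as a small Python function. This is a sketch only: the function name `db_ram_gb` and its parameter names are hypothetical, and the default of 20% of compressed data is simply taken from the worked example above.

```python
def db_ram_gb(compressed_gb, index_gb, ram_pct=20, index_scale=1.3, temp_pct=0):
    """Estimate DB RAM in GB.

    MAX(compressed * ram_pct, index * index_scale) + compressed * temp_pct,
    with the percentages given as whole numbers (20 means 20%).
    """
    base = max(compressed_gb * ram_pct / 100, index_gb * index_scale)
    temp_headroom = compressed_gb * temp_pct / 100
    return base + temp_headroom


# Values from the example: 1000GB compressed data, 150GB indexes.
print(f"{db_ram_gb(1000, 150):.0f}GB")

# Hypothetical system with a high index volume: the index term dominates.
print(f"{db_ram_gb(1000, 400):.0f}GB")
```

The second call illustrates why the index scale factor matters: with 400GB of indexes, the index term (520GB) exceeds the percentage-based term (200GB) and therefore sets the minimum DB RAM.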
Please note that if you have a running system, you can use the RECOMMENDED_DB_RAM_SIZE* columns of the EXA_DB_SIZE* statistical system tables.
Backup Disk Space
EXASolution provides 2 types of backup, full and incremental, which can either be stored cluster-internally or written directly to external backup storage.
A typical backup cycle is:
- Sunday: full backup with 10 days retention time
- Monday to Saturday: incremental backup with 3 days retention time
Cluster-internal backups are stored redundantly to ensure that there is still a valid backup even in case of a node failure. If you use external storage, no backup redundancy is required.
Please note that an incremental backup can vary in size depending on the change rate of your database, which is hard to predict. To ensure that there is enough disk space in the cluster, we calculate with the maximum incremental backup size. Alternatively, you can calculate with full backups only (e.g. 5 backups) to have comfortable headroom for disk space in the cluster.
- Full backup count: 2
- Incremental backup count: 3
- Maximum incremental backup size: 100% of full backup
- Cluster internal backup: Yes
- Backup redundancy: 2
The final backup space calculation also includes headroom of one backup size to avoid incidents during backup creation.
|Overall data volume (net)||Backup Disk Space|
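One possible way to express the backup space calculation in Python is shown below. It is a sketch under explicit assumptions that are not stated in the text: a full backup is taken to be as large as the overall data volume (net), and the headroom of one backup is assumed to be stored with the same redundancy as the backups themselves. The function name `backup_disk_space_gb` and its parameters are hypothetical.

```python
def backup_disk_space_gb(full_backup_gb, full_count=2, incr_count=3,
                         max_incr_pct=100, redundancy=2, headroom_backups=1):
    """Estimate the backup disk space in GB.

    Assumptions (not confirmed by the text above):
    - full_backup_gb equals the overall data volume (net),
    - the headroom backup is multiplied by the redundancy as well.
    For external backup storage, pass redundancy=1.
    """
    incr_gb = full_backup_gb * max_incr_pct / 100
    net = (full_count * full_backup_gb
           + incr_count * incr_gb
           + headroom_backups * full_backup_gb)
    return net * redundancy


# Overall data volume (net) of 1200GB, following the example parameters.
print(f"{backup_disk_space_gb(1200):.0f}GB")
```

Under these assumptions, an overall data volume of 1200GB results in 14400GB of cluster-internal backup space; treat this figure as illustrative only.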
|Compressed data (net)||Overall data volume (net)||DB RAM Estimation||DB Disk Space||Backup Disk Space|
Based on the above assumptions about data and index size and about backup type and cycle, you will need to provide at least 2 active servers and 1 standby server, each with
- 128GB RAM
At least 12% of physical RAM per node has to be reserved for operating system and process memory.
- 16x1.2TB SAS HDD (in RAID-1).
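As a rough plausibility check of this hardware proposal, the usable RAM and mirrored disk capacity can be computed as follows. This is a sketch with hypothetical function names; it assumes that only the active nodes contribute capacity (the standby node is a spare) and that RAID-1 halves the raw disk capacity.

```python
def usable_db_ram_gb(active_nodes, ram_per_node_gb, reserved_pct=12):
    """RAM left for the database after reserving OS and process memory."""
    return active_nodes * ram_per_node_gb * (100 - reserved_pct) / 100


def raid1_capacity_tb(active_nodes, disks_per_node, disk_tb):
    """Net capacity with RAID-1 mirroring (half of the raw capacity)."""
    return active_nodes * disks_per_node * disk_tb / 2


ram = usable_db_ram_gb(active_nodes=2, ram_per_node_gb=128)
disk = raid1_capacity_tb(active_nodes=2, disks_per_node=16, disk_tb=1.2)
print(f"usable DB RAM: {ram:.2f}GB, mirrored disk capacity: {disk:.1f}TB")
```

With 2 active nodes, about 225GB of RAM remains for the database, which covers the 200GB from the DB RAM example, and the mirrored disks provide about 19.2TB in total.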