Benchkit: Create Your Own Reproducible Database Benchmarks
The Problem with Database Benchmarks
I have been running database benchmarks for years, and the same problems keep coming up. Missing configurations. Unreproducible results. Vendor claims that nobody can verify. No way to validate results months later against the latest software versions. Single-user numbers only, which tell you little when your system has to handle hundreds of concurrent users.
Have you ever wanted to reproduce a database benchmark from a blog post that claimed “System X is 10x faster”? Or adjust a benchmark’s workload because you suspect the vendor cherry-picked queries that favor their own system over the competition? Or create your own benchmark with your own data, compare several vendors on different hardware setups, and reproduce high-concurrency results?
The Solution – Build a Flexible, Extensible Kit to Develop Benchmarks for Multiple Vendors
I built a toolkit called Benchkit to fix exactly these issues. One YAML file defines everything: databases, hardware, queries, concurrency, iterations. Run one command and the cloud infrastructure spins up automatically. When the benchmark finishes, you get a package with all the data anyone needs to reproduce your results. No secrets. No missing configurations. The project is open source: check it out at github.com/exasol/benchkit if you want to use it, extend it, or report issues and improvement ideas.
We recently used it to compare Exasol with DuckDB and ClickHouse, and we will keep adding comparisons that aim to be fair, reproducible, and helpful for understanding the strengths and weaknesses of different database technologies.
This post is part of an ongoing series on Benchkit that I will publish over the coming weeks. Here I cover installation and running your first benchmark with Exasol vs. ClickHouse. Future posts will cover multi-node clusters, concurrent users, debugging, and extending the framework. You can read them in any order as they come out, though starting here makes the most sense if you have never used Benchkit before.
What You Need Before Executing Your First Benchmark
Getting a Benchkit comparison between Exasol and ClickHouse running for the first time takes about twenty minutes, maybe less if your AWS credentials are already set up. Most of that time is spent waiting for cloud instances to boot and for the systems to be provisioned. Let me walk through what I had to install before this worked on a fresh laptop.
The first thing you need is Python 3.12, because Benchkit uses some of the latest syntax:
# Ubuntu/Debian
sudo apt update && sudo apt install python3.12 python3.12-venv python3-pip
# macOS with Homebrew
brew install python@3.12
Then you need Terraform, which handles the AWS provisioning for ClickHouse:
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
# macOS with Homebrew
brew tap hashicorp/tap && brew install hashicorp/tap/terraform
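Before moving on, a quick sanity check that both tools are on your PATH (these are the standard version flags, nothing Benchkit-specific):
# Verify the installed versions
python3.12 --version
terraform -version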
You also need an AWS account with EC2 permissions and an SSH key pair registered in AWS for the region you plan to use. The framework uses the key to connect to instances after they boot.
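If you do not have a key pair registered yet, one way to create one with the AWS CLI looks roughly like this (the key name my-key and the region are placeholders matching the examples below; any existing key pair works just as well):
# Create a key pair in the target region and store the private key locally
aws ec2 create-key-pair \
  --region eu-west-1 \
  --key-name my-key \
  --query 'KeyMaterial' \
  --output text > ~/.ssh/my-key.pem
# Restrict permissions so SSH accepts the key file
chmod 400 ~/.ssh/my-key.pem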
For Exasol, we use our new free edition, Exasol Personal, which manages its own infrastructure. Rather than requiring you to manage Terraform modules, Exasol Personal bundles everything into a single CLI: point it at your AWS account, specify an instance type and cluster size, and it handles the rest. Exasol Personal is completely free, with no limits on data volume or hardware size.
Installation
Once all that is sorted, the actual installation takes about a minute.
git clone https://github.com/exasol/benchkit
cd benchkit && pip install -e .
Run benchkit --help to verify the installation worked.
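If you prefer to keep Benchkit isolated from your system Python, a plain virtual environment works as usual (standard Python tooling, not a Benchkit requirement):
# Create and activate a virtual environment, then install Benchkit into it
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e .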
Your First Benchmark Configuration with Exasol and ClickHouse
Below is a complete configuration comparing Exasol with ClickHouse on the TPC-H workload with 30 GB of data and its 22 standard SQL queries, each executed 7 times to show the variation in execution times.
Notice that we define two separate environments: one for Exasol (which manages its own infrastructure) and one for ClickHouse (which uses Terraform). This split approach lets each system use whichever deployment method works best for it.
# exasol_benchmark.yaml
project_id: "exasol_benchmark"
title: "Exasol vs ClickHouse Benchmark"
author: "Your Name"
execution:
parallel: true
environments:
exasol_env:
mode: "managed"
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
allow_external_database_access: true
instances:
clickhouse:
instance_type: "r5d.4xlarge"
disk:
type: "local"
os_image: "ubuntu-22.04"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env"
version: "2025.1.8"
setup:
method: "managed"
exasol_pe_version: "1.0.0"
instance_type: "r5d.4xlarge"
data_volume_size: 200
schema: "BENCHMARK"
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud"
version: "25.10.2.65"
setup:
method: "native"
use_additional_disk: true
data_dir: "/data/clickhouse"
host: "$CLICKHOUSE_PUBLIC_IP"
port: 8123
username: "default"
password: "clickhouse123"
extra:
memory_limit: "64g"
max_threads: "16"
max_memory_usage: "60000000000"
workload:
name: "tpch"
scale_factor: 30
runs_per_query: 7
warmup_runs: 1
The configuration has four main parts:
- environments: defines the infrastructure modes (managed for Exasol, aws for ClickHouse)
- systems: the database systems under test, each referencing one of the environments
- workload: the benchmark to run (TPC-H with scale factor, repetitions and warmup)
- report: visualization options (optional and omitted in the example above)
Understanding the Hybrid Infrastructure Approach
What does mode: "managed" actually do? It tells Benchkit to get out of the way. Exasol’s own CLI takes over from there – spinning up the EC2 instance, attaching storage, setting up networking. Benchkit waits until everything is ready, then connects.
ClickHouse still goes through Terraform because it has no equivalent self-provisioning tool. So you end up with two parallel infrastructure paths:
- Exasol: exasol init aws creates and manages its instance
- ClickHouse: Benchkit’s Terraform creates and manages its instance
This sounds complicated, but in practice you run the same four commands regardless. The split happens behind the scenes.
The Four-Command Workflow
# 1. Provision cloud infrastructure (for ClickHouse; Exasol provisions itself during setup)
benchkit infra apply --config exasol_benchmark.yaml
# 2. Install databases, load data and execute queries
benchkit run --config exasol_benchmark.yaml --full
# 3. Generate reports
benchkit report --config exasol_benchmark.yaml
# 4. Clean up (important for costs!)
benchkit infra destroy --config exasol_benchmark.yaml
Both systems get provisioned during infra apply, but through different mechanisms. Terraform handles ClickHouse. The Exasol CLI handles Exasol. From your perspective, one command still brings up everything. The same applies to infra destroy: both systems get torn down, each through its respective tool.
What You Get
After the benchmark finishes you will see the following structure:
results/exasol_benchmark/
├── runs.csv # All query execution times
├── runs_exasol.csv # Results per system
├── runs_clickhouse.csv
├── summary.json # Statistical summary
├── system_exasol.json # Hardware specifications
├── system_clickhouse.json
├── setup_exasol.json # Setup commands
├── setup_clickhouse.json
├── managed/ # Exasol-specific state
│ └── exasol/
│ └── state/ # Exasol terraform state
└── reports/
├── 1-short/ # Short summary
├── 2-results/ # Detailed analysis
└── 3-full/ # Full reproduction guide
├── REPORT.md
├── attachments/ # All data files
└── exasol_benchmark-benchmark.zip
The managed/exasol/state/ directory contains the Exasol-specific infrastructure state. This is separate from Benchkit’s main Terraform state and is managed entirely by the Exasol CLI.
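The generated reports are the main deliverable, but you can also eyeball the raw output directly with standard shell tools, for example (paths follow the project directory shown above):
# Show the first measured query runs
head results/exasol_benchmark/runs.csv
# Pretty-print the statistical summary
python3 -m json.tool results/exasol_benchmark/summary.json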
Core CLI Commands
Infrastructure Management
# Preview the Terraform resources to create (ClickHouse only; Exasol is not managed by Terraform)
benchkit infra plan --config config.yaml
# Create all resources
benchkit infra apply --config config.yaml
# Tear everything down
benchkit infra destroy --config config.yaml
With Exasol configurations, apply runs both Terraform (for ClickHouse) and the Exasol CLI (for Exasol) in sequence.
Phase-wise Execution
For more control you can execute the benchmark phases separately:
# Phase 1: install databases (Exasol provisions its infrastructure here)
benchkit setup --config config.yaml
# Phase 2: generate and load data
benchkit load --config config.yaml
# Phase 3: run queries
benchkit run --config config.yaml
# Or everything in one go:
benchkit run --config config.yaml --full
Use cases for phase-wise execution:
- Run queries again without reinstalling databases
- Debug installation problems separately from query problems
- Load different datasets without reprovisioning (see the example below)
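For the last use case, one plausible workflow is to edit the workload block in your configuration and then repeat only the load and run phases with the commands shown above:
# Reload data after changing the workload settings
benchkit load --config config.yaml
# Re-run the queries against the new dataset
benchkit run --config config.yaml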
Filtering by System
Sometimes you want to skip one system entirely, maybe because it already finished or because you are debugging just one of them.
# Only run ClickHouse queries (skip Exasol)
benchkit run --config config.yaml --systems clickhouse
# Only set up Exasol
benchkit setup --config config.yaml --systems exasol
# Several systems
benchkit run --config config.yaml --systems "exasol,clickhouse"
Basic Configuration Explained
Let me walk through each section of the YAML file so you know what you can change.
Environment Configuration
The environment block defines where and how infrastructure gets created.
environments:
# Exasol - self-managed infrastructure
exasol_env:
mode: "managed" # Delegates to Exasol CLI
# ClickHouse - Benchkit-managed infrastructure
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
os_image: "ubuntu-22.04"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
allow_external_database_access: true
instances:
clickhouse:
instance_type: "r5d.4xlarge"
disk:
type: "local" # Local NVMe (fastest)
label: "bench-ch"
Each system references its environment by name, which lets you mix Terraform-based and self-managed deployments in the same configuration file.
Exasol Configuration
Here is what a typical Exasol system block looks like.
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env" # References the managed environment
version: "2025.1.8" # Exasol version
setup:
method: "managed" # Uses Exasol CLI
exasol_pe_version: "1.0.0" # Exasol CLI version
cluster_size: 1 # Single node
instance_type: "r5d.4xlarge"
data_volume_size: 200 # GB for data storage
schema: "BENCHMARK"
ClickHouse Configuration
ClickHouse needs a few more settings because it does not have a managed deployment option.
systems:
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud" # References the AWS environment
version: "25.10.2.65"
setup:
method: "native" # APT installation
use_additional_disk: true
data_dir: "/data/clickhouse"
host: "$CLICKHOUSE_PUBLIC_IP"
port: 8123
username: "default"
password: "clickhouse123"
extra:
memory_limit: "64g"
max_threads: "16"
max_memory_usage: "60000000000"
The ClickHouse configuration uses the Terraform-managed infrastructure defined in its environment.
Workload Configuration
workload:
name: "tpch"
scale_factor: 30 # 30 GB dataset (SF30)
runs_per_query: 7 # Statistical significance
warmup_runs: 1 # Stabilise caches
queries:
include: ["Q01", "Q06", "Q13"] # Optional: only run these
#exclude: ["Q22"] # Optional: skip this one
Practical Scenario: Quick Database Comparison
Chances are you sometimes need a quick answer to “which database is faster for our workload?” Here is the shortest configuration that gives you real numbers.
project_id: "quick_comparison"
environments:
exasol_env:
mode: "managed"
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
instances:
clickhouse:
instance_type: "r5d.xlarge"
disk:
type: "local"
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env"
version: "2025.1.8"
setup:
method: "managed"
exasol_pe_version: "1.0.0"
instance_type: "r5d.xlarge"
data_volume_size: 100
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud"
version: "25.10.2.65"
setup:
method: "native"
workload:
name: "tpch"
scale_factor: 10
runs_per_query: 5
warmup_runs: 1
Expect roughly 30 minutes from start to finish, most of it spent waiting for infrastructure provisioning and data loading. On r5d.xlarge instances, that costs around five dollars.
These results are directional, not definitive. Scale factor 10 fits comfortably in memory on both systems, which may not reflect production behaviour. For serious comparisons, bump the scale factor to 100 and run at least seven iterations per query to get stable medians.
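As a sketch, the workload block for such a run only needs larger values (same keys as in the examples above):
workload:
  name: "tpch"
  scale_factor: 100    # ~100 GB dataset, no longer trivially cached
  runs_per_query: 7    # enough repetitions for stable medians
  warmup_runs: 1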
What Next?
This article covered the basics of my new toolkit called Benchkit. Where you go from here depends on what you need.
If you want to scale up, later posts in the series will cover multi-node clusters, multi-user tests that simulate concurrent query execution, and database-specific query alternatives that let you tune the SQL for a particular database technology. Having to tweak SQL for better performance is not ideal, but it lets you compare each system at its maximum query speed.
If you are running into problems, please check the upcoming debugging post, which covers status commands, debug flags, and result verification. That post will also discuss standalone packages for long-term reproducibility, best practices for cost management, and how to extend Benchkit with new database systems.
The code is on GitHub if you want to dig into it: https://github.com/exasol/benchkit