Benchkit: Create Your Own Reproducible Database Benchmarks
The Problem with Database Benchmarks
I have been running database benchmarks for years, and the same problems keep coming up. Missing configurations. Unreproducible results. Vendor claims that nobody can verify. No way to validate results months later against the latest software versions. Single-user numbers only, which tell you little when your system has to handle hundreds of concurrent users.
Have you ever wanted to reproduce a database benchmark from a blog post that claimed “System X is 10x faster”? Or adjust a benchmark’s workload because you suspect the vendor cherry-picked queries that favor their own system over the competition? Or create your own benchmark with your own data, compare several vendors on different hardware setups, and reproduce high-concurrency results?
The Solution – Build a Flexible, Extensible Kit to Develop Benchmarks for Multiple Vendors
I built a toolkit called Benchkit to fix exactly these issues. One YAML file defines everything: databases, hardware, queries, concurrency, iterations. Run one command and the cloud infrastructure spins up automatically. When the benchmark finishes, you get a package with all the data anyone needs to reproduce your results. No secrets. No missing configurations. The project is open source: check it out at github.com/exasol/benchkit if you want to use it, extend it, or report issues and improvement ideas.
We recently used it to compare Exasol with DuckDB and ClickHouse, and we will keep adding comparisons that aim to be fair, reproducible, and helpful for understanding the strengths and weaknesses of different database technologies.
This post is part of an ongoing series on Benchkit that I will publish over the coming weeks. Here I cover installation and running your first benchmark with Exasol vs. ClickHouse. Future posts will cover multi-node clusters, concurrent users, debugging, and extending the framework. You can read them in any order as they come out, though starting here makes the most sense if you have never used Benchkit before.
What You Need Before Executing Your First Benchmark
Getting a Benchkit comparison between Exasol and ClickHouse running for the first time takes about twenty minutes, maybe less if your AWS credentials are already set up. Most of that time is spent waiting for cloud instances to boot and for the systems to be provisioned. Let me walk through what I had to install before this worked on a fresh laptop.
The first thing you need is Python 3.12, because Benchkit uses some of the latest syntax:
# Ubuntu/Debian
sudo apt update && sudo apt install python3.12 python3.12-venv python3-pip
# macOS with Homebrew
brew install python@3.12
Then you need Terraform, which handles the AWS provisioning for ClickHouse:
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
# macOS with Homebrew
brew tap hashicorp/tap && brew install hashicorp/tap/terraform
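Before moving on, a quick sanity check that both tools are on your PATH (these are the standard version flags, nothing Benchkit-specific):
# Verify the installed versions
python3.12 --version
terraform -version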
You also need an AWS account with EC2 permissions and an SSH key pair registered in AWS for the region you plan to use. The framework uses the key to connect to instances after they boot.
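If you do not have a key pair registered yet, one way to create one with the AWS CLI looks roughly like this (the key name my-key and the region are placeholders matching the examples below; any existing key pair works just as well):
# Create a key pair in the target region and store the private key locally
aws ec2 create-key-pair \
  --region eu-west-1 \
  --key-name my-key \
  --query 'KeyMaterial' \
  --output text > ~/.ssh/my-key.pem
# Restrict permissions so SSH accepts the key file
chmod 400 ~/.ssh/my-key.pem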
For Exasol, we use our new free edition, Exasol Personal, which manages its own infrastructure. Rather than requiring you to manage Terraform modules, Exasol Personal bundles everything into a single CLI: point it at your AWS account, specify an instance type and cluster size, and it handles the rest. Exasol Personal is completely free, with no limits on data volume or hardware size.
Installation
Once all that is sorted, the actual installation takes about a minute.
git clone https://github.com/exasol/benchkit
cd benchkit && pip install -e .
Run benchkit --help to verify the installation worked.
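If you prefer to keep Benchkit isolated from your system Python, a plain virtual environment works as usual (standard Python tooling, not a Benchkit requirement):
# Create and activate a virtual environment, then install Benchkit into it
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e .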
Your First Benchmark Configuration with Exasol and ClickHouse
Below is a complete configuration comparing Exasol with ClickHouse on the TPC-H workload with 30 GB of data and its 22 standard SQL queries, each executed 7 times to show the variation in execution times.
Notice that we define two separate environments: one for Exasol (which manages its own infrastructure) and one for ClickHouse (which uses Terraform). This split approach lets each system use whichever deployment method works best for it.
# exasol_benchmark.yaml
project_id: "exasol_benchmark"
title: "Exasol vs ClickHouse Benchmark"
author: "Your Name"
execution:
parallel: true
environments:
exasol_env:
mode: "managed"
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
allow_external_database_access: true
instances:
clickhouse:
instance_type: "r5d.4xlarge"
disk:
type: "local"
os_image: "ubuntu-22.04"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env"
version: "2025.1.8"
setup:
method: "managed"
exasol_pe_version: "1.0.0"
instance_type: "r5d.4xlarge"
data_volume_size: 200
schema: "BENCHMARK"
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud"
version: "25.10.2.65"
setup:
method: "native"
use_additional_disk: true
data_dir: "/data/clickhouse"
host: "$CLICKHOUSE_PUBLIC_IP"
port: 8123
username: "default"
password: "clickhouse123"
extra:
memory_limit: "64g"
max_threads: "16"
max_memory_usage: "60000000000"
workload:
name: "tpch"
scale_factor: 30
runs_per_query: 7
warmup_runs: 1
The configuration has four main parts:
- environments: defines the infrastructure modes (managed for Exasol, aws for ClickHouse)
- systems: the database systems under test, each referencing one of the environments
- workload: the benchmark to run (TPC-H with scale factor, repetitions and warmup)
- report: visualization options (optional and omitted in the example above)
Understanding the Hybrid Infrastructure Approach
What does mode: "managed" actually do? It tells Benchkit to get out of the way. Exasol’s own CLI takes over from there – spinning up the EC2 instance, attaching storage, setting up networking. Benchkit waits until everything is ready, then connects.
ClickHouse still goes through Terraform because it has no equivalent self-provisioning tool. So you end up with two parallel infrastructure paths:
- Exasol: exasol init aws creates and manages its instance
- ClickHouse: Benchkit’s Terraform creates and manages its instance
This sounds complicated, but in practice you run the same four commands regardless. The split happens behind the scenes.
The Four-Command Workflow
# 1. Provision cloud infrastructure (for ClickHouse; Exasol provisions itself during setup)
benchkit infra apply --config exasol_benchmark.yaml
# 2. Install databases, load data and execute queries
benchkit run --config exasol_benchmark.yaml --full
# 3. Generate reports
benchkit report --config exasol_benchmark.yaml
# 4. Clean up (important for costs!)
benchkit infra destroy --config exasol_benchmark.yaml
Both systems get provisioned during infra apply, but through different mechanisms. Terraform handles ClickHouse. The Exasol CLI handles Exasol. From your perspective, one command still brings up everything. The same applies to infra destroy: both systems get torn down, each through its respective tool.
What You Get
After the benchmark finishes you will see the following structure:
results/exasol_benchmark/
├── runs.csv # All query execution times
├── runs_exasol.csv # Results per system
├── runs_clickhouse.csv
├── summary.json # Statistical summary
├── system_exasol.json # Hardware specifications
├── system_clickhouse.json
├── setup_exasol.json # Setup commands
├── setup_clickhouse.json
├── managed/ # Exasol-specific state
│ └── exasol/
│ └── state/ # Exasol terraform state
└── reports/
├── 1-short/ # Short summary
├── 2-results/ # Detailed analysis
└── 3-full/ # Full reproduction guide
├── REPORT.md
├── attachments/ # All data files
└── exasol_benchmark-benchmark.zip
The managed/exasol/state/ directory contains the Exasol-specific infrastructure state. This is separate from Benchkit’s main Terraform state and is managed entirely by the Exasol CLI.
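The generated reports are the main deliverable, but you can also eyeball the raw output directly with standard shell tools, for example (paths follow the project directory shown above):
# Show the first measured query runs
head results/exasol_benchmark/runs.csv
# Pretty-print the statistical summary
python3 -m json.tool results/exasol_benchmark/summary.json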
Core CLI Commands
Infrastructure Management
# Preview the Terraform resources to create (ClickHouse only; Exasol is not managed by Terraform)
benchkit infra plan --config config.yaml
# Create all resources
benchkit infra apply --config config.yaml
# Tear everything down
benchkit infra destroy --config config.yaml
With Exasol configurations, apply runs both Terraform (for ClickHouse) and the Exasol CLI (for Exasol) in sequence.
Phase-wise Execution
For more control you can execute the benchmark phases separately:
# Phase 1: install databases (Exasol provisions its infrastructure here)
benchkit setup --config config.yaml
# Phase 2: generate and load data
benchkit load --config config.yaml
# Phase 3: run queries
benchkit run --config config.yaml
# Or everything in one go:
benchkit run --config config.yaml --full
Use cases for phase-wise execution:
- Run queries again without reinstalling databases
- Debug installation problems separately from query problems
- Load different datasets without reprovisioning (see the example below)
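For the last use case, one plausible workflow is to edit the workload block in your configuration and then repeat only the load and run phases with the commands shown above:
# Reload data after changing the workload settings
benchkit load --config config.yaml
# Re-run the queries against the new dataset
benchkit run --config config.yaml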
Filtering by System
Sometimes you want to skip one system entirely, maybe because it already finished or because you are debugging just one of them.
# Only run ClickHouse queries (skip Exasol)
benchkit run --config config.yaml --systems clickhouse
# Only set up Exasol
benchkit setup --config config.yaml --systems exasol
# Several systems
benchkit run --config config.yaml --systems "exasol,clickhouse"
Basic Configuration Explained
Let me walk through each section of the YAML file so you know what you can change.
Environment Configuration
The environment block defines where and how infrastructure gets created.
environments:
# Exasol - self-managed infrastructure
exasol_env:
mode: "managed" # Delegates to Exasol CLI
# ClickHouse - Benchkit-managed infrastructure
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
os_image: "ubuntu-22.04"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
allow_external_database_access: true
instances:
clickhouse:
instance_type: "r5d.4xlarge"
disk:
type: "local" # Local NVMe (fastest)
label: "bench-ch"
Each system references its environment by name, which lets you mix Terraform-based and self-managed deployments in the same configuration file.
Exasol Configuration
Here is what a typical Exasol system block looks like.
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env" # References the managed environment
version: "2025.1.8" # Exasol version
setup:
method: "managed" # Uses Exasol CLI
exasol_pe_version: "1.0.0" # Exasol CLI version
cluster_size: 1 # Single node
instance_type: "r5d.4xlarge"
data_volume_size: 200 # GB for data storage
schema: "BENCHMARK"
ClickHouse Configuration
ClickHouse needs a few more settings because it does not have a managed deployment option.
systems:
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud" # References the AWS environment
version: "25.10.2.65"
setup:
method: "native" # APT installation
use_additional_disk: true
data_dir: "/data/clickhouse"
host: "$CLICKHOUSE_PUBLIC_IP"
port: 8123
username: "default"
password: "clickhouse123"
extra:
memory_limit: "64g"
max_threads: "16"
max_memory_usage: "60000000000"
The ClickHouse configuration uses the Terraform-managed infrastructure defined in its environment.
Workload Configuration
workload:
name: "tpch"
scale_factor: 30 # 30 GB dataset (SF30)
runs_per_query: 7 # Statistical significance
warmup_runs: 1 # Stabilise caches
queries:
include: ["Q01", "Q06", "Q13"] # Optional: only run these
#exclude: ["Q22"] # Optional: skip this one
Practical Scenario: Quick Database Comparison
Chances are you sometimes need a quick answer to “which database is faster for our workload?” Here is the shortest configuration that gives you real numbers.
project_id: "quick_comparison"
environments:
exasol_env:
mode: "managed"
clickhouse_cloud:
mode: "aws"
region: "eu-west-1"
ssh_key_name: "my-key"
ssh_private_key_path: "~/.ssh/my-key.pem"
instances:
clickhouse:
instance_type: "r5d.xlarge"
disk:
type: "local"
systems:
- name: "exasol"
kind: "exasol"
environment: "exasol_env"
version: "2025.1.8"
setup:
method: "managed"
exasol_pe_version: "1.0.0"
instance_type: "r5d.xlarge"
data_volume_size: 100
- name: "clickhouse"
kind: "clickhouse"
environment: "clickhouse_cloud"
version: "25.10.2.65"
setup:
method: "native"
workload:
name: "tpch"
scale_factor: 10
runs_per_query: 5
warmup_runs: 1
Expect roughly 30 minutes from start to finish, most of it spent waiting for infrastructure provisioning and data loading. On r5d.xlarge instances, that costs around five dollars.
These results are directional, not definitive. Scale factor 10 fits comfortably in memory on both systems, which may not reflect production behaviour. For serious comparisons, bump the scale factor to 100 and run at least seven iterations per query to get stable medians.
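As a sketch, the workload block for such a run only needs larger values (same keys as in the examples above):
workload:
  name: "tpch"
  scale_factor: 100    # ~100 GB dataset, no longer trivially cached
  runs_per_query: 7    # enough repetitions for stable medians
  warmup_runs: 1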
What Next?
This article covered the basics of my new toolkit called Benchkit. Where you go from here depends on what you need.
If you want to scale up, later posts in the series will cover multi-node clusters, multi-user tests that simulate concurrent query execution, and database-specific query alternatives that let you tune the SQL for a particular database technology. Having to tweak SQL for better performance is not ideal, but it lets you compare each system at its maximum query speed.
If you are running into problems, please check the upcoming debugging post, which covers status commands, debug flags, and result verification. That post will also discuss standalone packages for long-term reproducibility, best practices for cost management, and how to extend Benchkit with new database systems.
The code is on GitHub if you want to dig into it: https://github.com/exasol/benchkit