10 Questions: the TPC-H Benchmark

11 Jul 2019

The Who, What, Why and When of the Transaction Processing Performance Council (TPC)

1. What is the TPC?

The Transaction Processing Performance Council (TPC) is “…a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.”

2. Isn’t it a bit dull and theoretical?

Far from it – the TPC’s work has increased competition between vendors, produced vastly faster databases, and motivated manufacturers to push technology to its limits.

3. Who are members of the TPC?

Its members include many of the largest database manufacturers, including Microsoft, Oracle and IBM.

EXASOL is not a member and so has no influence over the benchmarks the TPC produces.

4. What kind of benchmarks do they produce?

They started by producing benchmarks for OLTP (transaction processing) systems in the 1980s, but then decided to cater for a new wave of Decision Support Systems (DSS) that became popular in the 1990s.

Their first such benchmark, TPC-D, appeared in 1994 but was rendered obsolete by technological change and replaced by TPC-H (ad hoc decision support benchmark) in 1999.

5. Why did TPC-D become obsolete?

TPC-D became obsolete as new database features like join indices, summary tables and materialized views made the original benchmark queries less relevant.

Also, just to indicate how far such systems have come since then, the TPC-D benchmarks were designed for a maximum scale factor of 3TB, which today would be considered modest, even for a data mart.

6. TPC-H is the “ad hoc decision support” benchmark – what does that mean?

Decision Support Systems (DSS) are systems that support business and organizational decision-making activities.

Such activities can include:

  • Comparison of sales figures between one week and the next
  • Predicting revenue figures – based on new product sales assumptions
  • Evaluating the effect of different decisions, based on analysis of past experiences

“Institutional DSS” refers to those systems that deal with recurring decisions. “Ad hoc DSS” is much more interesting – it involves problems that are not anticipated and not recurring.

7. Why is TPC-H a good test of such systems?

More than 220 TPC-H results have been recorded over the years. Many database vendors have posted results on a vast range of hardware and at various scale factors up to 100TB.

By running the same unbiased scripts you can compare database vendor with database vendor – and you can often see how well the same database runs on different hardware.

With the wide range of scale factors, you can also see how well certain databases scale (and often how badly they scale).

8. How do you run a TPC-H benchmark?

The specification (download PDF) runs to 137 pages – but to summarize:

  • The database is a 3rd Normal Form (3NF) schema of 8 tables.
  • The benchmarks can be run using pre-determined database sizes, referred to as “scale factors”. Each scale factor corresponds to the raw data size of the data warehouse.
  • 6 of the 8 tables grow linearly with the scale factor and are populated with data that is uniformly distributed.
  • 22 complex, long-running query templates and 2 data refresh processes (insert and delete) are run in parallel to test concurrency.
  • The number of concurrent processes increases with the scale factor – for example, for the 100 TB benchmark you run 11 concurrent processes.
  • Most of the 137 pages are concerned with things you are not allowed to do to make queries run faster or to run at all on your database. No cheating!
  • An external audit is required of your hardware and processes before your benchmark can be published – just to be absolutely sure that you haven’t cheated.
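The scale-factor rules above can be sketched in a few lines of code. This is an illustrative reading of the specification, not part of any benchmark kit: the stream counts in the table below reflect my understanding of the spec’s minimum-stream requirements and should be verified against the current specification document.

```python
# Minimum required concurrent query streams per scale factor,
# as I read them from the TPC-H specification (scale factor N
# corresponds to roughly N GB of raw data). Verify against the
# current spec before relying on these values.
MIN_STREAMS = {
    1: 2, 10: 3, 30: 4, 100: 5, 300: 6,
    1_000: 7, 3_000: 8, 10_000: 9, 30_000: 10, 100_000: 11,
}

def min_streams(scale_factor: int) -> int:
    """Minimum concurrent query streams at a given scale factor."""
    return MIN_STREAMS[scale_factor]

def raw_size_tb(scale_factor: int) -> float:
    """Raw data size in TB implied by a scale factor (N GB / 1000)."""
    return scale_factor / 1_000

# The 100 TB benchmark is scale factor 100,000 and, per the table
# above, requires 11 concurrent streams.
print(raw_size_tb(100_000), min_streams(100_000))
```

The key point is that concurrency scales with data volume, so a published result at a large scale factor demonstrates both raw query speed and the ability to sustain many simultaneous query streams.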

9. What about TPC-DS? Isn’t that supposed to be a better test?

That benchmark was certainly designed to test a wider range of features, but since nobody has yet posted any numbers, this benchmark is currently not useful as a comparison.

Some vendors are announcing internal numbers that suggest that they have run the benchmark, but often they have just chosen the easy queries from the 99 queries available. They rarely disclose how many of the 99 queries their database can run, unaided.

Exasol, by the way, can run all 99 queries.

10. How does our analytics database rank on the TPC-H benchmark?

For the past 11 years our analytics database has maintained its TPC-H benchmark position as the undisputed leader – by a significant margin – for both raw performance and price-performance.

These results demonstrate our speed, scalability and cost/performance which have consistently dominated the TPC-H benchmarks since 2008.

Learn more about our TPC-H benchmark
