The Who, What, Why and When of the Transaction Processing Performance Council (TPC)
1. What is the TPC?
The Transaction Processing Performance Council (TPC) is “… a non-profit organization founded in 1988 to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.” (Source: Wikipedia)
2. Isn’t it a bit dull and theoretical?
Far from it – the work of the TPC has led to increased competition between vendors and vastly faster databases, and has motivated manufacturers to push technology to the limits.
3. Who are the TPC?
Their members include many of the largest database manufacturers, including Microsoft, Oracle and IBM.
EXASOL is not a member and so has no influence over the benchmarks they produce.
4. What kind of benchmarks do they produce?
They started by producing benchmarks for OLTP (online transaction processing) systems in the 1980s, but then decided to cater for a new wave of Decision Support Systems (DSS) that became popular in the 1990s.
Their first such benchmark, TPC-D, appeared in 1994 but was rendered obsolete by technological change and replaced by TPC-H (the ad hoc decision support benchmark) in 1999.
5. Why did TPC-D become obsolete?
The development of features such as join indices, summary tables and materialized views meant that queries designed to be difficult became trivial.
Also, just to indicate how far such systems have come since then, the TPC-D benchmarks were designed for a maximum scale factor of 3 TB, which today would be considered modest, even for a data mart.
6. TPC-H is the “ad hoc decision support” benchmark – what does that mean?
Decision Support Systems (DSS) are systems that support business and organizational decision-making activities.
Such activities can include:
- Comparison of sales figures between one week and the next
- Predicting revenue figures based on new product sales assumptions
- Evaluating the consequences of different decision alternatives, given past experience
“Institutional DSS” refers to those systems that deal with recurring decisions. “Ad hoc DSS” is much more interesting – it involves problems that are neither anticipated nor recurring.
7. Why is TPC-H a good test of such systems?
There have been over 220 recorded TPC-H benchmarks over the years. Many database vendors have posted results on a vast range of hardware and at various scale factors up to 100 TB.
By running the same unbiased scripts you can compare database vendor with database vendor – and you can often see how well the same database runs on different hardware.
With the wide range of scale factors, you can also see how well certain databases scale (and often how badly they scale).
8. How do you run a TPC-H benchmark?
The specification (available as a PDF download) runs to 137 pages – but to summarise:
- The database uses a 3rd Normal Form (3NF) schema of 8 tables.
- The benchmarks can be run using pre-determined database sizes, referred to as “scale factors”. Each scale factor corresponds to the raw data size of the data warehouse.
- 6 of the 8 tables grow linearly with the scale factor and are populated with data that is uniformly distributed.
- 22 complex, long-running query templates and 2 data refresh processes (insert and delete) are run in parallel to test concurrency.
- The number of concurrent processes increases with the scale factor – for example, for the 100 TB benchmark you run 11 concurrent processes.
- Most of the 137 pages are concerned with things you are not allowed to do to make queries run faster or to run at all on your database. No cheating!
- An external audit is required of your hardware and processes before your benchmark can be published – just to be absolutely sure that you haven’t cheated.
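To make the point about scale factors concrete, here is a rough Python sketch of how the 8 tables grow. The base row counts are the commonly quoted scale factor 1 figures and are worth checking against the current specification. The LINEITEM count is an approximation, and the function name estimated_rows is just an illustration, not part of any TPC tooling.

```python
# Approximate row counts at scale factor 1 (an assumption to verify
# against the TPC-H specification). These six tables grow linearly.
BASE_ROWS = {
    "SUPPLIER": 10_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "CUSTOMER": 150_000,
    "ORDERS": 1_500_000,
    "LINEITEM": 6_000_000,  # approximate; the real count varies slightly
}

# These two tables keep the same size at every scale factor.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def estimated_rows(scale_factor):
    """Return an estimated row count per table for a given scale factor."""
    rows = {table: base * scale_factor for table, base in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


if __name__ == "__main__":
    # Scale factor 1 is roughly 1 GB of raw data, so 1000 is roughly 1 TB.
    for sf in (1, 100, 1000):
        print(f"Scale factor {sf}:")
        for table, count in estimated_rows(sf).items():
            print(f"  {table:<10}{count:>16,}")
```

Running it for a few scale factors shows why the larger benchmarks are such a challenge: LINEITEM and ORDERS dominate the data volume, while NATION and REGION never change.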
9. What about TPC-DS? Isn’t that supposed to be a better test?
That benchmark was certainly designed to test a wider range of features, but since nobody has yet posted any numbers, this benchmark is currently not useful as a comparison.
Some vendors are announcing internal numbers that (dishonestly?) suggest that they have run the benchmark, but often they have just chosen the easy queries from the 99 queries available. They do not ever disclose how many of the 99 queries their database can run unaided.
EXASolution, by the way, can run all 99 queries.
10. How does EXASOL rank on the TPC-H benchmark?
I’m so glad you asked that question!
The management summary is that we are many times better than our nearest competitor and orders of magnitude better than some names that you might expect to be better.
In fact, only one thing has ever beaten our 2011 figures …
… and that’s our new 2014 figures, just released.
As you can see, we’ve now achieved a clean sweep in all categories – often with an extra zero on the performance of our nearest competitor, and in the top 100 TB class an extra two zeros.
I will get into the detail in a later blog article.