My recent post entitled Recent SPARC T4-4 TPC-H Benchmark Results. Proving Bandwidth! But What Storage? provoked the following comment/question from a reader:
Does this summarize your point(s)?
TPC-H produces a number which is a reflection of (hourly?!?) system throughput.
System throughput may not be indicative of system “performance” to its users b/c users are typically most interested in response time. Thus, TPC-H is an easily misused benchmark for comparing real world performance.
Oracle is using our misunderstanding of throughput as performance to Market systems which are excellent throughput machines as excellent performance machines, when in fact their performance may be less than desirable.
I hope the reader took the time to read yesterday’s post entitled Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle as I think it goes a long way to address his comment/question. However, I do think the reader’s question deserves proper handling and thus I’m making this blog entry. So, dear reader, the following is my response to your comment/questions, but first I need to clear the air as it were.
There Is No Evil Lurking In This Thread
Let me first state categorically that Oracle is not “using our misunderstanding […] to Market systems […].” They are not doing anything underhanded with these TPC-H results. They are, however, conveniently failing to compare their results to their own prior results. I only brought up the HP ProLiant DL980 SQL Server results because Oracle did so in their press release.
I really do not like to compare TPC-H results across database vendor lines. The benchmark is too tricked out, it uses a 3rd normal form schema, and many other things about it make it just a goofy benchmark if you have data warehousing in mind. Nevertheless, comparisons among a single database vendor's own results are useful for many purposes, such as suiting my ulterior motive, which is to suggest that Oracle runs better on platforms other than their very own (recently acquired) SPARC processors.
Before I continue I’d like to interject a proclamation. In fact, I’ll quote myself if you’ll suffer me to do so:
Lack of published TPC-H results does not in any way disqualify any technology offering in the data warehousing space. There are no Oracle Exadata Database Machine TPC-H results and that does not amount to a hill of beans. There are also no Teradata, EMC Greenplum nor IBM Netezza results either and none of those beans form a hill.
— Kevin Closson
The point truly is that TPC-H does not reflect DW/BI/Big Data Analytics reality. However, if a vendor like Oracle chooses to publish results then by all means I’m going to use those results to make my point, but only by comparing Oracle’s own results. That’s precisely what I did in my post entitled Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle.
Now, on to address the reader’s questions.
Throughput is a performance metric and a valid one indeed. However, throughput is generally derived by a concurrent workload of individual units of work that are individually measurable. Consider disk throughput. If I tell you I have a storage configuration that satisfies, say, 500,000 I/O operations per second (IOPS) but don’t tell you the average service times I’m leaving out a critical piece of information.
How is the IOPS metric calculated? One samples I/O completions for a given period of time and then divides by the number of seconds sampled. It’s only measuring completions. If I have a tremendous number of I/O operations in flight concurrently, and sustained, I can get 500,000 IOPS even if the average completion time is 1 second. They overlap. The same goes for query workloads.
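The overlap argument is Little's Law in disguise: in a steady state, throughput equals the number of operations in flight divided by the average completion time. Here is a minimal sketch of that arithmetic. The helper name `sustained_iops` and the in-flight counts are hypothetical, chosen only because they are what the 500,000 IOPS example requires; they are not measurements of any real storage.

```python
def sustained_iops(in_flight: int, avg_completion_seconds: float) -> float:
    """Little's Law: throughput = concurrency / latency (steady state assumed)."""
    if avg_completion_seconds <= 0:
        raise ValueError("average completion time must be positive")
    return in_flight / avg_completion_seconds


# Hypothetical configuration A: 500,000 operations kept in flight,
# each taking 1 second on average.
slow_but_deep = sustained_iops(500_000, 1.0)

# Hypothetical configuration B: 500 operations in flight,
# each taking 1 millisecond on average.
fast_and_shallow = sustained_iops(500, 0.001)

print(f"A: {slow_but_deep:,.0f} IOPS at 1 s per I/O")
print(f"B: {fast_and_shallow:,.0f} IOPS at 1 ms per I/O")
```

Both configurations report the same IOPS figure, yet an individual I/O waits a thousand times longer in the first one. That is the piece of information the throughput number alone leaves out.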
If you submit a continual, large stream of a variety of long running queries you get throughput. Simply run such a hypothetical workload for a long time, sum up the completions and divide by sample period (time) and you get queries per unit of time. Simple.
For example, if I have 10,000 concurrent queries requiring, on average, 61 minutes monitored for 2 hours I’ll get 10,000 completions or 5,000 queries per hour. So long as that meets my service requirement I’m fine. However, if even one of my users mandates a 20 minute completion time I’m not going to impress with hand-waving over the great 5,000 QpH throughput I’m pushing through the system. Users really don’t care about how much work the system is doing on behalf of others. Do they?
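The arithmetic in that example can be sketched as follows. This is a simplified model, not how TPC-H computes its metric: it assumes every query takes exactly the average time and that each concurrent slot resubmits a new query as soon as the previous one finishes. The function names are hypothetical.

```python
def completions_in_window(concurrent_queries: int,
                          minutes_per_query: int,
                          window_minutes: int) -> int:
    """Count whole queries finished in the window (simplified model)."""
    # Each slot runs queries back to back; a query still running when
    # the window closes does not count as a completion.
    per_slot = window_minutes // minutes_per_query
    return concurrent_queries * per_slot


def queries_per_hour(completions: int, window_minutes: int) -> float:
    """Throughput: completions divided by the sample period in hours."""
    return completions * 60 / window_minutes


done = completions_in_window(10_000, 61, 120)
qph = queries_per_hour(done, 120)
meets_20_minute_sla = 61 <= 20  # does a single 61-minute query satisfy a 20-minute mandate?

print(done, qph, meets_20_minute_sla)
```

Under these assumptions the model gives 10,000 completions and 5,000 queries per hour, while the one user with a 20 minute requirement is still left waiting 61 minutes.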
So, to continue in this three-part series I’ll have to refer once again to the TPC-H disclosures (cited below).
I’ll refer again to the SPARC T4-4 result. If you glance at the report you’ll see that when submitted serially the geometric mean of query completion times is about 20 seconds on the SPARC T4. On the other hand, when we look at the HP BladeSystem result of over 3 years ago (still with Oracle Database 11g) we see that the geometric mean of serially submitted queries is nearly indiscernible…a mere blip. Of course the astute reader will point out that these comparisons, while both Oracle Database 11g, are that of in-memory versus disk-based (since the HP BladeSystem result was an In-Memory Parallel Query result). To that I would reply that it is foremost an old, tired Harpertown Xeon (5400) result with front-side bus technology compared to a state-of-the-art, modern CPU (SPARC T4). And let’s not forget that the SPARC T4 server was connected to solid state storage!
It’s Not Fair Comparing Oracle In-Memory Parallel Query To Flash Storage
Really? Even considering how primitive a Harpertown Xeon was compared to a modern processor like SPARC T4? OK, fine. We can also hark back further, nearly 5 years, to a result achieved by the now-defunct systems vendor called PANTA Systems. The PANTA Systems configuration, at the same 1TB scale, carried the following baggage:
- Oracle Database 10g (with Real Application Clusters). So, old software.
- Really, really old AMD Opteron 8000s (very, very slow by today’s standards).
- DDR400 DIMMs.
In spite of this aged bio, the configuration produced a geometric mean of 49 seconds for the serially submitted query stream compared to the 20 second result for the SPARC T4.
That’s a vintage 5-year-old system, 10g versus 11g, AMD 8000 versus SPARC T4, DDR400 (not even DDR2) versus DDR3 memory and, lest I forget, the PANTA Systems memory controller was located across a front-side bus compared to the on-die SPARC memory controller. Tally up all of those contrasting system attributes and the resultant benefit to SPARC T4-4 is about a 2.5-fold improvement in the geometric mean of query response times (serial). And, yes, time and technology did bring a 7x increase in the throughput metric…but…once again, I encourage you to look at the disclosures I link to below and see how the completion times stack up in the throughput tests. If you do so then we will have come full circle.
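For anyone who wants to check the "about 2.5-fold" figure, the arithmetic is just the ratio of the two serial geometric means quoted above. A minimal sketch using only the rounded numbers from this post (49 seconds and 20 seconds, and the roughly 7x throughput gain), not the exact values in the disclosures; the variable names are mine:

```python
# Rounded geometric means of the serially submitted queries, as quoted in this post.
panta_geomean_seconds = 49.0
sparc_t4_geomean_seconds = 20.0

# Response-time improvement: how many times faster the typical serial query got.
response_time_gain = panta_geomean_seconds / sparc_t4_geomean_seconds

# Throughput improvement over the same span, as quoted in this post.
throughput_gain = 7.0

print(f"response-time gain: {response_time_gain:.2f}x")
print(f"throughput gain:    {throughput_gain:.0f}x")
print(f"throughput grew {throughput_gain / response_time_gain:.1f} times as much as serial response time improved")
```

The gap between the two gains is the whole point: the throughput metric improved far more than the time an individual query takes.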
No, Oracle is not misleading anyone with these recent SPARC T4 results.