Oracle Versus “The Competition”
There is a thread on asktom.oracle.com, started back in 2001, about how Oracle compares to other database servers. While I think today’s competitive landscape in RDBMS technology makes that a bit of a moot question, I followed the thread to see how it would go. The thread is a very long point/counter-point ordeal that Tom handles masterfully. There are bits and pieces about WinFS, MySQL, DB2, PostgreSQL, multi-version read consistency (thank heaven above—and Andy Mendelsohn et al. of course), writers blocking readers, page lock escalation (boo Sybase) and so on. It’s a fun read. The part that motivated me to blog is a set of TPC-C results posted by a participant in the thread: record-breaking non-clustered TPC-C v5 results on a flagship RS/6000 (System p5, I believe it is now branded) server running DB2. Tom replies to the query with:
…do YOU run a TPC-C workload, or do YOU run YOUR system with YOUR transactions?
That was an excellent reply. In fact, that has been the standard remark about TPC-C results from all the players—when their result is not on top. I’m not taking a swipe at Tom. What I’m saying is that TPC-C results have very little to do with the database being tested. As an aside, the benchmark is a great bellwether for which software features actually deliver enhanced performance and are stable enough to sustain the crushing load. More on that later.
TPC-C is a Hardware Benchmark
See, most people don’t realize that TPC-C is a hardware benchmark first and foremost. The database software approach to getting the best number is to do the absolute least amount of processing per transaction. Don’t get me wrong, all the audited results do indeed comply with the specification, but the “application” code is written to reduce instructions (and, more importantly, cycles) per transaction in any way possible. If ever there was a “perfect” Oracle OLTP workload, it would be the TPC-C kit that Oracle gives to the hardware vendors so they can partake in a competitive, audited TPC-C. That application code, however, looks nothing like any application out in front of any Oracle database instance in production today. All the database vendors do the same thing, because they know it is a hardware benchmark, not a database benchmark.
TPC-C—Keeping it Real
The very benchmark itself is moot—a fact known nearly since the ratification of the specification. I remember sitting in a SIGMOD presentation of this paper back in 1995 in which Tandem proved that the workload is fully partitionable and scales linearly; thus, ridiculously large results are limited only by physical constraints. That is, if you can find the floor space, power, and cooling, and can cable it all up, you too can get a huge result. If only the industry had listened! What followed has been years of the “arms race” that is TPC-C. How many features have gone into database server products for these benchmarks that do nothing for real datacenter customers? That is a good question. Having worked on the “inside” I could say, but men in trench coats would sweep me away, never to be heard from again. Here’s a hint: the software being installed does not have to come from a shipping distribution medium (wink, wink). In fact, the software installation is not part of the audit. Oops.
History and Perspective
In 1998 I was part of a team that delivered the first non-clustered Oracle TPC-C result to hover next to what seemed like a sound barrier at the time—get this, 100,000 TpmC! Wow. We toiled and labored and produced 93,901 TpmC on a Sequent NUMA-Q 2000 with 64 processors, as can be seen in these archived version 3 TPC-C results. This and other workloads were the focus of my attention back then. What does this have to do with the AskTom thread?
The thread asking Tom about TPC-C cited a recent DB2 result of 4,033,378 TpmC on IBM’s System p5 595. I think the point of that comment on the thread was that DB2 must surely be “better,” since the closest Oracle result is 1,601,784 TpmC. For those who don’t follow the rat race, TPC-C results between vendors constantly leap-frog each other. Yes, 4,033,378 is a huge number for a 64-processor (32 sockets, 64 cores) system when compared to that measly number we got some 8 years ago. Or is it?
Moore’s Law?
There have been about 6 Moore’s Law units of time (18 months each) since that ancient 93,901 TpmC result. A lot has changed. The processors used for that result were clocked at 405MHz, had 7.5 million transistors (250nm) and tiny little 512KB L2 caches. With Moore’s Law, therefore, we should expect processors with some 480 million transistors today. Well, somewhere along the way Moore’s Law sloped off a bit, so the IBM System p5 595 processors (POWER5+) “only” have 276 million transistors. Packaging, on the other hand, is a huge factor that generally trumps Moore’s Law. The POWER5+ processors are packaged in multi-chip modules (MCMs) with some 144MB of L3 cache (36MB/socket, off-chip yes, but 80GB/s) backing up a full 7.6MB L2 cache—all this with 90nm technology. Oh, and that huge TpmC was obtained on a system configured with 2048GB (yes, 2 terabytes) of memory, whereas the “puny” Sequent at 93,901 TpmC had 64GB. And, my oh my, how much faster memory loads are on this modern IBM: about 6-fold faster, in fact!
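Just to spell out that arithmetic, here is a quick back-of-the-envelope sketch using only the figures quoted above (the 18-month doubling period is the classic rule of thumb, not a measured value):

```python
# Rough Moore's Law projection using the figures quoted in this post.
old_transistors = 7.5e6     # per processor in the 1998 Sequent result (250nm)
doublings = 6               # roughly six 18-month Moore's Law periods since then
projected = old_transistors * 2 ** doublings

actual_power5_plus = 276e6  # transistors per POWER5+ processor (90nm)

print(f"projected: {projected / 1e6:.0f} million transistors")           # ~480M
print(f"actual:    {actual_power5_plus / 1e6:.0f} million transistors")  # 276M
```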
The Tale of The Tape
Is 4,033,378 really an astronomical TPC result? Let’s compare the IBM system to the old Sequent (a rough sketch of the arithmetic follows the list):
- 43x more throughput (TpmC)
- 32x more memory configured (with 6x better latency)
- 37x more transistors per processor (and clocked 6x faster)
- 15x more processor L2 cache + 36x in L3
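Here is a minimal sketch of where those ratios come from, using only the numbers already quoted in this post (it is a rough comparison that ignores architectural differences entirely):

```python
# "Tale of the tape" ratios: IBM System p5 595 (2006) vs. Sequent NUMA-Q 2000 (1998).
ibm_tpmc, sequent_tpmc = 4_033_378, 93_901
ibm_mem_gb, sequent_mem_gb = 2048, 64
ibm_transistors, sequent_transistors = 276e6, 7.5e6
ibm_l2_mb, sequent_l2_mb = 7.6, 0.512

print(f"throughput:  {ibm_tpmc / sequent_tpmc:.0f}x")                # ~43x
print(f"memory:      {ibm_mem_gb / sequent_mem_gb:.0f}x")            # 32x
print(f"transistors: {ibm_transistors / sequent_transistors:.0f}x")  # ~37x
print(f"L2 cache:    {ibm_l2_mb / sequent_l2_mb:.0f}x")              # ~15x
```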
So, regardless of the fact that I just did a quick comparison of a DB2 result to an old Oracle8 result, I think there should be little surprise that these huge numbers are being produced, especially when you factor in how partitionable TPC-C is. Let’s not forget disk: IBM used 6,400 hard disk drives for this result. If they find the floor space, power and cooling, do the cabling, add a bunch more drives, and hook up the upcoming POWER6-based System p server, I’m quite certain they will get a bigger TPC-C result. There’s no doubt in my mind—hint: POWER6 has nearly 3-fold more transistors (on 65nm technology) than POWER5+ and is clocked at 5GHz too.
Tom Kyte is Right
The question is, what does it mean to an Oracle shop? Nothing. So, as usual, Tom Kyte is right. Huge TPC-C results using DB2 don’t mean a thing.
Retrospect and Respect
I’m still proud of that number we got way back when. It was actually quite significant, since the closest Oracle result at the time was a clustered, 96-CPU, 102,542 TpmC Compaq number using Alpha processors. That reminds me: I had a guy argue with me last year at OOW, suggesting that result was the first 100K+ non-clustered Oracle result. I couldn’t blackberry the tpc.org results spreadsheet quickly enough, I guess. I digress. As I was saying, the effort to get that old result was fun. Not to mention that the 510-pin gallium arsenide data pump each set of 4 processors linked to in the NUMA system was pretty cool looking. It was a stable system too. The operating system technology was a marvel. I’m fortunate to still be working with a good number of those Sequent kernel engineers right here at PolyServe, where one of our products is a general-purpose, fully symmetric, distributed cluster filesystem for Windows and Linux.
not many have used zfs either?…
Man, I do recall that Sequent tpc-c! It was a landmark and came out in a number of ads, IIRC.
The thing that rattles me nowadays with tpc-c is how lax they are in the definition of what is a “database”. The db2 benchmark description clearly shows they are using split databases, with each one assigned to its own app server. All they have to do to show big numbers is add nodes. There can never be any possible contention when databases are split that way. That is NOT what tpc-c was supposed to measure…
But “right as usual” Tom Kyte said himself they are using ASM now with TPC:
http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:31037981026575#31140361765489
Now what? Got another “Hmmm…”? 🙂 Or aren’t they audited to say that…
But “right as usual” Tom Kyte said himself they are using ASM now with TPC:
…I said to guess how many audited TPC results used ASM, I didn’t say none had. I’ll be blogging about this soon, but if you share the same zeal for such things go to the tpc.org website and read the FDRs of the last 2 Oracle results…there’s a C and an H
So, Tom was not wrong….
Kevin – i think you’re not in love with this thing called ASM…
Maybe your blog someday on “why” (we love that stuff so counter points would be very useful…)?
“Kevin – i think you’re not in love with this thing called ASM…
Maybe your blog someday on “why” (we love that stuff so counter points would be very useful…)?”
…yes, indeed… trying to build up interest first…although my writings are full of my rationale on the matter, I shall blog on the topic soon…
see: https://kevinclosson.wordpress.com/2006/10/19/oracle-espouses-tiered-storage-asm-who/
Quote from Tom: “…hard to compare to old runs as the hardware itself changes over time and adds to the numbers.”
I bet Oracle will attribute a lot of gains to ASM. 🙂
On ZFS – I was recently testing ZFS clone functionality while preparing for my UKOUG presentation on managing refreshes (I couldn’t get a Veritas environment). I was running it in VMware, so no benchmarking. However, I’ve got a nice reproducible case where the whole Solaris VM crashed – I bet it’s a ZFS effect.
And that’s after I’d been promoting that ZFS rocks and just works, period.
I was running it in VMware, so no benchmarking. However, I’ve got a nice reproducible case where the whole Solaris VM crashed – I bet it’s a ZFS effect.
…and you’re sure VMware has nothing to do with it?
well, to be entirely fair: it’s early days for zfs, I reckon.
Takes a while for something as complex as an advanced file system to be stable and perfectly tuned. There is no such thing as perfect coding upfront, we all know that.
Or rather: some don’t know. But those usually work in marketing…
The basic idea behind zfs makes a lot of sense. It’ll get there eventually, but it’ll need tweaking along the way. The important thing IMHO will be not to inflate expectations and end up with a lot of frustrated early adopters giving the thing a bad rap.
“…and you’re sure VMware has nothing to do with it?”
Well, it happened every time I dropped a large table in a tablespace on the ZFS clone and right afterwards ran a CTAS to recreate it. That causes massive I/O to the clone and, probably, ZFS hits a problem somewhere. VMware Server survives… it’s just the Solaris virtual machine that resets – the effect is just like a power off/on. I might try to reproduce it with simple files one day.
One day I read this very interesting blog entry.
However, the first paragraph disappointed me quite a lot – if I understood correctly, nobody thought of database workloads while designing ZFS, and only now have people started to think about how it would work with databases. I hope I misunderstood, as I quite like the simplicity of managing ZFS and the idea behind it. 🙂
it’s just the Solaris virtual machine that resets – the effect is just like a power off/on. I might try to reproduce it with simple files one day.
I’ve got Solaris running native on my server at home if you want me to try any tests out for you. Probably after UKOUG, though 😉
Sure. 😉