In my recent post entitled Oracle Executives Underestimate SPARC SuperCluster I/O Capability–By More Than 90 Percent! I offered some critical thinking regarding the nonsensical performance claims attributed to the Sun SPARC SuperCluster T4 in one of the keynotes at Oracle Openworld 2011. In the comment thread of that post a reader asks:
All – has anyone measured the actual IOPS from disk as well as from flash in your Exadata (production) environment and compare with what the Oracle white paper or CXO presentations claimed?
That is a good question. It turns out the reader is in luck. There happens to be really interesting public information that can answer his question. According to this searchoracle.techtarget.com article, Campbell Webb, Oracle’s VP of Product Development IT refers to Oracle’s Beehive email and collaboration database as “Oracle’s largest backend database.” Elsewhere in the article, the author writes:
Oracle’s largest in-house database is a 101-terabyte database running Beehive, the company’s in-house email and collaboration software, running on nine Oracle Exadata boxes.
As it turns out, Oracle’s Campbell Webb delivered a presentation on the Beehive system at Oracle Openworld 2011. The presentation (PDF) can be found here. I’ll focus on some screenshots of that PDF to finish out this post.
According the following slide from the presentation we glean the following facts about Oracle’s Beehive database:
- The Beehive database is approximately 93TB
- Redo generation peaks at 20MB/s—although that is unclear if that is per instance or the aggregate of all instances of this Real Application Clusters Database
- User SQL executions peaks at roughly 30,000 per second
The techtarget.com piece quotes Campbell Webb as stating the configuration is 9 racks of Exadata gear with 24 instances of the database—but “only” 16 are currently active. That is a lot of Oracle instances and, indeed, a lot of instances can drive a great deal of physical I/O. Simply put, a 9-rack Exadata system is gargantuan.
The following is a zoom-in photo of slide 12 from the presentation. It spells out that the configuration has the standard 14 Exadata Storage Servers per rack (126 / 14 == 9) and that the hosts are X2-2 models. In a standard configuration there would be 72 database hosts in a 9-rack X2-2 configuration but the techtarget.com article quotes Webb as stating 16 are active and there only 24 in total. More on that later.
With this much gear we should expect astounding database throughput statistics. That turns out to not be the case. The following slide shows:
- 4,000,000 logical I/O per second at peak utilization. That’s 250,000 db block gets + db block consistent gets (cache-buffers chain walks) per second per active host (16 hosts). That’s a good rate of SGA buffer pool cache activity—but not a crushing load for 2S Westmere EP.
- The physical read to write ratio is 88:12.
- Multiblock physical I/Os are fulfilled by Exadata Storage Servers on average at 6 or less milliseconds
- Single block reads are largely satisfied in Exadata Smart Flash Cache as is evidenced by the 1ms waits
- Finally, database physical I/O peaks at 176,000 per second
With 126 storage servers there is roughly 47TB of Exadata Smart Flash Cache. Considering the service times for single block reads there is clear evidence that the cache management is keeping the right data in the cache. That’s a good thing.
On the other hand, I see a cluster of 16 2U dual-socket Westmere-EP Real Application Clusters servers driving peak IOPS of 176,000. Someone please poke me with a stick because I’m bored to death—falling asleep. Nine racks of Exadata is capacity for 13,500,000 IOPS (read operations only of course). This database is driving 1% of that.
Nine racks of Exadata should have 72 database hosts. I understand not racking them if you don’t need them, but the configuration is using less than 2 active hosts per rack—but, yes, there are 24 cabled (less than 3 per rack). Leaving out 48 X2-2 hosts is 96U—more than a full rack worth of aggregate wasted space. I don’t understand that. The servers are likely in the racks—powered off. You, the Oracle customer, can’t do that because you aren’t Oracle Product Development IT. You’ll be looking at capex—or a custom Exadata configuration if you need 16 hosts fed by 126 cells.
It is not difficult to configure a Real Application Clusters system capable of beating 16 2-socket Westmere EP servers, with their 176,000 IOPS demand, with far, far less than 9 racks of hardware. It would be Oracle software just the same—just no Exadata bragging rights. And, once a modern, best-of-breed system is happily steaming along its way hustling 176,000 IOPS, you could even call it an “Engineered System.” There’d be no Exadata bragging rights though. Just a good system handling a moderate workload. There is nothing about this workload that can’t be easily handled with conventional, best-of-breed storage. EMC technology with FAST quickly comes to mind.
Beehive is Oracle’s largest database and it runs on a huge Exadata configuration. Those two facts put together do not make any earth-shattering proof point when you study the numbers.
I don’t get it. Well, actually I do.
By the way, did I mention that 176,000 IOPS is not a heavy IOPS load–especially when only 12% of them are writes?