Reading Data Sheets
If you are in a position of influence affecting technology adoption in your enterprise, you likely spend a lot of time reading data sheets from vendors. This is just a quick blog entry about something I simply haven’t taken the time to cover, even though the topic at hand has always been a “problem.” Well, at least since the release of the Oracle Exadata Database Machine X2-8.
In the following references and screenshots you’ll see that Oracle cites 1.5 million flash read IOPS as an expected limit for both the full-rack Oracle Exadata Database Machine X3-2 and the Oracle Exadata Database Machine X3-8. All machines have limits and Exadata is no exception. Notice how I draw attention to the footnote that accompanies the flash read IOPS claim. Footnote number 3 says that both of these Exadata models are limited in flash read IOPS by the database host CPU. Let me repeat that last bit for anyone scrutinizing my words for reasons other than education: The Oracle Exadata Database Machine data sheets explicitly state flash read IOPS are limited by host CPU.
Oracle’s numbers in this case are SQL-driven from Oracle instances. I have no doubt these systems are both capable of achieving 1.5 million read IOPS from flash because, truth be told, that isn’t really all that many IOPS, especially when the IOPS throughput numbers are not accompanied by service times. In the 1990s it was all about “how much,” but in modern times it’s about “how fast.” Bandwidth is an old, tired topic. Modern platforms are all about latency. Intel QPI put the problem of bandwidth to rest.
So, again, I don’t doubt the 1.5 million flash read IOPS citation. Exadata has a lot of flash cards and a lot of host processors to drive concurrent I/O. Indeed, with the concurrent processing capabilities of both of these Exadata models, Oracle would be able to achieve 1.5 million IOPS even if the service times were more in line with what one would expect from mechanical storage. Again, we never see service time citations, so in actuality the 1.5 million number is just a representation of how much in-flight I/O the platform can handle.
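To make that point concrete, here is a minimal sketch of the Little’s Law arithmetic behind it. The service times are illustrative assumptions of mine; the data sheets publish none:

```python
# Little's Law: I/Os in flight = IOPS x service time.
# The service times below are illustrative assumptions; Oracle's
# data sheets publish neither service times nor queue depths.
iops = 1_500_000

for service_time_ms in (0.5, 8.0):  # flash-like vs. mechanical-like
    in_flight = iops * (service_time_ms / 1000.0)
    print(f"{iops:,} IOPS at {service_time_ms} ms requires "
          f"{in_flight:,.0f} I/Os in flight")
```

At 0.5 ms that is only 750 concurrent I/Os; at a mechanical-like 8 ms it is 12,000. Either way the headline number is reachable, which is the point: concurrency, not latency, is what the figure demonstrates.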
Here is the new truth: IOPS is a storage bandwidth metric.
Host CPU Limited! How Many CPUs?
Here’s the stinger: Oracle blames host CPU for the 1.5 million flash read IOPS number. The problem with that is the X3-2 has 128 Xeon E5-2690 processor cores and the X3-8 has 160 Xeon E7-8870 processor cores. So what is Oracle’s real message here? Is it that the cores in the X3-8 are 20% slower than those in the X3-2 model? I don’t know. I can’t put words in Oracle’s mouth. However, if the data sheet is telling the truth then one of two things is true: either a) the E5-2690 processors really are that much faster on a per-core basis than the E7-8870 (128 cores hitting the same ceiling as 160 cores means every X3-2 core delivers 25% more IOPS, or, equivalently, every X3-8 core delivers 20% less) or b) there is a processing asymmetry problem.
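The per-core arithmetic is simple enough to check; a quick sketch:

```python
# Per-core read IOPS implied by the claim that both models are
# host-CPU limited at the same 1.5 million flash read IOPS.
iops = 1_500_000
x3_2_cores = 128  # 8 database hosts, 2-socket 8-core E5-2690
x3_8_cores = 160  # 2 database hosts, 8-socket 10-core E7-8870

per_core_x3_2 = iops / x3_2_cores  # ~11,719 IOPS per core
per_core_x3_8 = iops / x3_8_cores  # ~9,375 IOPS per core

print(f"X3-2: {per_core_x3_2:,.0f} IOPS/core")
print(f"X3-8: {per_core_x3_8:,.0f} IOPS/core")
print(f"E5-2690 per-core advantage: {per_core_x3_2 / per_core_x3_8 - 1:.0%}")
print(f"E7-8870 per-core deficit:   {1 - per_core_x3_8 / per_core_x3_2:.0%}")
```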
Not All CPU Bottlenecks Are Created Equal
Oracle would likely not be willing to dive into technical detail to the same level I do. Life is a series of choices, including whom you choose to buy storage and platforms from. However, Oracle’s literature is clear about the number of active 40Gb QDR InfiniBand ports in each configuration, and this is where the asymmetry comes in. There are 8 active ports in both of these models. That means there are 8 streams of interrupt handling in both cases, regardless of how many cores there are in total.
As is the case with any networked storage, I recommend you monitor mpstat -P ALL output on database hosts to see whether there are cores nailed to the wall with interrupt processing at levels below total CPU saturation. Never settle for high-level aggregate CPU utilization monitoring. Instead, drill down to the per-core level to watch for asymmetry. Doing so is just good platform science.
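For readers who want to automate that check, here is a minimal sketch of my own (not an Oracle tool) that samples the per-core hard and soft interrupt counters from /proc/stat, the same data mpstat -P ALL reports in its %irq and %soft columns, and flags outliers:

```python
import time

# Minimal per-core interrupt-asymmetry check (my sketch, not an Oracle
# tool). /proc/stat per-CPU columns: user nice system idle iowait irq softirq ...
def percpu_irq_ticks():
    ticks = {}
    with open("/proc/stat") as f:
        for line in f:
            # Match "cpu0", "cpu1", ... but skip the aggregate "cpu" line.
            if line.startswith("cpu") and line[3].isdigit():
                fields = line.split()
                irq, softirq = int(fields[6]), int(fields[7])
                ticks[fields[0]] = irq + softirq
    return ticks

before = percpu_irq_ticks()
time.sleep(5)
after = percpu_irq_ticks()

deltas = {cpu: after[cpu] - before[cpu] for cpu in after}
avg = sum(deltas.values()) / len(deltas)
for cpu, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    flag = "  <-- asymmetric" if avg and d > 3 * avg else ""
    print(f"{cpu}: {d} irq+softirq ticks in 5s{flag}")
```

If a handful of cores (say, eight, one per active InfiniBand port) are shouldering nearly all of the interrupt load while aggregate utilization looks comfortable, you have found the asymmetry.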
Between now and the time you find yourself in a proof-of-concept test situation with Exadata, don’t hesitate to ask Oracle why, by their own words, both 128 cores and 160 cores are equally saturated when delivering maximum read IOPS in the database grid. After all, they charge the same per core (list price) to license Oracle Database on either of those processors.
Nice and Concise?
By the way, is there anyone who actually believes that both of these platforms top out at precisely 1.5 million flash read IOPS?
Oracle Exadata Database Machine X3-2 Datasheet
Oracle Exadata Database Machine X3-8 Datasheet
DISCLAIMER: This post tackles citations taken straight from Oracle’s published data sheets and literature.
Comments

The 1,000,000 write IOPS are accompanied by a footnote indicating that they are measured from the Exadata “storage servers.” I imagine that the read IOPS may also be measured at the storage servers. The X3-2 and X3-8 datasheets indicate that full racks of both have 14 storage servers with 12 cores each, for 168 cores of “storage server” CPU driving the flash IOPS.
Why bother with an X3-8 if it has the same max flash throughput to the storage servers as the X3-2? If “query processing proper” requires the additional “database server” CPU (sorting, comparisons, calculations, etc.), it may deliver value. Otherwise, folks may buy an X3-8 without reading too carefully… figuring that it MUST have much more performance capacity than its “little brother.”
@sql_handle: Actually, the 1,000,000 is flash writes issued by hosts. That million includes the ASM duplexed writes. None of the write IOPS spoken of in the data sheet are any sort of synthetic storage-level test. Oracle is stating IOPS measured from instances running SQL workloads. The reason they have to measure writes at the storage level is because, believe it or not, Oracle instances do not account for ASM duplexed writes! That is, read an AWR report pushing writes to normal redundancy ASM storage and you’ll have to double the number that is reported. All that said, this particular post is about scrutinizing the CPU-limiting aspect of host read IOPS.
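To put numbers on that adjustment, a quick sketch with made-up figures (the 50,000 is hypothetical, not from any real AWR report):

```python
# ASM normal redundancy writes each extent twice (primary + mirror),
# but instance-level statistics count the write once. The figure
# below is hypothetical, not from a real AWR report.
awr_reported_write_iops = 50_000  # hypothetical AWR figure
asm_copies = 2                    # normal redundancy duplexes writes

storage_write_iops = awr_reported_write_iops * asm_copies
print(f"Instance-reported writes:  {awr_reported_write_iops:,} IOPS")
print(f"Writes landing on storage: {storage_write_iops:,} IOPS")
```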
Thanks for stopping by.
Apparently they have since changed the footnotes, as the links you reference don’t have footnote 3 mentioning the CPU; only footnote 7 mentions the CPU. In the X3-8 you have 2 nodes sharing 44 TB of flash, and in the X3-2 you have 8 nodes sharing 22.4 TB of flash, both from the same 14 storage servers. The flash I/O number is likely the same because the storage servers are identical in their CPU/I/O capability and their ability to serve data from flash, IMHO.