Oracle Exadata Database Machine I/O Bottleneck Revealed At… 157 MB/s! But At Least It Scales Linearly Within Datasheet-Specified Bounds!

It has been quite a while since my last Exadata-related post. Since I spend all my time, every working day, on Exadata performance work, this blogging dry spell should seem quite strange to readers of this blog. However, for a while it seemed to me as though I was saturating the web on the topic, and Exadata is certainly more than a sort of Kevin's Dog and Pony Show. It was time to let other content filter up in the Google search results. Having said that, there have been times I've wished I had continued to saturate the namespace on the topic, given some of the totally erroneous content I've seen on the Web.

Most of the erroneous content low-balls Exadata with FUD, but a surprisingly sad amount of content that over-hypes Exadata exists as well. Both types of erroneous content are disheartening to me given my profession. In actuality, the hype content is more disheartening to me than the FUD. I understand the motivation behind FUD; what I cannot understand is the need to make a good thing out to be better than it is with hype. Exadata is, after all, a machine with limits, folks. All machines have limits. That's why Exadata comes in different size configurations, for heaven's sake! OK, enough of that.

FUD or Hype? Neither, Thank You Very Much!
Both the FUD-slinging folks and the folks spewing the ueber-light-speed, anti-matter-powered warp-drive throughput claims have something in common: they don't understand the technology. That is quickly changing, though. Web content is popping up from sources I know and trust, sources outside the walls of Oracle as well. In fact, two newly accepted co-members of the OakTable Network have started blogging about their Exadata systems. Kerry Osborne and Frits Hoogland have been posting about Exadata lately (e.g., Kerry Osborne on Exadata Storage Indexes).

I'd like to draw attention to Frits Hoogland's investigation into Exadata. Frits is embarking on a series that starts with baseline table scan performance on a half-rack Exadata configuration that employs none of Exadata's performance features (e.g., storage offload processing is disabled). His approach is then to enable Exadata features and show the benefit, giving credit to the specific aspect of Exadata responsible for the improved throughput. The baseline test in Frits' series is achieved by disabling both Exadata cell offload processing and the Parallel Query Option! To that end, the scan is driven by a single foreground process executing on one of the 32 Intel Xeon 5500 (Nehalem EP) cores in his half-rack Database Machine.

Frits cited throughput numbers but left out what I believe is a critical detail about the baseline result—where was the bottleneck?

In Frits' test, a single foreground process drives the non-offloaded scan at roughly 157 MB/s. Why not 1,570 MB/s (I've heard everything Exadata is supposed to be 10x)? A quick read of any Exadata datasheet suggests that a half-rack Version 2 Exadata configuration offers up to 25 GB/s scan throughput (when scanning both HDD and flash storage assets concurrently). So, why not 25 GB/s? The answer is that the flow of data has to go somewhere.

In Frits' particular baseline case, the data flows from the cells via iDB (RDS over InfiniBand) into heap-buffered PGA in a single foreground process executing on a single core of a single Nehalem EP processor. Along with that data flow, the foreground process pays the CPU cost of marshalling all the I/O (communicating with Exadata via the intelligent storage layer), which means interacting with cells to request the ASM extents per its mapping of the table segment in the ASM extent map. Also, the particular query Frits tests performs a count(*) with a predicate on a column. To that end, a single core in that single Nehalem EP socket is touching every row in every block for predicate evaluation. With all that going on, one should not expect more than 157 MB/s to flow through a single Xeon 5500 core. That is a lot of code execution.
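To get a feel for just how much code execution that is, here is a back-of-envelope sketch. The 157 MB/s figure is from the test; the core clock and Oracle block size are my assumptions for illustration (Xeon 5500-series parts ran at various frequencies, and 8 KB is merely a common block size):

```python
# Rough CPU budget for a single-core, non-offloaded scan at 157 MB/s.
# clock_hz and block_size are assumptions, not measured values.

clock_hz = 2.53e9     # assumed Xeon 5500-series core clock (Hz)
scan_rate = 157e6     # observed single-process scan rate (bytes/s)
block_size = 8192     # assumed Oracle block size (bytes)

cycles_per_byte = clock_hz / scan_rate   # CPU cycles available per byte
blocks_per_sec = scan_rate / block_size  # blocks filtered per second

print(f"{cycles_per_byte:.1f} cycles available per byte scanned")
print(f"{blocks_per_sec:,.0f} blocks/s touched for predicate evaluation")
```

Roughly 16 cycles per byte sounds generous until you remember that the same core is also driving the I/O requests, managing the extent map lookups, and evaluating the predicate against every row.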

What Is My Point?
The point is that all systems have bottlenecks somewhere. In this case, Frits is creating a synthetic CPU bottleneck as a baseline in a series of tests. The only reason I'm blogging the point is that Frits didn't identify the bottleneck in that particular test. I'd hate to see the FUD-slingers suggest that a half-rack Version 2 Exadata configuration bottlenecks at 157 MB/s for disk-throughput-related reasons about as badly as I'd hate to see the hype-spewing, light-speed, anti-matter-warp rah-rah folks suggest that this test could scale up without bounds. That is, I would hate to see someone blindly project how Frits' baseline test would scale with concurrent invocations. After all, there are 8 cores and 16 threads in each host in the Version 2 Database Machine, and therefore 32 cores and 64 threads in a half rack (there are 4 hosts). Surely Frits could invoke 32 or 64 sessions each performing this query without exhibiting any bottlenecks, right? Indeed, 157 MB/s times 64 sessions is about 10 GB/s, which fits within the datasheet claims. And, indeed, since the memory bandwidth in this configuration is about 19 GB/s into each Nehalem EP socket, there must surely be no reason this query wouldn't scale linearly, right? The answer is I don't have the answer. I haven't tested it. What I would not advise, however, is dividing a maximum theoretical bandwidth figure (e.g., the 25 GB/s scan bandwidth offered by a half rack) by a measured application throughput (e.g., Frits' 157 MB/s) and claiming victory just because the math happens to work out in your favor. That would be junk science.
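The naive "datasheet division" above can be made explicit. All of the figures come from this post; the arithmetic shows why the numbers merely *appear* to leave headroom, not that the workload would actually scale linearly:

```python
# The naive scaling math, spelled out. These figures are from the post;
# nothing here models caches, memory controllers, or the interconnect.

session_rate_gbs = 0.157      # measured single-session scan rate (GB/s)
sessions = 64                 # one per hardware thread in a half rack
sockets = 8                   # 2 sockets x 4 database hosts
mem_bw_per_socket_gbs = 19.0  # cited per-socket Nehalem EP memory bandwidth
datasheet_scan_gbs = 25.0     # half-rack datasheet scan throughput

aggregate = session_rate_gbs * sessions  # naive projected demand
per_socket = aggregate / sockets         # naive per-socket demand

print(f"aggregate: {aggregate:.1f} GB/s vs datasheet {datasheet_scan_gbs} GB/s")
print(f"per socket: {per_socket:.2f} GB/s vs memory b/w {mem_bw_per_socket_gbs} GB/s")
```

The math "works out" (about 10 GB/s aggregate, about 1.3 GB/s per socket), but shared caches, memory controllers, process scheduling, and the interconnect are all absent from it, which is exactly why dividing datasheet numbers by a single-session measurement proves nothing about concurrency.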

Frits is not blogging junk science. I recommend following this fellow OakTable member to see where it goes.

5 Responses to “Oracle Exadata Database Machine I/O Bottleneck Revealed At… 157 MB/s! But At Least It Scales Linearly Within Datasheet-Specified Bounds!”

  1. joel garry August 30, 2010 at 5:23 pm

    The FUD and hype come directly from the fact that the people who have the dollars and make the decisions need to work with a highly distilled set of facts. So please, never feel a proper technical explanation will contribute to web saturation – we are all going to have too much information to filter, and having trustable people like the Oakies makes it much easier for us techies to contribute some rational input to the decision makers when we have the chance, which often means at evaluation time.

    In other words, keep up the good works!

    I’m constantly amazed at how many technical people don’t understand how to benchmark (I’m referring to forums and such, not posts you are referencing here). Of course, the bottlenecking issue is critical – a slight change in configuration or load characteristics can mean a large chaotic change in performance. That’s the shortcoming of benchmarking – even if your load properly reflects the real world, when you get to the real world, things can change drastically. Do that with two similar products, and the result could be, you might as well have just listened to the hype. Aargh!

    Greg Rahn’s post was a real eye-opener, too.

  2. John Hurley August 30, 2010 at 11:09 pm

    Personally I would be fired up to be working on projects involving Exadata or ones where it is at least being remotely considered. Nothing wrong with you being excited about this technology and the level of performance it can offer big shops.

    It seems at least to me that the price tag however even for a partial box is going to be way beyond the budget of most medium size shops on down. Maybe beyond many of the larger ones?

    It is probably hard for you to break out of the Exadata side at Oracle, but considering how little quality Oracle blogging is going on these days (it seems to me to be mostly over), please keep trying to volley some non-Exadata stuff over when you can!

    Many of us are running on Linux systems using Intel hardware and finding the performance quite acceptable for our employers.

  3. Craig February 23, 2012 at 12:19 pm

    Hey Kevin,

    I am not sure where to post these questions. It is in regard to Exadata and workload management. I am hearing that Exadata is a data warehousing platform, but I always thought of Oracle (including Exadata) as being an OLTP platform. At Open World, they kept suggesting turning off indexes on Exadata to take advantage of Smart Scan. Doesn’t this greatly affect OLTP, and vice versa? How do they tune for both? They say that Exadata is so easy to manage now, but to be honest it seems harder. You have to worry about RAC, ASM, the Infiniband network, as well as the database. I am not convinced yet that it can handle multiple mixed workloads. Oracle never has before, and Exadata is still Oracle and RAC. Is it really a data warehousing platform or are they just trying to play catchup to the Netezzas, Greenplums, and Teradatas of the world?


    • kevinclosson February 23, 2012 at 1:46 pm

      Hi Craig,

      Exadata adds no value to OLTP. DW/BI is a different story. Indeed, Exadata feeds RAC at good rates of throughput, but since the architecture is asymmetrical, complex queries will leave you with a bottlenecked database grid long before you achieve Exadata datasheet throughput. Neither the X2-8 nor the X2-2 can process complex joins while ingesting the full IB payload from storage (25.6 GB/s); the CPUs simply cannot handle it.



I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.



All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
