Archive Page 20

I Like Chen Shapira’s Blog

Just a quick entry to point to Chen Shapira’s blog. I recommend it. Oh, that means I need to take a moment to update my blog roll!

I Ain’t Not Too Purdy Smart, But I Know This for a Fact: MAA Literature is Required Reading!

You Need to See What These Folks Have to Say

I must put out a plug for Oracle's Maximum Availability Architecture (MAA) team and the fruits of their labor, now that I have personally worked with them on projects for over a year. I'm sure it's no credit to them that I should say so, but honestly, this team is really, really sharp!

Not only is this paper covering migration to Exadata Storage Server helpful for the actual act of deploying Exadata into an existing Oracle DW/BI environment, it also goes a long way toward showing how much simpler that approach is than dumping out an Oracle database and reloading it.

Go get some of those papers!

RMOUG 2009. A Great Show…and I Get To Go!

I just took a look at the RMOUG 2009 Training Days Schedule to find out what day/time I’m speaking about Exadata performance and architecture. I have to say that RMOUG is always one of my favorite conferences and I’m looking forward to this year. I’ll have 90 minutes ending at noon on Wednesday and if all goes well I won’t have ruined attendees’ appetites just in time for lunch!

I also noticed an unfortunate schedule conflict. During the same time slot, friends and fellow Oaktable Network members Tim Gorman and Tanel Poder are also speaking. Choices, choices…

Details

Title: Oracle Exadata Storage Server. Introducing Extreme Performance for Data Warehousing.

Abstract: Kevin Closson will present an overview of the Oracle Exadata Storage Server and HP Oracle Database Machine architecture and internals, including performance analysis of typical workloads that Exadata Storage Server is uniquely capable of accelerating. The presentation will include comparisons to the typical storage architectures supporting Oracle Database workloads today. Attendees will leave with a good understanding of why, and how, Exadata is what it is. Kevin will leave ample time for a fruitful question and answer session.

Oracle Ace, or Oracle Dunce…err, I mean Deuce.

Oracle Ace or Oracle Dunce?

For quite some time the “about” section of my blog stated that I am an Oracle Ace. Well, I was getting pelted with email from people who got the scoop on the fact that I am not an Oracle Ace. I was an Oracle Ace. Moreover, I was actually an Oracle Ace who wasn’t self-nominated! But no matter, I am no longer an Oracle Ace and I have updated my blog accordingly.

Why?

Oracle employees are not eligible for participation in the Oracle Ace program. So the Oracle Ace-dom that I had prior to joining Oracle was revoked. I still have the vest though!

Unstructured Data. Lots and Lots of It.

Yes, there is unstructured data, and if you have an awful lot of it, the HP StorageWorks 9100 Extreme Data Storage System looks like a really great place to put it. I’m biased though, because the software that drives the StorageWorks 9100 is PolyServe, my former company. I’m glad to see HP doing good things with PolyServe since the acquisition in 2007. Too many large corporate mergers end up in a mess. I’m glad to see that isn’t happening to my old friends and former PolyServe colleagues!

This offering is geared more towards density and cost than performance from what I can see. Nonetheless, having over 3 GB/s NFS bandwidth will come in handy given the capacities this offering supports.

Cool technology!

Oracle Exadata Storage Server: 485x Faster Than…Oracle Exadata Storage Server. Part II.

In my blog entry entitled Exadata Storage Server: 485x Faster than…Exadata Storage Server. Part I., I took issue with the ridiculous multiple-orders-of-magnitude performance improvement claims routinely made by the DW Appliance vendors. These claims are usually touted as comparisons to “Oracle” (without any substantive accounting for what sort of Oracle configuration they are comparing to) and never seem to include any accounting for where the performance improvement comes from. After learning a bit about marketing from a breakfast cereal commercial, I decided to share with my readers how easy it is to do what these guys generally do: compare apples to bicycles. To make it more interesting, I decided to show a 485-fold performance increase of Oracle Exadata versus Oracle Exadata. A long comment thread ensued and ultimately ended with a reader posting the following:

Block density perhaps?

You didn’t mention that the number of records per block
was a constant. So it would be possible that in the first scenario you created a table with a low amount of records per block, resulting in a large segment, needing a lot of io’s. (you could have used 1 row/block for example)

While in the second scenario you could have used a high number of blocks per record, resulting in a smaller segment, and thus needing a lower amount of io’s to fulfill the query.

BINGO.

Here’s the deal. I chose my words carefully and took a huge dose of Semantic-a-sol(tm). I set the stage as follows:

  • I said, “There are no … partitioning or any sort of data elimination.” True, there were no forms of data elimination. I didn’t say anything about eliminating unused space.
  • I said the data in the table was the same; I never said it was the same table.
  • I said there was the same storage bandwidth and the same number of CPUs and that was true.

The 485x was a product of querying a table with PCTFREE 0 versus PCTFREE 99. When I queried the vacuous blocks I also did so with a normal scan instead of a Smart Scan. So it is true that storage bandwidth remained constant, but I created an artificial bottleneck upwind by forcing the single database host (used in both cases) to ingest the full 1.6 TB of round, brown, spinning stuff needed to store the vacuous blocks (PCTFREE 99). That took 970 seconds.

With ~107 million rows and a query that cited only the PURCHASE_AMT column, the amount of data actually needed by the SQL layer is a measly 86 MB. So, when I “magically” switched the card_trans synonym to point to the PCTFREE 0 table (which is only 8.4 GB) and scanned it with the full power of 14 Exadata Storage Servers, the data was off disk, and the PURCHASE_AMT column was plucked from the middle of each row and DMAed into the address space of the Parallel Query processes on the database host, in 1.96 seconds. That’s the 485x speedup.
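For the curious, here is a rough sketch of how such a setup could look. The card_trans synonym switch is the real mechanism described above; the table names and the CTAS source are purely illustrative:

-- Illustrative sketch only; card_trans_src, card_trans_fat and card_trans_thin are hypothetical names
CREATE TABLE card_trans_fat PCTFREE 99 PCTUSED 1    -- ~1.6 TB of nearly empty blocks
AS SELECT * FROM card_trans_src;

CREATE TABLE card_trans_thin PCTFREE 0              -- ~8.4 GB, densely packed
AS SELECT * FROM card_trans_src;

-- First pass: the synonym points at the fat table (970 seconds)
CREATE OR REPLACE SYNONYM card_trans FOR card_trans_fat;

-- Second pass: "magically" repoint it at the thin table (1.96 seconds)
CREATE OR REPLACE SYNONYM card_trans FOR card_trans_thin;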

So, does anyone else hate it when these DW Appliance guys go around spewing ridiculous multiple-orders-of-magnitude performance increases over who-knows-what without any accounting? It truly is an insult to your intelligence.

There is no reason to be mystified. If DW Appliance vendor XYZ is spouting off about a query processing speed-up of, say, X, just plug the values into the following magic decoder ring. Quote me on this: performance increase X is the product of:

  1. Executing on a platform with X-fold storage bandwidth, or
  2. Executing on a platform with X-fold processor bandwidth, or
  3. The query being measured manipulated 1/Xth the amount of data, or
  4. Some combination of items 1 through 3

Any reasonable vendor will gladly itemize for you where they get their magical performance gains. Just ask them; you might learn more about them than you thought.

Part I in this series can be found here.

Oracle Exadata Storage Server: 485x Faster Than…Oracle Exadata Storage Server. Part I.

I recently read an article by Curt Monash entitled Interpreting the results of data warehouse proofs-of-concept (POCs).  Curt’s post touched on a topic that continually mystifies me. I’m not sure when the phenomenon started, but I’ve witnessed a growing trend towards complete lack of scrutiny when it comes to the performance claims made by most vendors in the data warehousing space. For example, Netezza makes a blanket claim that their appliance is 100-fold faster than Oracle. Full stop. Er, not full stop… Netezza doesn’t stop there. They claim:

While Netezza makes claims of 100x performance gains, it is not uncommon to see performance differences as large as 200x to even 400x or more when compared to existing Oracle systems.

100x Speed-up: Child’s Play

But, honestly, 100x is child’s play. Forget for a moment that there is no itemization of where that speedup would come from in Netezza’s high-level messaging. Such information would be technical marketing, and I wouldn’t expect Netezza to disclose any sort of justification for where a 100x speedup comes from. Lowered expectations. Shucks, these DW arms-race marketing claims remind me of that famous Saturday Night Live skit that seems to have served as the playbook for these marketing guys, in more ways than one!

OK, chuckles aside, Curt’s post on the topic included a link to a spreadsheet of recent Proof of Concept results where “the incumbent” was trounced to the tune of 335x in reporting tasks. Like I said, 100x is child’s play.

Intellectual Curiosity

Nobody should look at a claim such as 335x without wondering where in the world such a speedup comes from, and shame on any vendor that isn’t willing to itemize the benefit. After all, without some knowledge of what produces such astounding speedup, how is the dutiful DW practitioner to expect the speedup to remain intact over time, much less replicate the “magic” elsewhere? I’m more than willing to itemize for anyone any claim of Oracle Exadata Storage Server speedup on any query. Exadata is not “magic,” so accounting for its benefit is very easy to do. But back to the 335x for a moment. This is actually quite simple. To get a 335x speedup, one of the following is true:

  1. The query was executed on a platform with 335x storage bandwidth
  2. The query was executed on a platform with 335x processor bandwidth
  3. The query manipulated 1/335th the amount of data
  4. Some combination of items 1 through 3

Number 3 in the list is achieved through things like partition elimination, indexing, materialized views, more efficient joins, and so forth. This is what Oracle refers to as the “Brainy Approach” to improved data warehouse query performance. Of course Oracle has, and retains, all of these “Brainy” optimization approaches, and more, when Exadata is in play. Exadata is a solution offering both “Brainy” and, most importantly, “Brawny” technology.

Let’s think this 335x thing through for a moment. Imagine that the winning system was a Netezza 10100 and the 335x was an improvement over a traditional Oracle incumbent (no Exadata). One of Netezza’s main value propositions is that they are able to utilize the full bandwidth of all the disks in the system in parallel, just like Exadata. That’s the “Brawny” approach. As I point out in my post about “arcane” disk technology, this value proposition is the least we deserve, but because of typical storage provisioning most Oracle deployments don’t benefit from the aggregate bandwidth their drives could actually offer. So kudos to Netezza for that. What if the 335x was due to the NPS 10100’s “Brawny” disk bandwidth capability? Well, that chalks the win up to item 1 in the list, and therefore the incumbent system was configured with 1/335th the disk bandwidth of the NPS system. If I grant the NPS system 70 MB/s per disk drive I get roughly 7.5 GB/s (108 * 70 MB/s). Does that mean the incumbent was ingesting only 22 MB/s (7.5 GB/s divided by 335)? Would anyone care about that result? Would you be proud if you got more performance from 108 SATA drives than a single USB 2.0 drive? I shouldn’t think the 335x came solely from list item 1.
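The arithmetic is easy to check right in SQL*Plus (108 drives at the 70 MB/s I granted above; the 7.5 GB/s and 22 MB/s figures in the paragraph are the rounded versions):

SQL> select round(108*70/1024,1) nps_gb_per_sec, trunc(108*70/335) incumbent_mb_per_sec from dual;

NPS_GB_PER_SEC INCUMBENT_MB_PER_SEC
-------------- --------------------
           7.4                   22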

The NPS 10100 has 108 processors pounding on the data as it comes off the drives. Can we get 335x over our imaginary incumbent from sheer processing power? Sure, so long as the incumbent was running Oracle on a processor with 1/3rd the bandwidth of a single PowerPC processor (the embedded CPU on a Netezza SPU). Would anyone be excited to beat 1/3rd of a CPU with 108 CPUs?

No, folks, the 335x was certainly the product of item 4 on the list, with a very heavy slant towards item 3, regardless of which appliance vendor it actually was.

A 335 Fold Improvement is Child’s Play? I want 485 Fold!

Humor me as I walk through a little exercise to elaborate more on this topic. In the following session I’ll demonstrate a query accessing precisely the same amount of data using the same SQL, in the same Oracle session, attached to the same Oracle database. You’ll see that I execute a host command to prove that within the scope of 15 seconds I am able to demonstrate a 485x speedup. You can choose to believe me or not, but the facts are as follows:

  • The amount of data in the table is the same in each case.
  • The data in every column of every row is the same.
  • The order of rows in the table is the same.
  • There is no compression involved at any point.
  • The table datatypes are the same.
  • The query plan is the same.
  • The Oracle Parallel Query Degree of Parallelism remains constant. That means equal CPUs attacking the data.
  • There are no indexes, materialized views, partitioning or any sort of data elimination.
  • The Oracle Results Cache feature was not used.
  • The data in each case resided on the same disks.

And, oh, before I forget to say so, this is Exadata. So, can Oracle market Exadata as 485x faster than Exadata without the use of any data elimination techniques? See for yourself and fill out a comment with your explanation for what I have shown here.

First, a listing of the “demo” script:

SQL> !cat demo.sql

set echo off
set timing off
col sum_sales format 999,999,999,999,999,999
host date

desc card_trans

set echo on
select count(*) from card_trans;
set timing on

select sum(purchase_amt) sum_sales from card_trans;
host date

In the following screen capture I’ll show that the query took 970 seconds to complete. I used the SUM aggregate against the 100+ million purchase_amt column values as a means to show I’m querying the same content in both cases.

SQL> @demo
Wed Dec 10 08:59:48 PST 2008

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CARD_NO                                   NOT NULL VARCHAR2(20)
CARD_TYPE                                          CHAR(20)
MCC                                       NOT NULL NUMBER(6)
PURCHASE_AMT                              NOT NULL NUMBER(6)
PURCHASE_DT                               NOT NULL DATE
MERCHANT_CODE                             NOT NULL NUMBER(7)
MERCHANT_CITY                             NOT NULL VARCHAR2(40)
MERCHANT_STATE                            NOT NULL CHAR(3)
MERCHANT_ZIP                              NOT NULL NUMBER(6)

SQL> select count(*) from card_trans;

COUNT(*)
----------
107389152

SQL>
SQL> set timing on
SQL>
SQL> select sum(purchase_amt) sum_sales from card_trans;

SUM_SALES
------------------------
6,443,502,770

Elapsed: 00:16:10.15
SQL>
SQL> host date
Wed Dec 10 09:32:08 PST 2008

The first pass of the script ended in the same session at 9:32:08, and 11 seconds later I executed the script again. The session capture shows that there was a 485x speedup (970 seconds down to 2 seconds). Like I said, “100x is child’s play.” Well, at least it is when there is no accounting offered for the improvement. Pshaw, it seems I learned a lot from that “training” video I referenced above.

SQL>  @demo
SQL>
SQL> set echo off
Wed Dec 10 09:32:19 PST 2008

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CARD_NO                                   NOT NULL VARCHAR2(20)
CARD_TYPE                                          CHAR(20)
MCC                                       NOT NULL NUMBER(6)
PURCHASE_AMT                              NOT NULL NUMBER(6)
PURCHASE_DT                               NOT NULL DATE
MERCHANT_CODE                             NOT NULL NUMBER(7)
MERCHANT_CITY                             NOT NULL VARCHAR2(40)
MERCHANT_STATE                            NOT NULL CHAR(3)
MERCHANT_ZIP                              NOT NULL NUMBER(6)

SQL> select count(*) from card_trans;

COUNT(*)
----------
107389152

SQL>
SQL> set timing on
SQL>
SQL> select sum(purchase_amt) sum_sales from card_trans;

SUM_SALES
------------------------
6,443,502,770

Elapsed: 00:00:01.96
SQL>
SQL> host date
Wed Dec 10 09:32:23 PST 2008

SQL> select count(*) from user_indexes ;

COUNT(*)
----------
0

Elapsed: 00:00:00.11

Part II in this series: click here.

FISHy Network Attached Storage

Back in October 2007, I had a series of very interesting and enjoyable technology briefings/discussions with Bryan Cantrill, the inventor of Sun’s DTrace dynamic tracing toolkit. Bryan had been following my blog and realized I had moved on to the Oracle Server Technologies group after my tenure as Chief Software Architect of Oracle Database Platform Solutions at PolyServe. Who knows, he may have been speaking to Glenn Fawcett, my old friend and former Database Engineer at Sequent, as well. Bryan was investigating whether there could be some synergy between Oracle and the project he was working on at the time, codenamed FISH (Fully Integrated Software and Hardware). I lost track of how that project was moving along as I became too engrossed in my work on Oracle Exadata Storage Server. Nonetheless, it is nice to see this product come to market, as I recall being very impressed by its early technical details.

From the FISHworks project comes the Sun Storage 7000 Unified Storage System.

Bryan Cantrill goes into some of the history of the Sun Storage 7000 in this blog entry.

It will be interesting to watch the happenings between Sun and NetApp over this.

…and, no, I don’t think I’m special because Bryan and I had a chat about FISH back in Oct 2007 as Ashlee Vance had scooped the news 7 months prior.

Exadata Related Posts. Losing Posts in the Mosh Pit.

I’ve had several people ask me questions recently about topics I’ve covered in at least a few of my Exadata Storage Server related posts. When I asked them if they’d seen a particular post, they’ve generally responded, “No, I didn’t know that post existed,” or words to that effect. So, this may seem odd, but in addition to updating my Exadata Posts page, I’ve pasted handy URLs to my Exadata related posts below. I wish there were time to add to the set of posts. I do have plenty of material queued up, but I’m scrambling for time. In the meantime, perhaps a few readers will appreciate the content pointed to by these URLs:

Oracle Exadata Storage Server. Part I.

Oracle Exadata Storage Server. Part II.

Oracle Exadata Server Related Web News Media and Blog Errata. Part I.

HP Oracle Database Machine. A Thing of Beauty Capable of “Real Throughput!”

I Know Nothing About Data Warehouse Appliances and Now, So Won’t You – Part V. Why GreenPlum is Better Than Oracle Exadata Storage Server.

Podcast: Pythian Group Oracle Exadata Storage Server Q&A with Kevin Closson.

Pessimistic Feelings About New Technology. Oracle Exadata Storage Server – A JBOD That Can Swamp A Single Server.

Oracle Exadata Storage Server: Beaten by FLASH SSD and Worthless for OLTP.

Oracle Exadata Storage Server. No Magic in an Imperfect World. Excellent Tools and Really Fast I/O Though.

Oracle Exadata Storage Server: A Black Box with No Statistics.

Blog Anniversary

Sing along with me now…

Happy Anniversary to my blog. It’s been two years as of today. And as Tom Kyte foretold, there has been a lot about “disk” on this blog.

This blog has had nearly 1 million page views to date, and while I’d like to say it has been all fun and games, it has been a lot of work.

My favorite topics include, of course, my Exadata Related Posts, but the Oracle on Opteron, NUMA (etc) posts have had tremendous readership as have the “famous” Oracle Over NFS (most notably the “Manly Man” series) posts.

How about a poll?

Pessimistic Feelings About New Technology. Oracle Exadata Storage Server – A JBOD That Can Swamp A Single Server.

In my recent blog entry entitled “Oracle Exadata Storage Server. No Magic in an Imperfect World. Excellent Tools and Really Fast I/O Though“, I concluded with a reference to some anti-Exadata comments made by EMC’s Chuck Hollis in his blog entry entitled “Oracle Does Hardware.” I directed my readers to that blog post by writing the following:

In spite of how many times EMC’s Chuck Hollis may claim that there is “nothing new” or “no magic” when referring to Oracle Exadata Storage Server, I think it is painfully obvious that there is indeed “something new” here. Is it magic? No, and we don’t claim that it’s magic.

Six days after I posted that blog entry, Chuck submitted a lengthy comment on the post. Instead of responding to Chuck’s comments in the comment thread I’ve decided to do so here.

Readers, please don’t confuse this with some sort of Kevin versus Chuck thread, because it isn’t. What you’ll see in this post is an analysis of the words of someone representing one of the premier (if not the premier) conventional storage providers (EMC). My motive is to provide useful information in this analysis.

If you read Chuck’s assessment of Oracle Exadata Storage Server, you’ll see a positioning piece with an overtly anti-Exadata slant. Chuck’s words in that post are aimed at conveying facts. My first handling of that anti-Exadata piece was very light. I aimed to capitalize foremost on the repetitious use of the words “nothing new” and “magic.” Chuck likely saw my post where I called this out, and he answered my calling-out in the comment thread of this post. Chuck wrote:

Sorry, Kevin, didn’t mean to come across as too pessimistic in my blog.

Asserting Beliefs
I need to point out that one cannot be pessimistic about facts. The word pessimistic only applies to beliefs and emotions. Chuck’s piece wasn’t pessimistic; it was flawed on technical grounds. Chuck continued with:

Leaving hardware issues aside, how much of the software functionality shown here is available on generic servers, operating systems and storage that Oracle supports today? I was under the impression that most of this great stuff was native to Oracle products, and not a function of specific tin …

Last Chance for a First Impression

Chuck’s “pessimistic” post came out the day after the Oracle Exadata Storage Server launch, so harboring such questions in his mind at that time would have been understandable. However, Chuck visited my blog some 22 days later and continued to ask questions that clearly demonstrate a lack of understanding of Oracle Exadata Storage Server. Chuck may have been “under the impression” that the underpinnings of Exadata are Oracle-generic (“native to Oracle products”), but he is wrong. Oracle Exadata Storage Server software is not a scalpel-job on the Oracle Database server. It is a totally new storage server software package.

To answer Chuck’s question about the software: none of the software functionality (Exadata) is available on generic servers, operating systems or storage. Chuck continued with:

If the Exadata product has unique and/or specialized Oracle logic, well, that’s a different case.

Yes, it is unique and specialized and a different case. Even light reading of the available material (e.g., the Exadata paper and, shucks, maybe a few of my blog posts) would have made that glaringly obvious. Chuck continued with:

Speaking strictly as a storage guy, here’s what I know.

– using commodity servers and storage arrays, we can usually feed in more data than a server can process, specifically true in an Oracle DW environment.

Chuck, swamping a commodity server is not the goal. Of course it’s easy to produce more raw, streaming data from even a midrange storage array than can be ingested by a single commodity server. Even the best commodity servers choke at less than a 2GB/s data ingest rate when Oracle is performing data-rich functionality (e.g., joins, sorts, aggregation, etc.). The design goal of Exadata was not to swamp commodity servers more efficiently. That would be a storage-only, bigger-hose, speed-and-feed mentality: the “brute force only” approach.

You Don’t Always Get What You Want. Enter Exadata.
The value proposition of Exadata is to scan disk without bottlenecks and return to the Database grid only the data the query wants, not blocks of disk. It’s a feature we call Smart Scan. Chuck needs to have his folks brief him on that. However, Exadata is more than capable of holding its own in the pure “brute force” camp.
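To make the Smart Scan point concrete with a hypothetical example (reusing the card_trans table from my 485x posts): for a query like the one below, Smart Scan means the cells apply the merchant_state filter and project only the two cited columns, shipping qualifying rows back to the Database grid instead of every block of the table.

SQL> select card_no, purchase_amt from card_trans where merchant_state = 'WA';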

Even Without Smart Scan, Exadata is Faster Than Conventional Storage
As a simple block server, Exadata is able to deliver 1GB/s per cell to the Database grid. If you don’t think that is “brute force,” consider a moderate Oracle Database Machine configuration consisting of a single rack serving 14 GB/s to the Database grid (14 cells at 1GB/s each). If those numbers don’t speak loudly enough, just investigate what sort of conventional SAN array configuration it would take to deliver 14 GB/s to a Database grid. So, yes, Exadata is both “brute force” and intelligent, and that is why I had to call out Chuck’s blog remarks about how Exadata is “nothing new.”

Chuck finished that paragraph with:

I’m having a hard time seeing the advantages of pairing a commodity Xeon-based server with JBOD and claiming a performance advantage for this part of the equation.

Oh my, where to start. Chuck, I understand why you would have difficulty seeing the advantage in what you just described, but what on earth does any of that have to do with Oracle Exadata Storage Server? First, where did you get “JBOD?” An Oracle Exadata Storage Server cell is not just a Xeon processor sitting in front of some disks (JBOD). The disks are down-wind of an intelligent HP P400 Smart Array with 512MB battery-backed write cache. And, what’s so terrible about fronting some disks with Xeon technology anyway? There are a few conventional storage arrays on the market that use Xeon in the array head.

It’s All About Balance
Fingering the fact that Intel Xeon processors execute storage intelligence software in the Oracle Exadata Storage Server doesn’t hold water, especially since the ratio is 2 sockets per 12 hard drives. Perhaps Chuck will tell us the maximum number of Xeon processors EMC supports in front of 960 drives in a fully loaded midrange EMC array (e.g., CX)?

Oracle has purpose-built a balanced system by placing the power of 2 Xeon processors (Harpertown quad-core) in front of every 12 drives.

Infiniband: The Exadata “Secret Sauce?”

Chuck continued with:

– you may be more knowledgeable than I, but we are under the impression that the IB compute node connection doesn’t bring much to the party. When we looked at many clustered Oracle DW implementations, there was plenty of bandwidth available between the compute nodes, using multiple 1Gb/sec links.

That’s why we don’t talk about it much. Infiniband is not why Exadata is so fast; Infiniband is one of the reasons why Exadata is not bottlenecked. First, I’ll point out that Infiniband is a unified fabric for both disk and inter-node communications with Exadata. I’ve been writing about storage up to this point and now the focus shifts to Real Application Clusters (RAC) interconnect technology. I’ll be brief on this topic. I don’t doubt, nor do I care, that there are clustered Oracle DW systems currently deployed that are able to get by with multiple UDP Gigabit Ethernet networks configured as the RAC interconnect. That’s just fine with me. Does that somehow negate the value of Exadata because Oracle so foolishly engineered a zero-copy RDMA interconnect for RAC while unifying interconnect and storage networking into a single fabric? I shouldn’t think so. UDP costs some (lots of) cycles compared to ZDP over Infiniband. Just because a network has headroom left over doesn’t mean resources are otherwise being utilized efficiently. Oracle didn’t aim to engineer bottlenecks into the Exadata architecture.

Chuck continued with:

And, I know this only matters to storage people, but there’s the minor matter of having two copies of everything, rather than the more efficient parity RAID approaches. Gets your attention when you’re talking 10-40TB usable, it does.

Yes, the initial release of Exadata requires 1:1 mirroring. Does that somehow insinuate that Exadata will never offer the more space-saving RAID approaches Chuck is alluding to? Life is, after all, an unending series of choices.

Everyone Includes EMC
Chuck continued with:

Bottom line – what does the hardware bring to the party, rather than software? And if you can get the same benefits without dictating that customers buy a specific piece of tin, isn’t that a win for everyone?

Chuck, and my blog readers alike, should know by now what the hardware brings to the party. Oracle Exadata Storage Server hardware, unlike conventional storage arrays, is not configured with guaranteed throughput bottlenecks built in. That warrants a party. On the other hand, the software is the secret sauce. The choice of which “tin” gets to run the software is, of course, someone else’s decision. I will say, however, that if you were to execute the software on systems less balanced than the current platform (HP ProLiant DL180 G5), you would not realize the benefit. It’s all about balance.

Chuck finished with:

Finally, I’d be interested in your thoughts on how enterprise flash drives fit into all of this. Yes, they’re rather expensive now, but this won’t be the case before too long.

I’ve bored you all to death already. I’ll hit FLASH SSD in my next blog entry.

Linux is Perfect So Why Would You Monitor Performance?

If you have Oracle Database deployed on Linux and have not yet settled upon collectl as your primary tool for system-level performance data collection, well, we need to have a long talk!
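If you need a nudge, getting started is about as simple as it gets. A typical invocation might look like this (a sketch from memory; check collectl’s documentation for the full menu of subsystem flags and intervals):

$ collectl -scdn -i 5      # cpu, disk and network subsystem stats every 5 seconds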

Read This Blog: Christian Antognini.

Christian Antognini is a fellow member of the Oaktable Network and while I’ve only (briefly) met him once face-to-face, I’ll still recommend his blog! No, honestly, all joking aside I must say that I appreciate Christian’s blogging.

Oracle Exadata Storage Server: Beaten by FLASH SSD and Worthless for OLTP.

I’ve never met Mike Ault, but some friends of mine, who are fellow OakTable Network members, say he’s a great guy and I believe them. Mike works at Texas Memory Systems and I know some of those guys as well (Hi Woody, Jamon). Pleasantries aside, I have to call out some of the content Mike posted in a recent blog entry about the HP Oracle Database Machine and Exadata Storage Server. Just because I blog about someone else’s posted information doesn’t mean I’m “out to get them.” Mike’s post made it clear I need to address a few things. Mike’s post was by no means vicious anti-Exadata marketing, but it was riddled with inaccuracies that deserve correction.

Errata

While the first two errata I will point out may seem trivial to many readers, I think accuracy is important if one intends to contrast one technology offering against another. After all, you won’t find me posting blog entries about Texas Memory Systems SSD being based on core memory or green Jell-O.

The first error I need to point out is that Mike refers to Oracle Exadata Storage Server cells as “block” or “blocks” 8 times in his post.

The second error I need to point out is rooted in the following quote from Mike’s blog entry:

These new storage and database devices offer up to 168 terabytes of raw storage with 368 gigabytes of caching and 64 main CPUs

That is partially true. With the SATA option, the HP Oracle Database Machine does offer 168TB of gross disk capacity. The error is in the “368 gigabytes of caching” bit. The HP Oracle Database Machine does indeed come with 8 Real Application Clusters hosts in the Database grid configured with 32GB RAM each, and 14 Exadata Storage Server cells with 8GB each. However, it is entirely erroneous to suggest that the entirety of physical memory across both the Database grid and the Storage grid somehow works in unison as “cache.” It’s not that the gross 368GB (8×32 + 14×8) isn’t usable as cache. It’s more the fact that none of it is used as user-data cache, at least not cache that somehow helps out with DW/BI workloads. The notion that it makes sense to put 368GB of cache in front of, say, a 10TB table scan, and somehow boost DW/BI query performance, is the madness that Exadata aims to put to rest. Here’s a rule:

If you can’t cache the entirety of a dataset you are scanning, don’t cache at all.

– Kevin Closson

Cache, Gas or a Full Glass. Nobody Rides for Free.

No, we don’t use the 8x32GB physical memory in the Database grid as cache because cycling, say, the results of a 2TB table scan through 368GB aggregate cache would do nothing but impede performance. Caching costs, and if there are no cache hits there is no benefit. Anyone who claims to know Oracle would know that parallel query table scans do not pollute the shared cache of Oracle Database. A more imaginative, and correct, use for the 32GB RAM in each of the hosts in the Database grid would be for sorting, joins (hash, etc) and other such uses. Of course you don’t get the entire 32GB anyway as there is an OS and other overhead on the server. But what about the 8GB RAM on each Oracle Exadata Storage cell?

One of the main value propositions of Oracle Exadata Storage Server is the fact that lower-half query functionality has been offloaded to the cells (e.g., filtering, column projection, etc). Now, consider the fact that we can scan disks in the SAS-based Exadata Storage Server at the rate of 1GB/s. We attack the drives with 1MB physical reads and buffer the read results in a shared cache visible to all threads in the Oracle Exadata Storage Server software. To achieve 1GB/s with 1MB I/O requests requires 1000 physical I/Os per second. OK, now I’m sure all the fully-cached-conventional-array guys are going to point out that 1000 IOPS isn’t worth talking about, and I’d agree. Forget for the moment that 1GB/s is in fact very close to the limit of data transfer many mid-range storage arrays have to offer. No, I’m not trying to get you excited about the 1GB/s because if that isn’t enough you can add more. What I’m pointing out is the fact that the results of 1000 IOPS (each 1MB in size) must be buffered somewhere while the worker threads rip through the data blocks applying filtration and plucking out cited columns. That’s 125 1MB filtration and projection operations per second per processor core. There is a lot going on and we need ample buffering space to do the offload processing.
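If you want the per-core arithmetic spelled out (8 cores per cell, per the 2-socket quad-core configuration noted elsewhere on this page):

SQL> select 1000/8 mb_ops_per_core_per_sec from dual;   -- 1 GB/s arriving as 1MB reads

MB_OPS_PER_CORE_PER_SEC
-----------------------
                    125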

Mike then moved on to make the following statement:

The Oracle Database Machine was actually designed for large data warehouses but Larry assured us we could use it for OLTP applications as well. Performance improvements of 10X to 50X if you move your application to the Database Machine are promised.

I’m not going to put guarantees in writing, but no matter; that statement only led into the following:

This dramatic improvement over existing data warehouse systems is provided through placing an Oracle provided parallel processing engine on each Exadata building block so instead of passing data blocks, results are returned. How the latency of the drives is being defeated wasn’t fully explained.

Exadata Storage Server Software == Oracle Parallel Query

Folks, the Storage Server software running in the Exadata Storage Server cell is indeed parallel software, and threaded; however, it is not entirely correct to state that there is a “parallel processing engine” that returns “results” from Exadata cells. More correctly, we offload scans (a.k.a. Smart Scan) to Exadata cells. Smart Scan technology embodies I/O, filtration, column projection and rudimentary joins. Insinuating otherwise makes Exadata out to be more of a database engine than intelligent storage, and there is more than a subtle difference between the two concepts. So, no, “results” aren’t returned; filtered rows and projected columns are. That is not a nit-pick.

DW/BI and I/O Latency

Mike finished that paragraph with the comment about how Oracle Exadata Storage Server “defeats” (or doesn’t) drive latency. I’ll simply point out that drive latency is not an issue with DW/BI workloads. The problem (addressed by Exadata) is the fact that attaching just a few modern hard drives to a conventional storage array leaves you with a throughput bottleneck. Exadata doesn’t do anything for drive latency because, shucks, the disks are still round, brown, spinning thingies. Exadata does, however, make a balanced offering that doesn’t bottleneck the drives.

Mike continued with the following observation:

So in a full configuration you are on the tab for a 64 CPU Oracle and RAC license and 112 Oracle parallel query licenses

Yes, there are 64 processor cores in the Database grid component of the HP Oracle Database Machine, but Mike’s mentioning of the 112 processor cores in the Exadata Storage Server grid is clearly indicative of the rampant misconception that Exadata Storage Server software is either some, most, or all of an Oracle Parallel Query instance. People who have not done their reading quickly jump to this conclusion, and it is entirely false. So, mentioning the 112 Exadata Storage Server grid processors and “Oracle parallel query licenses” in the same breath is simple ignorance.

Mike continues with the following assertion:

Targeting the product to OLTP environments is just sloppy marketing as the system will not offer the latency needed in real OLTP transaction intensive shops.

While Larry Ellison and other important people have stated that Exadata fits in OLTP environments as well as DW/BI, I wouldn’t say it has been marketed that way, and certainly not sloppily. Until you folks see our specific OLTP numbers and value propositions I wouldn’t set out to craft any positioning pieces. Let me just say the following about OLTP.

OLTP Needs Huge Storage Cache, Right?

OLTP is I/O latency sensitive, but mostly for writes. Oracle offers a primary cache in the Oracle System Global Area (SGA) disk buffer cache. Applications generally don’t miss SGA blocks and immediately re-read them at a rate that requires sub-millisecond service times. Hot blocks don’t age out of the cache. Oracle SGA cache misses generally access wildly random locations, or result in scanning disk. So, for storage cache to offer read benefit it must cover a reasonable amount of the wildly, randomly accessed blocks. The SGA and intelligent storage arrays share a common characteristic: the same access patterns that blow out the SGA cache also blow out storage array cache. After all, architecturally speaking, the storage array cache serves as a second-level cache behind the SGA. If it is the same size as the SGA it is pretty worthless. If it is, say, 10 times the size of the SGA but only 1/50th the size of the database, it is also pretty useless, with the exception of those situations when people use storage array cache to make up for the fact that they are using, say, 1/10th the number of drives they actually need. Under-provisioning spindles is not good, but that is an entirely different topic.

I know there are SAN array caches in the terabyte range, and Mike speaks of multi-terabyte FLASH SSD disk farms. I suppose these are options, for a very select few.

Most Oracle OLTP deployments will do just fine running against non-bottlenecked storage with a reasonable amount of write cache. Putting aside the idea of an entirely FLASH SSD deployment for a moment, the argument about storage cache helping OLTP boils down to what percentage of the SGA cache misses can be satisfied in the storage array cache, and what overall performance increase that yields.

The Eye of a Needle

Recently, I was looking at the specification sheet for a freshly released mid-range Fibre Channel SAN storage array that supports up to 960 disk drives plumbed through a two-headed controller. The specification sheet shows a maximum of 16GB cache per storage processor (up to two of them). I should think the cache is mirrored to accommodate storage processor failure; maybe it isn’t, I don’t know. If it is mirrored, let’s pretend for a moment that mirroring storage processor cache is free, even with modify-intensive workloads (subliminal man says it isn’t). Given this example, I have to ask: who thinks 16GB of storage array cache in front of hundreds of drives offers any performance increase? It doesn’t, so let’s put to rest the OLTP storage cache benefit argument.
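Here’s the back-of-envelope cache-per-spindle math (two storage processors at 16GB each, 960 drives):

SQL> select round(2*16*1024/960) cache_mb_per_drive from dual;

CACHE_MB_PER_DRIVE
------------------
                34

Roughly 34MB of array cache per spindle, and that’s before paying any mirroring tax.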

But Mike Wasn’t Talking About Storage Array Cache

Right, Mike wasn’t talking about storage array cache benefit in an OLTP environment, but he was talking about nosebleed IOPS rates from FLASH SSD. When referring to Exadata, Mike stated (quote):

What might be an alternative? Well, how about keeping your existing hardware, keep your existing licenses, and just purchase solid state disks to supplement your existing technology stack? For that same amount of money you will shortly be able to get the same usable capacity of Texas Memory Systems RamSan devices. By my estimates that will give you 600,000 IOPS, 9 GB/sec bandwidth (using Fibre Channel, more with Infiniband), 48 terabytes of non-volatile flash storage, 384 GB of DDR cache and a speed up of 10-50X depending on the query (based on tests against the TPCH data set using disks and the equivalent RamSan SSD configuration).

OK, there is a lot to dissect in that paragraph. First there is the attractive sounding 600,000 IOPS with sub-millisecond response time. But wait, Mike suggests keeping your existing hardware. Folks, if you have existing hardware that is capable of driving OLTP I/O at the rate of 600,000 IOPS I want to shake your hand. Oracle OLTP doesn’t just issue I/O. It performs transactions that hammer the SGA cache and suffer some cache misses (logical to physical I/O ratio). The CPU cost wrapped around the physical I/O is not trivial. Indeed, the idea is to drive up CPU utilization and reduce physical I/O through schema design and proper SGA caching. Those of you who are current Oracle practitioners are invited to analyze your current production OLTP workload and assess the CPU utilization associated with your demonstrated physical I/O rate. If you have an OLTP workload that is doing more than, say, 5000 IOPS (physical) per processor core and you are not 100% processor-bound, tell us about it.

Yes, there are tricked-out transactional benchmarks that shave off real-world features and code path and hammer out as much as 10,000 IOPS per processor core (on very powerful CPUs), but that is not your workload, or anyone else’s workload that reads this blog. So, if real OLTP saturates CPU at, say, 5000 IOPS per core, I have to wonder what your “existing hardware” would look like if it were also able to take advantage of 600,000 IOPS. That would be a very formidable Database grid with something like 120 CPUs. Remember, Mike was talking about using existing hardware to take advantage of SSD instead of Exadata. If you have a 120 CPU Database grid, I suspect it is so critical that you wouldn’t be migrating it to anything. It is simply too critical to mess with, I should hope. Oh, it’s actually more like 2000 IOPS per processor core in real life anyway, but that doesn’t change the point much. And Exadata isn’t really about OLTP.
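The decoder-ring math, using my 5000 IOPS-per-core saturation figure from above:

SQL> select 600000/5000 cores_to_saturate from dual;

CORES_TO_SATURATE
-----------------
              120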

Let’s focus more intently on Mike’s supposition that an alternative to Exadata is “keeping your existing hardware” and feeding it 9GB/s from SSD. OK, first, that is 36% less I/O bandwidth than a single HP Oracle Database Machine can deliver, but let’s think about this for a moment. The Fibre Channel plumbing required for the Database grid to ingest 9GB/s is 23 active 4GFC HBAs at max theoretical throughput. That’s a lot of HBAs, and you need systems to plug them into. Remember, this is your “existing system.”

How much CPU does your “existing hardware” require to drive the 23 FC HBAs? Well, it takes a lot. Yes, I know you can use just a blip of CPU to mindlessly issue I/O in such a fashion as Orion, or some pure I/O subsystem invigoration like dd if=/dev/sda of=/dev/null bs=1024k, but we are talking about DW/BI and Oracle. Oracle actually does stuff with the data returned from an I/O call. With non-Exadata storage, the CPU cost associated with I/O (e.g., issuing, reaping), filtration, projection, joining, sorting, aggregation, etc., is paid by the Database grid. So your “existing system” has to be powerful enough to do the entirety of SQL processing at a rate of 9GB/s. Let’s pretend for a moment that there existed on the market a 4-socket server that could accommodate 23 FC HBAs. Does anyone think for a moment that the 4 processors (perhaps 8 or 16 cores) could actually do anything reasonable with 9GB/s of I/O bandwidth? A general rule is to associate approximately 4 processor cores with each 4GFC HBA (purposefully ignoring trick benchmark configurations). I think it looks like “your existing system” has about 96 processor cores.
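Here’s that HBA and core arithmetic, assuming roughly 400 MB/s of usable payload per active 4GFC HBA (my working figure) and the 4-cores-per-HBA rule of thumb:

SQL> select ceil(9000/400) hbas_for_9gb_per_sec, ceil(9000/400)*4 cores_implied from dual;

HBAS_FOR_9GB_PER_SEC CORES_IMPLIED
-------------------- -------------
                  23            92

Ninety-two cores by the strict math; round up to real server configurations and you land at about 96.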

A Chump Challenge

I’d put an HP Oracle Database Machine (64-core/14-cell) up against a 96-core/9GB/s FLASH SSD system any day of the week. I’d even give them 128 Database-tier CPUs and not worry.

People keep forgetting that scans are offloaded to Exadata with the HP Oracle Database Machine. People shouldn’t craft their position pieces against Exadata by starting at the storage, regardless of the storage speeds and feeds.

It will always take more Database grid horsepower, in a non-Exadata environment, to drive the same scan rates offered by the HP Oracle Database Machine.

FLASH SSD

Did I mention that there is nothing (technically) preventing us from configuring Exadata Storage Server with 3.5″ FLASH SSD drives? Better late than never, but it isn’t really worth mentioning at this time.

The “City of Brotherly Love” Loves Exadata. I Love That.

BLOG CORRECTION: I misread the announcement. Oracle’s own Robert Stackowiak is presenting Exadata at the PAOUG meeting. I have all the confidence in the world that he’ll do a fantastic job.

According to this post on blogs.oracle.com, the excitement level over Exadata Storage Server is increasing. It seems the Philadelphia Area Oracle Users Group is featuring Mark Rittman to discuss the future of Oracle BI/DW architecture. There will also be a presentation on Oracle Database Machine and Exadata Storage Server. I’m glad to hear interest is building in the product, but I have to admit I’m a bit confused because I think Exadata is the future (present) of Oracle BI/DW. I also don’t know how anyone outside Oracle Corporation could so quickly amass the required expertise to present Exadata. I suppose I’m being petty. Nonetheless, I’m excited Exadata is deemed important enough to make the keynote address at this gathering. The abstract for the keynote reads:

Data volumes are exploding, generating larger and larger databases, and getting to the right data instantly requires a new way to manage today’s systems. New and revolutionary solutions and methodologies are converging to address this need, and taking a fresh look at the challenge reveals new insights. This keynote will examine the intersection of new database, data warehouse, and storage solutions that deliver on these requirements.

Hmmm…”new database, data warehouse, and storage solutions…”

I hope they get it right.

