I recently read an article by Curt Monash entitled Interpreting the results of data warehouse proofs-of-concept (POCs). Curt’s post touched on a topic that continually mystifies me. I’m not sure when the phenomenon started, but I’ve witnessed a growing trend towards complete lack of scrutiny when it comes to the performance claims made by most vendors in the data warehousing space. For example, Netezza makes a blanket claim that their appliance is 100-fold faster than Oracle. Full stop. Er, not full stop… Netezza doesn’t stop there. They claim:
While Netezza makes claims of 100x performance gains, it is not uncommon to see performance differences as large as 200x to even 400x or more when compared to existing Oracle systems.
100x Speed-up: Child’s Play
But, honestly, 100x is child’s play. Forget for a moment that there is no itemization of where that speedup would come from in Netezza’s high-level messaging. Such information would be technical marketing and I wouldn’t expect Netezza to disclose any sort of justification for where 100x speedup comes from. Lowered expectations. Shucks, these DW arms-race marketing claims remind me of that famous Saturday Night Live skit that seems to have served as the play book for these marketing guys-in more ways than one!
Ok, chuckles aside, Curt’s post on the topic included a link to a spreadsheet of recent Proof of Concept results where “the incumbent” was trounced to the tune of 335x in reporting tasks. Like I said, 100x is child’s play.
Nobody should look at a claim such as 335x without wondering where in the world such a speedup comes from and shame on any vendor that isn’t willing to itemize the benefit. After all, without some knowledge of what produces such astounding speedup, how is the dutiful DW practitioner to expect the speedup to remain intact over time or, moreover, how to replicate the “magic” elsewhere. I’m more than willing to itemize to anyone any claim of Oracle Exadata Storage Server speed up on any query. Exadata is not “magic” so accounting for its benefit is very easy to do. But, back to the 335x for a moment. This is actually quite simple. To get 335x speedup one of the following is true:
- The query was executed on a platform with 335x storage bandwidth
- The query was executed on a platform with 335x processor bandwidth
- The query manipulated 1/335th the amount of data
- Some combination of items 1 through 3
Number 3 in the list is achieved through things like partition elimination, indexing, materialized views, more efficient joins, and so forth. This is what Oracle refers to as the “Brainy Approach” to improved data warehouse query performance. Of course Oracle has, and retains all these “Brainy” optimization approaches, and more, when Exadata is in play. Exadata is a solution offering both “Brainy” and, most importantly, “Brawny” technology.
Let’s think this 335x thing through for a moment. Imagine that the 335x was a Netezza 10100 and the 335x was an improvement over a traditional Oracle incumbent (no Exadata). One of Netezza’s main value propositions is that they are able to utilize full bandwidth of all the disks in the system in parallel-just like Exadata. That’s the “Brawny” approach. As I point out in my post about “arcane” disk technology, this value proposition is the least we deserve, but because of typical storage provisioning most Oracle deployments don’t benefit from the aggregate bandwidth their drives could actually offer. So kudos to Netezza for that. What if this was Netezza and the 335x was due to the NPS 10100 “Brawny” disk bandwidth capability? Well, that chalks the win to item 1 in the list and therefore the incumbent system was configured with 1/335th the amount of disk bandwidth of the NPS system. If I grant the NPS system 70 MB/s per disk drive I get roughly 7.5 GB/s (108 * 70MB). Does that mean the incumbent was ingesting only 22 MB/s (7.5 GB/335)? Would anyone care about that result? Would you be proud if you got more performance from 108 SATA drives than a single USB 2.0 drive? I shouldn’t think the 335x came solely from list item 1.
The NPS 10100 has 108 processors pounding on the data as it comes off the drives. Can we get 335x over our imaginary incumbent from sheer processing power? Sure, so long as the incumbent was running Oracle on a processor with 1/3rd the bandwidth of a single PowerPC processor (the embedded CPU on a Netezza SPU). Would anyone be excited to beat 1/3rd a CPU with 108 CPUs?
No, folks, the 335x was certainly the product of item 4 on the list-with a very heavy slant towards item 3-regardless of which appliance vendor it actually was.
A 335 Fold Improvement is Child’s Play? I want 485 Fold!
Humor me as I walk through a little exercise to elaborate more on this topic. In the following session I’ll demonstrate a query accessing precisely the same amount of data using the same SQL, in the same Oracle session, attached to the same Oracle database. You’ll see that I execute a host command to prove that within the scope of 15 seconds I am able to demonstrate a 485x speedup. You can choose to believe me or not, but the facts are as follows:
- The amount of data in the table is the same in each case.
- The data in every column of every row is the same.
- The order of rows in the table is the same.
- There is no compression involved at any point.
- The table datatypes are the same.
- The query plan is the same.
- The Oracle Parallel Query Degree of Parallelism remains constant. That means equal CPUs attacking the data.
- There are no indexes, materialized views, partitioning or any sort of data elimination.
- The Oracle Results Cache feature was not used.
- The data in each case resided on the same disks.
And, oh, before I forget to say so, this is Exadata. So, can Oracle market Exadata as 485x faster than Exadata without the use of any data elimination techniques? See for yourself and fill out a comment with your explanation for what I have shown here.
First, a listing of the “demo” script:
SQL> !cat demo.sql set echo off set timing off col sum_sales format 999,999,999,999,999,999 host date desc card_trans set echo on select count(*) from card_trans; set timing on select sum(purchase_amt) sum_sales from card_trans; host date
In the following screen capture I’ll show that the query took 970 seconds to complete. I used the SUM aggregate against the 100+ million purchase_amt column values as a means to show I’m querying the same content in both cases.
SQL> @demo Wed Dec 10 08:59:48 PST 2008 Name Null? Type ----------------------------------------- -------- ---------------------------- CARD_NO NOT NULL VARCHAR2(20) CARD_TYPE CHAR(20) MCC NOT NULL NUMBER(6) PURCHASE_AMT NOT NULL NUMBER(6) PURCHASE_DT NOT NULL DATE MERCHANT_CODE NOT NULL NUMBER(7) MERCHANT_CITY NOT NULL VARCHAR2(40) MERCHANT_STATE NOT NULL CHAR(3) MERCHANT_ZIP NOT NULL NUMBER(6) SQL> select count(*) from card_trans; COUNT(*) ---------- 107389152 SQL> SQL> set timing on SQL> SQL> select sum(purchase_amt) sum_sales from card_trans; SUM_SALES ------------------------ 6,443,502,770 Elapsed: 00:16:10.15 SQL> SQL> host date Wed Dec 10 09:32:08 PST 2008
The first pass of the script ended in the same session at 9:32:08 and 11 seconds later I executed the script again. The session capture shows that there was a 485x speed up (970 seconds down to 2 seconds). Like I said, “100x is childs play.” Well, at least it is when there is no accounting offered for the improvement. Pshaw, it seems I learned a lot from that “training” video I reference above.
SQL> @demo SQL> SQL> set echo off Wed Dec 10 09:32:19 PST 2008 Name Null? Type ----------------------------------------- -------- ---------------------------- CARD_NO NOT NULL VARCHAR2(20) CARD_TYPE CHAR(20) MCC NOT NULL NUMBER(6) PURCHASE_AMT NOT NULL NUMBER(6) PURCHASE_DT NOT NULL DATE MERCHANT_CODE NOT NULL NUMBER(7) MERCHANT_CITY NOT NULL VARCHAR2(40) MERCHANT_STATE NOT NULL CHAR(3) MERCHANT_ZIP NOT NULL NUMBER(6) SQL> select count(*) from card_trans; COUNT(*) ---------- 107389152 SQL> SQL> set timing on SQL> SQL> select sum(purchase_amt) sum_sales from card_trans; SUM_SALES ------------------------ 6,443,502,770 Elapsed: 00:00:01.96 SQL> SQL> host date Wed Dec 10 09:32:23 PST 2008 SQL> select count(*) from user_indexes ; COUNT(*) ---------- 0 Elapsed: 00:00:00.11
Part II in this series: click here.