I’ve met people before who’d rather drink muriatic acid than admit a mistake. I don’t cotton to such folks none too purdy good. This is a post about perfection and imperfection.
Cost Based Optimizer
There is no such thing as a perfect cost-based optimizer (CBO), yet the expectations placed on cost-based optimizers run rampant. If the Oracle Database 11g CBO were perfect, there would be no need for excellent performance tuning tools like SQL Tuning Advisor and Reactive Tuning. In those cases where the CBO does not generate a perfect plan, Oracle Database equips administrators with a tuning toolset that generally gets it right.
What’s This Have to do with Oracle Exadata Storage Server?
In my blog entry entitled Oracle Exadata Storage Server: A Black Box with No Statistics, I discussed the Affinity Card Program Test Database (ACPTD). One of the reasons I use this schema and query set is that, unlike a benchmark kit, it is not perfect. I expect Exadata to function as advertised in an imperfect world.
Imagine an organization that inherited some imperfect data with an imperfect schema. Nah, that would never happen. What about queries that aren’t perfect? Ever seen any of those? What about imperfect query plans?
One of the business questions, in English, that I test with the Affinity Card Program Test Database reads as follows:
List our customers with Club Cards who purchased more than $1,000.00 worth of goods/services in the west retail region. Omit non-affinity card purchases and all restaurant, travel and gasoline purchases.
The following text box has the SQL for this business question. Now, before you get all excited about the usage of the mcc.misc column (line 5 in the text box), please remember what I just explained in the previous paragraph. Sometimes I like imperfection when I'm analyzing performance. But what does this have to do with Exadata Storage Server? Well, I could certainly use a codes table and anti-join the purchases that are not restaurant, travel or gasoline; no question there. However, for this particular query I wanted imperfection, not a lesson in schema design. So, the purpose behind the not like 'restaurant%travel%gasoline%' predicate is to test a more heavily weighted filtration effort in the Exadata Storage Server. It's synthetic, I know, but at least it is imperfect, which I like.
with act as
(
  select act.card_no, act.purchase_amt from zipcodes z, all_card_trans act, mcc m
  where (act.card_no like '4777%' or act.card_no like '3333%') and
        act.mcc = m.mcc and m.misc not like 'restaurant%travel%gasoline%' and
        act.merchant_zip = z.zip and z.state in ('CA', 'OR', 'WA')
)
select
  cf.custid,
  sum (act.purchase_amt) sales from act, cust_fact cf
where act.card_no = cf.aff_cc_num and cf.club_card_num not like '0000%'
group by cf.custid
having sum (act.purchase_amt) > 1000;
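For contrast, the "perfect" variant I alluded to above would exclude restaurant, travel and gasoline purchases by anti-joining against a codes table rather than pattern-matching mcc.misc. Here is a minimal sketch of that approach, assuming a hypothetical mcc_categories lookup table (it is not part of the ACPTD schema, so treat it purely as an illustration):

-- Hypothetical "clean schema" alternative: anti-join against a codes table
-- instead of the not like 'restaurant%travel%gasoline%' predicate.
select act.card_no, act.purchase_amt
from all_card_trans act
where (act.card_no like '4777%' or act.card_no like '3333%')
and not exists (select null
                from mcc_categories mc   -- hypothetical lookup table
                where mc.mcc = act.mcc
                and mc.category in ('RESTAURANT', 'TRAVEL', 'GASOLINE'));

Back to the imperfect version I actually ran.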
Notice in the following text box how the plan shows Exadata Storage Server filtering out all_card_trans, cust_fact, zipcodes and mcc rows. The filtering effort on mcc is not colossal, but it is certainly heftier than the filtration on zipcodes.state, since there are synthetic data-cleanliness values in mcc.misc such as 'restaurant%travel%gasoline' and so forth.
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(SUM("ACT"."PURCHASE_AMT")>1000)
7 - access("ACT"."CARD_NO"="CF"."AFF_CC_NUM")
10 - access("ACT"."MERCHANT_ZIP"="Z"."ZIP")
13 - access("ACT"."MCC"="M"."MCC")
18 - storage("ACT"."CARD_NO" LIKE '4777%' OR "ACT"."CARD_NO" LIKE '3333%')
filter("ACT"."CARD_NO" LIKE '4777%' OR "ACT"."CARD_NO" LIKE '3333%')
23 - storage("M"."MISC" NOT LIKE 'restaurant%travel%gasoline%' AND SYS_OP_BLOOM_FILTER(:BF0000,"M"."MCC"))
filter("M"."MISC" NOT LIKE 'restaurant%travel%gasoline%' AND SYS_OP_BLOOM_FILTER(:BF0000,"M"."MCC"))
25 - storage("Z"."STATE"='CA' OR "Z"."STATE"='OR' OR "Z"."STATE"='WA')
filter("Z"."STATE"='CA' OR "Z"."STATE"='OR' OR "Z"."STATE"='WA')
27 - storage("CF"."CLUB_CARD_NUM" NOT LIKE '0000%')
filter("CF"."CLUB_CARD_NUM" NOT LIKE '0000%')
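As an aside, the storage() lines above mark the predicates eligible for offload to the cells. If you want to confirm that Smart Scan actually did the filtration work, one simple check is to look at the cell-related statistics for your session after the query completes. The exact statistic names vary by release, so the following sketch just pattern-matches on them:

-- Rough check of Smart Scan activity for the current session.
-- Statistic names differ across releases, hence the wildcard.
select sn.name, ss.value
from v$statname sn, v$mystat ss
where sn.statistic# = ss.statistic#
and sn.name like 'cell%'
and ss.value > 0;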
What Does This Have to do with Cost Based Optimizer and SQL Tuning Advisor?
The fact that Exadata does not leave any of the core Oracle Database 11g value propositions off the table is a point that has been made in the press and blogging community over and over again. I'm sure I'll get spanked for saying this, but occasional bad plans are a fact of life with cost-based optimizers, and that includes Oracle. Those of us willing to accept and live with this reality are very appreciative of the goodness offered by SQL Tuning Advisor. There are several canned queries in the ACPTD. The other day I loaded the scale that leaves me with an ALL_CARD_TRANS table of roughly 3TB. Of the several queries, I was disappointed with the performance of only one: the query listed above with the imperfect method for weeding out merchant codes. Since this is Exadata you can rest assured that I/O bandwidth was not the problem, yet the query took 721 seconds to complete! I knew something was wrong because this query gets partition elimination on all_card_trans.card_no, and 721 seconds of Oracle Exadata Storage Server Smart Scan throughput on this 6-cell configuration would rack up over 4TB of total I/O (6 GB/s × 721 s). Oracle Database does not mistakenly read an entire table when partition pruning is possible, so what was the issue?
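For reference, the back-of-the-envelope arithmetic: 6 GB/s × 721 s ≈ 4,326 GB, or roughly 4.2 TB, which is well beyond the ~3TB footprint of the entire ALL_CARD_TRANS table, never mind the subset that should survive partition pruning.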
The following text box shows the plan. Notice the cardinality estimate in step 18: the optimizer expected only 2 rows from the ALL_CARD_TRANS scan. The plan, it seems, didn't have the best information about the relationship between ALL_CARD_TRANS and CUST_FACT.
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                |    15 |  1395 |   207K  (2)| 00:48:32 |        |      |            |
|   1 |  PX COORDINATOR                             |                |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                       | :TQ10005       |    15 |  1395 |   207K  (2)| 00:48:32 |  Q1,05 | P->S | QC (RAND)  |
|*  3 |    FILTER                                   |                |       |       |            |          |  Q1,05 | PCWC |            |
|   4 |     HASH GROUP BY                           |                |    15 |  1395 |   207K  (2)| 00:48:32 |  Q1,05 | PCWP |            |
|   5 |      PX RECEIVE                             |                |    15 |  1395 |   207K  (2)| 00:48:32 |  Q1,05 | PCWP |            |
|   6 |       PX SEND HASH                          | :TQ10004       |    15 |  1395 |   207K  (2)| 00:48:32 |  Q1,04 | P->P | HASH       |
|*  7 |        HASH JOIN                            |                |    15 |  1395 |   207K  (2)| 00:48:32 |  Q1,04 | PCWP |            |
|   8 |         PX RECEIVE                          |                |     2 |   104 |   206K  (2)| 00:48:11 |  Q1,04 | PCWP |            |
|   9 |          PX SEND BROADCAST                  | :TQ10003       |     2 |   104 |   206K  (2)| 00:48:11 |  Q1,03 | P->P | BROADCAST  |
|* 10 |           HASH JOIN                         |                |     2 |   104 |   206K  (2)| 00:48:11 |  Q1,03 | PCWP |            |
|  11 |            PX RECEIVE                       |                |     2 |    88 |   206K  (2)| 00:48:11 |  Q1,03 | PCWP |            |
|  12 |             PX SEND BROADCAST               | :TQ10002       |     2 |    88 |   206K  (2)| 00:48:11 |  Q1,02 | P->P | BROADCAST  |
|* 13 |              HASH JOIN BUFFERED             |                |     2 |    88 |   206K  (2)| 00:48:11 |  Q1,02 | PCWP |            |
|  14 |               JOIN FILTER CREATE            | :BF0000        |     2 |    58 |   206K  (2)| 00:48:09 |  Q1,02 | PCWP |            |
|  15 |                PX RECEIVE                   |                |     2 |    58 |   206K  (2)| 00:48:09 |  Q1,02 | PCWP |            |
|  16 |                 PX SEND HASH                | :TQ10000       |     2 |    58 |   206K  (2)| 00:48:09 |  Q1,00 | P->P | HASH       |
|  17 |                  PX BLOCK ITERATOR          |                |     2 |    58 |   206K  (2)| 00:48:09 |  Q1,00 | PCWC |            |
|* 18 |                   TABLE ACCESS STORAGE FULL | ALL_CARD_TRANS |     2 |    58 |   206K  (2)| 00:48:09 |  Q1,00 | PCWP |            |
|  19 |               PX RECEIVE                    |                |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,02 | PCWP |            |
|  20 |                PX SEND HASH                 | :TQ10001       |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,01 | P->P | HASH       |
|  21 |                 JOIN FILTER USE             | :BF0000        |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,01 | PCWP |            |
|  22 |                  PX BLOCK ITERATOR          |                |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,01 | PCWC |            |
|* 23 |                   TABLE ACCESS STORAGE FULL | MCC            |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,01 | PCWP |            |
|  24 |            PX BLOCK ITERATOR                |                |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,03 | PCWC |            |
|* 25 |             TABLE ACCESS STORAGE FULL       | ZIPCODES       |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|  26 |         PX BLOCK ITERATOR                   |                |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,04 | PCWC |            |
|* 27 |          TABLE ACCESS STORAGE FULL          | CUST_FACT      |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,04 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------
So, I ran the query through the SQL Tuning Advisor API as follows.
var tname varchar2(500);
begin
  :tname := dbms_sqltune.create_tuning_task(sql_text =>
    'with act as (select act.card_no, act.purchase_amt from zipcodes z, all_card_trans act, mcc m
      where (act.card_no like ''4777%'' or act.card_no like ''3333%'') and act.mcc = m.mcc
        and m.misc not like ''restaurant%travel%gasoline%'' and act.merchant_zip = z.zip
        and z.state in (''CA'', ''OR'', ''WA''))
     select cf.custid, sum (act.purchase_amt) sales from act, cust_fact cf
      where act.card_no = cf.aff_cc_num and cf.club_card_num not like ''0000%''
      group by cf.custid having sum (act.purchase_amt) > 1000');
end;
/
exec dbms_sqltune.execute_tuning_task(:tname);
select dbms_sqltune.report_tuning_task(:tname) AS recommendations from dual;
SQL Tuning Advisor did recommend a different plan and, as you can see in the following text box, the primary difference is in the cardinality estimate on ALL_CARD_TRANS. With this learned information the join order changed and the result was a 6.5X speedup as the query completed in 110 seconds.
---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name           | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                |     1 |    93 |   208K  (2)| 00:48:33 |        |      |            |
|   1 |  PX COORDINATOR                          |                |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10005       |     1 |    93 |   208K  (2)| 00:48:33 |  Q1,05 | P->S | QC (RAND)  |
|*  3 |    FILTER                                |                |       |       |            |          |  Q1,05 | PCWC |            |
|   4 |     HASH GROUP BY                        |                |     1 |    93 |   208K  (2)| 00:48:33 |  Q1,05 | PCWP |            |
|   5 |      PX RECEIVE                          |                |    86M|  7640M|   207K  (2)| 00:48:32 |  Q1,05 | PCWP |            |
|   6 |       PX SEND HASH                       | :TQ10004       |    86M|  7640M|   207K  (2)| 00:48:32 |  Q1,04 | P->P | HASH       |
|*  7 |        HASH JOIN BUFFERED                |                |    86M|  7640M|   207K  (2)| 00:48:32 |  Q1,04 | PCWP |            |
|   8 |         PX RECEIVE                       |                |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,04 | PCWP |            |
|   9 |          PX SEND HASH                    | :TQ10002       |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,02 | P->P | HASH       |
|  10 |           PX BLOCK ITERATOR              |                |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,02 | PCWC |            |
|* 11 |            TABLE ACCESS STORAGE FULL     | CUST_FACT      |  6278K|   245M|  1527   (1)| 00:00:22 |  Q1,02 | PCWP |            |
|  12 |         PX RECEIVE                       |                |  6802K|   337M|   206K  (2)| 00:48:11 |  Q1,04 | PCWP |            |
|  13 |          PX SEND HASH                    | :TQ10003       |  6802K|   337M|   206K  (2)| 00:48:11 |  Q1,03 | P->P | HASH       |
|* 14 |           HASH JOIN                      |                |  6802K|   337M|   206K  (2)| 00:48:11 |  Q1,03 | PCWP |            |
|  15 |            PX RECEIVE                    |                |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,03 | PCWP |            |
|  16 |             PX SEND BROADCAST            | :TQ10000       |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,00 | P->P | BROADCAST  |
|  17 |              PX BLOCK ITERATOR           |                |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|* 18 |               TABLE ACCESS STORAGE FULL  | MCC            |   276 |  4140 |    83   (0)| 00:00:02 |  Q1,00 | PCWP |            |
|* 19 |            HASH JOIN                     |                |  7220K|   254M|   206K  (2)| 00:48:10 |  Q1,03 | PCWP |            |
|  20 |             PX RECEIVE                   |                |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,03 | PCWP |            |
|  21 |              PX SEND BROADCAST           | :TQ10001       |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |
|  22 |               PX BLOCK ITERATOR          |                |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,01 | PCWC |            |
|* 23 |                TABLE ACCESS STORAGE FULL | ZIPCODES       |  3953 | 31624 |     3   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|  24 |             PX BLOCK ITERATOR            |                |    78M|  2179M|   206K  (2)| 00:48:09 |  Q1,03 | PCWC |            |
|* 25 |              TABLE ACCESS STORAGE FULL   | ALL_CARD_TRANS |    78M|  2179M|   206K  (2)| 00:48:09 |  Q1,03 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------
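For completeness, SQL Tuning Advisor typically delivers a plan correction like this as a SQL profile recommendation in the report. Accepting it is a one-liner; the following is only a sketch, assuming the tuning task above produced a profile recommendation (the profile name is illustrative):

-- Accept the recommended SQL profile from the tuning task created above.
exec dbms_sqltune.accept_sql_profile(task_name => :tname, name => 'acptd_card_trans_profile', replace => TRUE);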
Summary
After a brief bout with an imperfect query plan, the Oracle Database 11g SQL Tuning Advisor fixed what ailed me. With the plan offered by SQL Tuning Advisor I was able to push my imperfect query through my imperfect schema at a perfect I/O rate of 6 GB/s on my small 6-cell Oracle Exadata Storage Server test configuration.
In spite of how many times EMC's Chuck Hollis may claim that there is "nothing new" or "no magic" when referring to Oracle Exadata Storage Server, I think it is painfully obvious that there is indeed "something new" here. Is it magic? No, and we don't claim that it's magic. What we do claim is that we have moved filtration and projection down to storage and, more importantly, we don't bottleneck the disks (see Hard Drives Are Arcane Technology. So Why Can't I Realize Their Full Bandwidth Potential?). So while we don't claim Exadata is magic, there is one thing this blog post shows: all the pre-existing magic, like SQL Tuning Advisor, is still in the arsenal. When you deploy Exadata you leave nothing at the door.
Finally, a no-magic approach to something really new.



