A question came in about whether it is possible to measure how much data is filtered out when running a query serviced by a Smart Scan in the Exadata Storage Server grid. The following is the long answer.
An Example of Offload Processing Effectiveness Accounting
I’d like to answer this question with real numbers taken from a test configuration consisting of 4 Real Application Clusters nodes attached to 6 Exadata Storage Server cells with the SAS disk option (12 x 300GB 3.5″ 15K RPM).
Test Workload – The Affinity Card Program Test Database
The Affinity Card Program Test Database (ACPTD) is a test schema and set of queries that mimics the type of queries performed by a marketing group responsible for promoting the use of a retail club card and affinity credit card for a fictitious company. In this example deployment of the ACPTD, the group responsible for promoting club card and affinity credit card activity for Acme Inc has built a data mart with data from several sources, including the main corporate ERP and CRM systems; partner merchant ERP systems; and, outside syndicated data.
For this blog post I’ll focus on the main syndicated card transaction table, called all_card_trans. However, there are other tables in the schema, and for reference’s sake the following text box shows the table sizes in my current deployment of the test kit. As the query shows, the all_card_trans table is 613 GB. Yes, I know that’s small, but this is a test system and I don’t like watching 6 Exadata cells hammering 1 GB/s each for tens of minutes when a couple of minutes will do. I hope you’ll understand.
SQL> set echo on
SQL> @tab_sizes
SQL> col segment_name for a32
SQL> select segment_name, sum(bytes)/1024/1024 mb from user_segments
  2  where segment_name not like 'BIN%' and segment_name not like 'SYS%'
  3  group by segment_name;

SEGMENT_NAME                             MB
-------------------------------- ----------
PARTNER_MERCHANTS                         8
PARTNER_MERCHANT_SALES                67780
ZIPCODES                                  8
OUR_SALES                             21956
ALL_CARD_TRANS                       629032
MCC                                       8
OUR_STORES                                8
CUST_SERVICE                             76
TM                                   .03125
CUST_FACT                              4708

10 rows selected.
The table definition for the all_card_trans table is shown in the following text box:
SQL> desc all_card_trans
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CARD_NO                                   NOT NULL VARCHAR2(20)
 CARD_TYPE                                          CHAR(20)
 MCC                                       NOT NULL NUMBER(6)
 PURCHASE_AMT                              NOT NULL NUMBER(6)
 PURCHASE_DT                               NOT NULL DATE
 MERCHANT_CODE                             NOT NULL NUMBER(7)
 MERCHANT_CITY                             NOT NULL VARCHAR2(40)
 MERCHANT_STATE                            NOT NULL CHAR(3)
 MERCHANT_ZIP                              NOT NULL NUMBER(6)
The following queries show the total row count in the all_card_trans table as well as the number of rows representing transactions on credit cards with leading 4 digits of 3333 and 4777, which represent the card block granted to our fictitious company, Acme Inc, for their affinity card program. This is a needle-in-a-haystack query, since only .3% of the data matches the WHERE predicate. Now, before you look at the text box, please note that this is a single Real Application Clusters instance driving this query. And, yes, the RAC node is a small 8-core DL360. Both scans are driving storage at 6 GB/s (613GB/102s). Folks, remember that to do what I just did in the following text box with Fibre Channel I’d have to have a system with 15 FC HBAs attached to several huge SAN arrays. Oh, and I should point out that it would also require about 85 Netezza Snippet Processing Units to match this throughput.
SQL> select count(*) from all_card_trans where card_no like '4777%' or card_no like '3333%';

  COUNT(*)
----------
  22465830

Elapsed: 00:01:42.71

SQL> select count(*) from all_card_trans;

  COUNT(*)
----------
7897611641

Elapsed: 00:01:42.47
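If you’d like to check my arithmetic, the selectivity and scan-rate claims fall straight out of the counts and elapsed times above. The following is just back-of-envelope SQL, not part of the test kit:

-- Back-of-envelope checks (illustrative only, not part of the test kit):
-- selectivity of the card_no predicate and the effective scan rate.
select round(22465830 / 7897611641 * 100, 2) pct_rows_matched,  -- ~.28%
       round(613 / 102.71, 2) scan_gb_per_sec                   -- ~5.97 GB/s
  from dual;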
I Can’t Believe I Ate the Whole Thing
The following query uses a method for measuring cell activity by forcing the database to ingest all rows and all columns. Don’t try to make sense of the query because it isn’t supposed to make sense. It is only supposed to drive Exadata to return all rows and all columns, which it does.
$ more demo_offload_bytes.sql

drop view ingest;
create view ingest as
select avg(length(CARD_NO)) c1,
       avg(length(CARD_TYPE)) c2,
       max(MCC + PURCHASE_AMT) c3,
       max(length(PURCHASE_DT)) c4,
       max(MERCHANT_CODE) c5,
       avg(length(MERCHANT_CITY)) c6,
       avg(length(MERCHANT_STATE)) c7,
       max(MERCHANT_ZIP) c8
from all_card_trans;

col MB format 99999999.9

select NAME, VALUE/(1024*1024*1024) GB from gv$sysstat
where STATISTIC# in (196, 44, 43);

select c1+c2+c3+c4+c5+c6+c7 from ingest;

select NAME, VALUE/(1024*1024*1024) GB from gv$sysstat
where STATISTIC# in (196, 44, 43);
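One caveat if you adapt this script: the STATISTIC# values (196, 44, 43) are release-specific and can change between versions, so selecting by statistic name is the safer variation. The following is a sketch of that variation, not what I actually ran; the statistic names are the ones you’ll see in the output below.

-- Variation on the gv$sysstat queries above: select by statistic name
-- rather than STATISTIC#, since the numbers can change between releases.
select inst_id, name, value/(1024*1024*1024) gb
  from gv$sysstat
 where name in ('physical IO disk bytes',
                'cell physical IO interconnect bytes',
                'cell physical IO bytes eligible for predicate offload');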
The following text box shows that when I ran the query, Exadata (in aggregate) scanned 613 GB of disk and returned all but 6% to the RDBMS instance (see the delta in cell physical IO interconnect bytes). Also recorded by v$sysstat (the bytes-eligible statistic) is the fact that there was nothing peculiar about the data being scanned, nothing (chained rows, for instance) that would have interfered with offload processing. Since I asked for all the data, that is what I got. It is nice to know, however, that the entirety of the data was a candidate for offload processing.
Notice I didn’t time the query. I’ll offer more on that later in this blog post.
SQL> @demo_offload_bytes

View dropped.

View created.

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            1227.80814
cell physical IO interconnect bytes                               1151.52453
cell physical IO bytes eligible for predicate offload             1227.31451

C1+C2+C3+C4+C5+C6+C7
--------------------
          9010106.59

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                             1841.5774
cell physical IO interconnect bytes                               1727.15227
cell physical IO bytes eligible for predicate offload             1840.97177
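If you want the 6% figure spelled out, it falls straight out of the before/after deltas in the text box. The following is illustrative arithmetic only, not part of the test kit:

-- Illustrative arithmetic from the snapshots above: what fraction of the
-- bytes eligible for offload actually crossed the interconnect?
select round( (1727.15227 - 1151.52453)             -- interconnect bytes delta (GB)
            / (1840.97177 - 1227.31451) * 100, 1 )  -- eligible bytes delta (GB)
       pct_returned
  from dual;
-- ~93.8%: all but roughly 6% of the scanned bytes came back to the database
-- tier, which is what you'd expect when every row and every column is demanded.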
So let’s take a look at some offload processing. The following change to the “ingest” query reduces the amount of data ingested by the database tier by selecting only rows where the credit card number starts with 4777 or 3333. We know from the scan of the table that this whittles the dataset down by 99.7%.
$ more demo_offload_bytes1.sql

create view ingest_less as
select avg(length(CARD_NO)) c1,
       avg(length(CARD_TYPE)) c2,
       max(MCC + PURCHASE_AMT) c3,
       max(length(PURCHASE_DT)) c4,
       max(MERCHANT_CODE) c5,
       avg(length(MERCHANT_CITY)) c6,
       avg(length(MERCHANT_STATE)) c7,
       max(MERCHANT_ZIP) c8
from all_card_trans act
where card_no like '4777%' or card_no like '3333%';

col MB format 99999999.9

select NAME, VALUE/(1024*1024*1024) GB from gv$sysstat
where STATISTIC# in (196, 44, 43);

set timing on

select c1+c2+c3+c4+c5+c6+c7 from ingest_less;

select NAME, VALUE/(1024*1024*1024) GB from gv$sysstat
where STATISTIC# in (196, 44, 43);
The following text box shows that Exadata scanned the table and filtered out the uninteresting rows. The projected columns accounted for only 1.7 GB across the iDB intelligent I/O fabric. That is, the database tier only had to ingest 1.7 GB, or roughly 17 MB/s given that the query completed in 103 seconds.
SQL> @demo_offload_bytes_1

View dropped.

Elapsed: 00:00:00.21

View created.

Elapsed: 00:00:00.00

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            1841.13743
cell physical IO interconnect bytes                               43.1782733
cell physical IO bytes eligible for predicate offload             1840.97177

Elapsed: 00:00:00.01

C1+C2+C3+C4+C5+C6+C7
--------------------
           9010106.9

Elapsed: 00:01:43.31

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            2454.80203
cell physical IO interconnect bytes                               44.8776852
cell physical IO bytes eligible for predicate offload             2454.62903

Elapsed: 00:00:00.00

SQL>
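The same delta arithmetic applied to this run makes the 1.7 GB and 99.7% figures explicit (again, an illustrative check rather than part of the test kit):

-- Illustrative arithmetic from the snapshots above:
select round(44.8776852 - 43.1782733, 2) interconnect_delta_gb,    -- ~1.70 GB
       round( (44.8776852 - 43.1782733)
            / (2454.62903 - 1840.97177) * 100, 2 ) pct_returned    -- ~0.28%
  from dual;
-- The cells filtered out roughly 99.7% of the data before it ever
-- reached the database tier.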
So, yes, Exadata does let you see how effectively Smart Scans are working. In this example we saw a single-table scan on a small Oracle Exadata Storage Server grid weeding out 99.7% of the data.
Sneaky, Sneaky.
You’ll notice that when I executed demo_offload_bytes.sql (the non-filtering example) I was not tracking query completion time. That’s because up to this point I’ve only been showing examples driven by a single Real Application Clusters node. Now, let’s think about this. I have 8 Intel processor cores and a single inbound InfiniBand path on the database server. From an inbound I/O bandwidth perspective the host can “only” ingest 1.6 GB/s, but the CPUs may throttle that even further since the database is doing its own projection in this case.
I’ve shown, in this post, that this test Exadata Storage grid configuration can scan disk at 6 GB/s. The question you might ask (and I’m about to answer) is: how much does this single database host throttle storage, and why does that matter? Well, it matters because having ample storage bandwidth with limited database server bandwidth is the classic imbalance addressed by the HP Oracle Database Machine. So, let me run it again, this time with timing turned on. As you’ll see, bottlenecks are bottlenecks. This single database host cannot keep up with storage, so storage is throttled, and the result is an 18X increase in query completion time compared to the heavily filtered case. Both queries had to read the same amount of data, but in this case there was an imbalance in the upstream ability to ingest the data, from both an I/O bandwidth and a CPU perspective.
SQL> @demo_offload_bytes

View dropped.

Elapsed: 00:00:00.02

View created.

Elapsed: 00:00:00.01

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            2454.81084
cell physical IO interconnect bytes                                44.886501
cell physical IO bytes eligible for predicate offload             2454.62903

Elapsed: 00:00:00.00

C1+C2+C3+C4+C5+C6+C7
--------------------
          9010106.59

Elapsed: 00:29:26.85

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            3068.55792
cell physical IO interconnect bytes                               620.492259
cell physical IO bytes eligible for predicate offload             3068.28629

Elapsed: 00:00:00.00
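To put numbers on that throttling, divide the interconnect and disk-byte deltas by the 29:26.85 (1,766.85-second) elapsed time. This is back-of-envelope arithmetic only:

-- Illustrative arithmetic from the run above (deltas in GB over 1766.85s):
select round( (620.492259 - 44.886501) / 1766.85 * 1024 ) ingest_mb_per_sec,   -- ~334
       round( (3068.55792 - 2454.81084) / 1766.85 * 1024 ) scan_mb_per_sec     -- ~356
  from dual;
-- Both figures sit far below the ~1.6 GB/s inbound InfiniBand path, which
-- suggests the host CPUs (busy doing projection) were the limiter, not just the link.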
Need More RDBMS Bandwidth? OK, Use It.
So, let’s see what the full-ingestion case looks like with 4 Real Application Clusters nodes. And please forgive that I forgot to aggregate the gv$sysstat output. I’ve added 4-fold database grid resources, so I should complete the full-ingestion query in 75% less time. The text box starts out showing that the number of RAC instances was increased from 1 to 4.
SQL> host date
Fri Oct  3 16:27:52 PDT 2008

SQL> select instance_name from gv$instance;

INSTANCE_NAME
----------------
test1

Elapsed: 00:00:00.02

SQL> select instance_name from gv$instance;

INSTANCE_NAME
----------------
test1
test4
test3
test2

Elapsed: 00:00:00.08

SQL> host date
Fri Oct  3 16:32:06 PDT 2008

SQL>
SQL> @demo_offload_bytes

View dropped.

Elapsed: 00:00:00.03

View created.

Elapsed: 00:00:00.01

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            3068.69843
cell physical IO interconnect bytes                               620.632769
cell physical IO bytes eligible for predicate offload             3068.28629
physical IO disk bytes                                            .013385296
cell physical IO interconnect bytes                               .013385296
cell physical IO bytes eligible for predicate offload                      0
physical IO disk bytes                                            .010355473
cell physical IO interconnect bytes                               .010355473
cell physical IO bytes eligible for predicate offload                      0
physical IO disk bytes                                            .031275272
cell physical IO interconnect bytes                               .031275272

NAME                                                                     GB
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload                      0

12 rows selected.

Elapsed: 00:00:00.02

C1+C2+C3+C4+C5+C6+C7
--------------------
          9010106.59

Elapsed: 00:07:25.01

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            3222.13893
cell physical IO interconnect bytes                               764.535856
cell physical IO bytes eligible for predicate offload             3221.70425
physical IO disk bytes                                             154.31714
cell physical IO interconnect bytes                               144.732708
cell physical IO bytes eligible for predicate offload             154.280945
physical IO disk bytes                                            152.860648
cell physical IO interconnect bytes                               143.367385
cell physical IO bytes eligible for predicate offload             152.828033
physical IO disk bytes                                            153.195674
cell physical IO interconnect bytes                                 143.6831

NAME                                                                     GB
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload              153.13031

12 rows selected.

Elapsed: 00:00:00.02

SQL>
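The prediction is easy to verify from the two elapsed times (29:26.85 single-node versus 7:25.01 with four nodes). Illustrative arithmetic:

-- Illustrative check: single-node vs. 4-node elapsed time, full ingestion.
select round(1766.85 / 445.01, 2) speedup   -- ~3.97x, i.e. ~75% less time
  from dual;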
Well, that was predictable, but cool nonetheless. What about the filtered ingest query? Should there be much of a speedup, given that storage bandwidth remains constant and the ingest rate of the filtered query was only roughly 17 MB/s with a single Real Application Clusters node? I could save the cut-and-paste effort and just tell you that adding Real Application Clusters nodes in this case will, of course, not reduce the query time, but since I’ve bored you this long, here it is:
SQL> @demo_offload_bytes_1

View dropped.

Elapsed: 00:00:00.01

View created.

Elapsed: 00:00:00.01

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            3222.15758
cell physical IO interconnect bytes                               764.554509
cell physical IO bytes eligible for predicate offload             3221.70425
physical IO disk bytes                                            154.334342
cell physical IO interconnect bytes                                144.74991
cell physical IO bytes eligible for predicate offload             154.280945
physical IO disk bytes                                            153.213285
cell physical IO interconnect bytes                                143.70071
cell physical IO bytes eligible for predicate offload              153.13031
physical IO disk bytes                                            152.878347
cell physical IO interconnect bytes                               143.385084

NAME                                                                     GB
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload             152.828033

12 rows selected.

Elapsed: 00:00:00.01

C1+C2+C3+C4+C5+C6+C7
--------------------
           9010106.9

Elapsed: 00:01:42.79

NAME                                                                     GB
---------------------------------------------------------------- ----------
physical IO disk bytes                                            3375.27835
cell physical IO interconnect bytes                               764.974159
cell physical IO bytes eligible for predicate offload             3374.81995
physical IO disk bytes                                              308.0746
cell physical IO interconnect bytes                               144.133529
cell physical IO bytes eligible for predicate offload             307.986572
physical IO disk bytes                                            310.634206
cell physical IO interconnect bytes                               145.189768
cell physical IO bytes eligible for predicate offload               310.5755
physical IO disk bytes                                            302.274069
cell physical IO interconnect bytes                               143.805239

NAME                                                                     GB
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload             302.218781

12 rows selected.

Elapsed: 00:00:00.01

SQL> host date
Fri Oct  3 16:49:32 PDT 2008
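And since I apologized above for not aggregating the gv$sysstat output, here is what that aggregation looks like for this filtered run. Instance ordering in gv$sysstat output is not guaranteed to be stable between snapshots, so summing before subtracting is the safe approach. Illustrative arithmetic only:

-- Illustrative aggregation of the per-instance 'physical IO disk bytes'
-- deltas from the snapshots above (sums are order-independent):
select round( (3375.27835 + 308.0746 + 310.634206 + 302.274069)
            - (3222.15758 + 154.334342 + 153.213285 + 152.878347), 1 ) scanned_gb
  from dual;
-- ~613.7 GB scanned in 102.79 seconds: storage still ran flat out at ~6 GB/s,
-- and the elapsed time matches the single-node filtered run (1:42.79 vs. 1:43.31).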
Summary
Exadata works, and it has statistics so that the solution doesn’t feel so much like a black box. In the end, however, none of the statistics stuff really matters. What matters is whether your queries complete in the desired service times.
Kevin,
Couldn’t you get the stats from v$sql? Did they change the SQL*Plus ‘trace mode’ to report these, the way it reports logical and physical reads?
rgds
Hi Salem,
Yes, IO_CELL_OFFLOAD_ELIGIBLE_BYTES, IO_INTERCONNECT_BYTES and IO_DISK_BYTES are accounted for in v$sql.
It is the same data as gv$sysstat in this regard, though.
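For example, here is a minimal sketch of pulling the same accounting per-statement from v$sql; the LIKE predicate below is purely illustrative, so match on whatever identifies your statement:

-- Per-statement view of the offload accounting (predicate is illustrative only):
select sql_id,
       io_disk_bytes/(1024*1024*1024)                  disk_gb,
       io_interconnect_bytes/(1024*1024*1024)          interconnect_gb,
       io_cell_offload_eligible_bytes/(1024*1024*1024) eligible_gb
  from v$sql
 where sql_text like 'select c1+c2+c3+c4+c5+c6+c7%';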
No such thing as too many statistics: very good stuff.
The example you use raises another important point, I think, which is that storage-based filtering and projection is not a cure for every problem, and that disk I/O is not the only component in a balanced system. It’s a reminder that Oracle has long had a set of features specifically to mitigate the inefficiency of repeating calculations (materialised views, query rewrite, and in 11g various result caching features) and that even when you couple the database to a fast storage system there is still a role to be played by these.
David,
Let me see if I get your point. I think you are saying that all the optimizations Oracle has implemented in the past (to mitigate I/O pain) are likely still helpful. Yes, you are right. The full arsenal is still there. The main thing about Exadata is that it can do I/O, and in the absence of Exadata the *only* tools you have are those which mitigate the impact of painful I/O.
Exactly.
As well as precalculating aggregates, in the past I have created “skinny” versions of very wide fact tables in order to improve full scan time, and used query rewrite to transparently redirect queries to them. Works great. I’ve even made completely redundant copies of large tables with different partitioning schemes to provide more effective partition pruning for targeted sets of queries.
So one appeal of 11g+Exadata is that nothing _necessarily_ invalidates those approaches; it just changes the associated cost-benefit analysis so that the number of cases where you resort to them is greatly reduced, or even eliminated.
David,
Don’t forget the results cache 🙂
Like we say, Exadata is brute force computing …. but it doesn’t negate the “brainy” features of Oracle Database. You can use brains, brawn or both with Exadata…your choice 🙂
It would be interesting to compare 11g+Exadata with the other vendors to see how these brainy features compare. Query rewrite, results caching, partitioning, in-database OLAP, in-database Data Mining, um … flashback is also a good feature for ETL support, for rolling back a complex failed load without needing any lines of code.
Oh, and Apex as well
Er, ummm, I’m trying to avoid a ‘me too’ fan-boy comment, but I can’t help myself.
You’ve both hit the nail square on the head. Exadata gives you all the DW appliance style stuff, *but* on top of a top notch, mature feature-rich DBMS. (Crikey, maybe I should work for Oracle marketing. Perhaps not. 😉 )
That’s a very attractive combination from a technical point of view, regardless of industry analysis.
Yeah, ok, I admit it. (Educated) Fan-boy comment.
Look, the deal is that a high percentage of Netezza installs are sitting on the same raised floor as the large Oracle ERP/OLTP system from which they extract the data. I consider this madness that no longer has merit.
Kevin,
You made a typo in: “Both scans are driving storage at 6 GB/s (613MB/102s).”
Should probably be 613GB/102s
Freek,
Ah, if only that was the only typo…I just walked through the post again and cleaned up a few more…whew, that feels better…
Hi Kevin,
I was wondering if there is a way to simulate working on an Exadata storage machine.
Something like interaction with:
dcli
ipmitool
cellcli
(I mean something besides R.A.Testing.)
I believe this could be implemented in the same manner as the Oracle Unified Storage 7000 Simulator.
That would allow for a pseudo-realistic experience.
Thanks.
Waseem.
(Please do allow for it to work on AMD too)
Hi Waseem,
I’m sorry. What you’re looking for is not available outside the walls of Oracle. BTW, Exadata ran on AMD before it ran on Intel.