This is installment number five in my series on Oracle Exadata Storage Server and HP Oracle Database Machine frequently asked questions. I recommend you also visit The Index of my other Exadata posts. I’m mostly cutting and pasting questions from the comment threads of my blog posts about Oracle Exadata Storage Server and the HP Oracle Database Machine, mixed in with some assertions I’ve seen on the web (re-phrased as questions). If an item already reads as a question when I see it, I cut and paste it without modification.
Q. […] for joining large datasets, such as a fact table and a very large dimension, we currently encourage the use of equipartitioning on the join key to reduce messaging (CPU) and the volume of data distributed to PQ slaves (memory/storage). Would there be (a) a benefit and (b) a mechanism for ensuring that matching partitions of commonly joined tables are colocated on the same cell, so that the join can be performed entirely at the storage level?
A. The answer to both (a) and (b) is no. There is no way to co-locate tables or indexes on a cell because we use ASM to stripe and mirror between cells. To do that we would have to replicate partitions in their entirety across cells for redundancy, and ASM is unaware of what type of data is stored in blocks. Besides, the only type of “join” we do at this stage is based on Bloom filters. Bloom filters are created in the database tier (grid) and then pushed down to storage.
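To make the idea concrete, here is a toy Python sketch of a Bloom-filter join push-down. This is strictly illustrative and not Oracle's implementation: the class name, bit-array size, and hash scheme are all invented for the example. The point is the division of labor: the filter is built from the dimension's join keys at the database tier, then applied at the storage tier so that most non-matching fact rows never travel over the wire (a Bloom filter can admit false positives, but never false negatives, so no matching row is lost).

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter for illustration only; not Oracle's implementation."""
    def __init__(self, size_bits=8192, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from a salted hash of the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True may be a false positive; False is always definitive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Database tier (grid): build the filter from the dimension's join keys.
dim_keys = [101, 205, 309]
bf = BloomFilter()
for k in dim_keys:
    bf.add(k)

# "Storage" tier: probe fact rows against the pushed-down filter so that
# rows that cannot possibly join are discarded before being shipped back.
fact_rows = [(101, "a"), (777, "b"), (205, "c"), (888, "d")]
survivors = [row for row in fact_rows if bf.might_contain(row[0])]
```

Any false positives that slip through the filter are eliminated by the real join back in the database tier.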
Q. Where can I find the official documentation for the Exadata Product Family? I could only find a datasheet and a white paper.
A. Documentation is only provided to licensed customers.
Q. How [does] the optimizer decide whether to use an index or a Smart Scan for predicate filtering? I guess it will choose the faster one, but what’s faster and when?
A. Ah, but we don’t call it Smart Table Scan. Exadata Storage Server uses Smart Scans on indexes with the same brute force and intelligence (e.g., filtration) as it does with tables.
Q. Why use the outer part of the disk for everything? I.e., why not use the slower part for mirroring?
A. This question is in reference to the fact that we recommend folks allocate the outer portions of their disks to disk groups that will contain “hot” data. Strictly speaking, there is a way to do this with Automatic Storage Management Failure Groups. I tend not to use Failure Groups to address what the reader is asking because the origin of that feature was to equip administrators with the necessary tools to do the hard work of ensuring ASM mirrors between separate controllers and/or cabinets, etc. Exadata is much more aware of the underlying storage so this level of admin effort is not needed.
Automatic Storage Management mirrors the contents of the logical management unit (a disk group) and does not mirror between disk groups unless you have Failure Groups. If we supported a RAID 0+1 (with BCL) style approach between disk groups then I would put a hot-mirror-side disk group on the sweet sectors and a cold-mirror-side disk group closer to the spindles and only read the cold-mirror-side in the event of a failure. But, we don’t do that with singleton disk groups. Instead, we do a quasi-RAID 1+0 **within** the disk group. As such we consume sweet disk for both the primary and secondary extents. In the event of a loss, I’d say our current approach is better because the application will continue to be serviced with I/O from sweet sectors. On the other hand, if there is never a loss we are consuming sweet disk for naught. It is a trade-off.
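The quasi-RAID 1+0 placement described above can be sketched with a toy allocator. Everything here is a hypothetical model for illustration: the cell names, the `"outer"` tag standing in for sweet sectors, and the round-robin policy are all invented. What the sketch shows is the trade-off in the text: each primary extent and its mirror land on *different* cells (for redundancy), but *both* copies consume outer-track real estate.

```python
import itertools

# Hypothetical model: three cells, each exposing grid disks carved from the
# outer ("sweet") sectors for the hot disk group.
cells = ["cell1", "cell2", "cell3"]

def allocate_extents(num_extents, cells):
    """Quasi-RAID 1+0 within a disk group (toy model): place each primary
    extent and its mirror on different cells, both on sweet sectors."""
    placements = []
    cycle = itertools.cycle(cells)
    for _ in range(num_extents):
        primary = next(cycle)
        # Mirror goes to any other cell -- never the primary's cell.
        mirror = next(c for c in cells if c != primary)
        placements.append({"primary": (primary, "outer"),
                           "mirror": (mirror, "outer")})
    return placements

layout = allocate_extents(4, cells)
```

After a cell failure, reads are redirected to the surviving mirror copies, which still sit on sweet sectors, which is why service quality holds up in the loss case.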
The other problem with a RAID 0+1 (hot-to-cold) striped-then-mirrored approach is that it punishes OLTP workloads: with the typical OLTP read/write ratio, we’d be wildly flailing between the sweet and sour sectors to satisfy the writes. Remember, we are not a one-trick pony.
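The seek-flailing argument can be made tangible with a back-of-the-envelope model. This is a deliberately crude sketch under assumed numbers: track positions 0 (outer) and 1000 (inner) are invented units, and real drives reorder and coalesce I/O, so treat this only as an illustration of why every mirrored write dragging the head to the sour sectors is costly.

```python
def head_travel(ops):
    """Total seek distance for a sequence of track positions (toy model)."""
    travel = 0
    pos = ops[0]
    for p in ops[1:]:
        travel += abs(p - pos)
        pos = p
    return travel

OUTER, INNER = 0, 1000  # assumed track positions, arbitrary units

# RAID 0+1 hot-to-cold: every mirrored write visits the cold (inner) side,
# interleaved with OLTP reads served from the sweet (outer) sectors.
flailing = []
for _ in range(50):
    flailing += [OUTER, OUTER]   # two OLTP reads from sweet sectors
    flailing += [OUTER, INNER]   # write primary (outer), then mirror (inner)

# Quasi-RAID 1+0 within the disk group: both copies live on sweet sectors,
# so the same workload keeps the head parked near the outer tracks.
steady = []
for _ in range(50):
    steady += [OUTER, OUTER, OUTER, OUTER]

# The hot-to-cold layout accumulates far more head travel than the
# within-group layout for the identical read/write mix.
```

Of course a single number hides queueing and elevator-scheduling effects, but the direction of the result is the point: a mirror side parked on sour sectors turns every write into a long seek.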