BLOG UPDATE (22-MAR-10): Readers, please be aware that this blog entry is about the HP Oracle Database Machine (V1).
BLOG UPDATE (01-JUN-09). According to my blog statistics, a good number of new readers find my blog by being referred to this page by google. I’d like to draw new readers’ attention to the sidebar at the right where there are pages dedicated to indexing my Exadata-related posts. The original blog post follows:
I expected Oracle Exadata Storage Server to make an instant splash, but the blogosphere has really taken off like a rocket with the topic. Unfortunately there is already quite a bit of misinformation out there. I’d like to approach this with routine quasi-frequently asked question posts. When I find misinformation, I’ll make a blog update. So consider this installment number one.
Q. What does the word programmable mean in the product name Exadata Programmable Storage Server?
A. I don’t know, but it certainly has nothing to do with Oracle Exadata Storage Server. I have seen this moniker misapplied to the product. An Exadata Storage Server “Cell”-as we call them-is no more programmable than a Fibre Channel SAN or NAS Filer. Well, it is of course to the Exadata product development organization, but there is nothing programmable for the field. I think, perhaps, someone may have thought that Exadata is a field programmable gate array (FPGA) approach to solving the problem of offloading query intelligence to storage. Exadata is not field-“programmable” and it doesn’t use or need FPGA technology.
Q. How can Exadata be so powerful if there is only a single 1gb path from the storage cells to the switch?
A. I saw this on a blog post today and it is an incorrect assertion. In fact, I saw a blogger state, “1gb/s???? that’s not that good.” I couldn’t agree more. This is just a common notation blunder. There is, in fact, 20 Gb bandwidth between each Cell and each host in the database grid, which is close to 2 gigabytes of bandwidth (maximum theoretical 1850MB/s due to the IB cards though). I should point out that none of the physical plumbing is “secret-sauce.” Exadata leverages commodity components and open standards (e.g., OFED ).
Q. How does Exadata change the SGA caching dynamic?
A. It doesn’t. Everything that is cached today in the SGA will still be cached. Most Exadata reads are buffered in the PGA since the plan is generally a full scan. That is not to say that there is no Exadata value for indexes, because there can be. Exadata scans indexes and tables with the same I/O dynamic.
Q. This Exadata stuff must be based on NAND FLASH Solid State Disk
A. No, it isn’t and I won’t talk about futures. Exadata doesn’t really need Solid State Disk. Let’s think this one through. Large sequential read and write speed is about the same on FLASH SSD as rotating media, but random I/O is very fast. 12 Hard Disk Drives can saturate the I/O controller so plugging SSD in where the 3.5″ HDDs are would be a waste.
Q. Why mention sequential disk I/O performance since sequential accesses will only occur in rare circumstances (e.g., non-concurrent scans).
A. Yes, and the question is what? No, honestly. I’ll touch on this. Of course concurrent queries attacking the same physical disks will introduce seek times and rotational delays. And the “competition” can somehow magically scan different table extents on the same disks without causing the same drive dynamic? Of course not. If Exadata is servicing concurrent queries that attack different regions of the same drives then, yes, by all means there will be seeks. Those seek, by the way, are followed by 4 sequential 1MB I/O operations so the seek time is essentailly amortized out.
Q. Is Exadata I/O really sequential, ever?
A. I get this one a lot and it generally comes from folks that know Automatic Storage Management (ASM). Exadata leverages ASM normal redundancy mirroring which mirrors and stripes the data. Oh my, doesn’t that entail textbook random I/O? No, not really. ASM will “fill” a disk from the “outside-in. ” This does not create a totally random I/O pattern since this placement doesn’t randomize from the outer edge of the platters to the spindle and back. In general, the “next” read on any given disk involved in a scan will be at a greater offset in the physical device and not that “far” from the previous sectors read. This does not create the pathological seek times that would be associated with a true random I/O profile.
When Exadata is scanning a disk that is part of an ASM normal redundancy disk group and needs to “advance forward” to get the next portion of the table, Exadata directs the drive mechanics to position at the specific offset where it will read an ASM allocation unit of data, and on and on it goes. Head movements of this variety are considered “short seeks.” I know what the competition says about this topic in their positioning papers. Misinformation will be propagated.
Let me see if I can handle this topic in a different manner. If HP Oracle Exadata Storage Server was a totally random I/O train wreck then it wouldn’t likely be able to drive all the disks in the system at ~85MB/s. In the end, I personally think the demonstrated throughput is more interesting than an academic argument one might stumble upon in an anti-Exadata positioning paper.
Well, I think I’ll wrap this up as installment one of an on-going thread of Q&A on HP Oracle Exadata Storage Server and the HP Oracle Database Machine.
Don’t forget to read Ron Weiss’ Oracle Exadata Storage Server Technical Product Whitepaper. Ron is a good guy and it is a very informative piece. Consider it required reading-especially if you are trolling my site in the role of competitive technical marketing. <smiley>