Posts Tagged 'Exadata'

SLOB 2.3 Data Loading Failed? Here’s a Quick Diagnosis Tip.

The upcoming SLOB 2.4 release will bring improved data loading error handling. While still using SLOB 2.3, users can suffer data loading failures that may appear–on the surface–to be difficult to diagnose.

Before I continue, I should point out that the most common data loading failure with SLOB in pre-2.4 releases is the concurrent data loading phase suffering lack of sort space in TEMP. To that end, here is an example of a SLOB 2.3 data loading failure due to shortage of TEMP space. Please notice the grep command (in Figure 2 below) one should use to begin diagnosis of any SLOB data loading failure:


Figure 1

And now, the grep command:


Figure 2


Exadata Database Machine X2-2 or X2-8? Sure! Why Not? Part II.

In my recent post entitled Exadata Database Machine X2-2 or X2-8? Sure! Why Not? Part I, I started to address the many questions folks are sending my way about what factors to consider when choosing between Exadata Database Machine X2-8 versus Exadata Database Machine X2-2. This post continues that thread.

As my friend Greg Rahn points out in his recent post about Exadata, the latest Exadata Storage Server is based on Intel Xeon 5600 (Westmere EP) processors. The Exadata Storage Server is the same whether the database grid is X2-2 or X2-8. The X2-2 database hosts are also based on Intel Xeon 5600. On the other hand, the X2-8 database hosts are based on Intel Xeon 7500 (Nehalem EX). This is a relevant distinction when thinking about database encryption.

Transparent Database Encryption

In his recent post, Greg brings up the topic of Oracle Database Transparent Data Encryption (TDE). As Greg points out, the new Exadata Storage Server software is able to leverage Intel Advanced Encryption Standard New Instructions (Intel AES-NI) found in the Intel Integrated Performance Primitives (Intel IPP) library because the processors in the storage servers are Intel Xeon 5600 (Westmere EP). Think of this as “hardware-assist.” However, in the case of the database hosts in the X2-8, there is no hardware-assist for TDE as Nehalem EX does not offer support for the necessary instructions. Westmere EX will—someday. So what does this mean?

TDE and Compression? Unlikely Cousins?

At first glance one would think there is nothing in common between TDE and compression. However, in an Exadata environment there is storage offload processing and for that reason roles are important to understand. That is, understanding what gets done is sometimes not as important as who is doing what.

When I speak to people about Exadata I tend to draw the mental picture of an “upper” and “lower” half. While the count of servers in each grid is not split 50/50 by any means, thinking about Exadata in this manner makes understanding certain features a lot simpler. Allow me to explain.


In the case of compressing data, all work is done by the upper half (the database grid). On the other hand, decompression effort takes place in either the upper or lower half depending on certain criteria.

  • Upper Half Compression. Always.
  • Lower Half Compression. Never
  • Lower Half Decompression. Data compressed with Hybrid Columnar Compression (HCC) is decompressed in the Exadata Storage Servers when accessed via Smart Scan. Visit my post about what triggers a Smart Scan for more information.
  • Upper Half Decompression. With all compression types, other than HCC, decompression effort takes place in the upper half. When accessed without Smart Scan, HCC data is also decompressed in the upper half.


In the case of encryption, the upper/lower half breakout is as follows:

  • Upper Half Encryption. Always. Data is always encrypted by code executing in the database grid. If the processors are Intel Xeon 5600 (Westmere EP), as is the case with X2-2, there is hardware assist via the IPP library. The X2-8 is built on Nehalem EX and therefore does not offer hardware-assist encryption.
  • Lower Half Encryption. Never.
  • Lower Half Decryption. Smart Scan only. If data is not being accessed via Smart Scan the blocks are returned to the database host and buffered in the SGA (see the Seven Fundamentals). Both the X2-2 and X2-8 are attached to Westmere EP-based storage servers. To that end, both of these configurations benefit from hardware-assist decryption via the IPP libarary. I reiterate, however, that this hardware-assist lower-half decryption only occurs during Smart Scan.
  • Upper Half Decryption. Always in the case of data accessed without Smart Scan. In the case of X2-2, this upper-half decryption benefits from hardware-assist via the IPP library.

That pretty much covers it and now we see commonality between compression and encryption. The commonality is mostly related to whether or not a query is being serviced via Smart Scan.

That’s Not All

If HCC data is also stored in encrypted form, a Smart Scan is able to filter out vast amount of encrypted data without even touching it. That is, HCC short-circuits a lot of decryption cost. And, even though Exadata is really fast, it is always faster to not do something at all than to shift into high gear and do it as fast as possible.

Oracle Exadata Storage Server Version 1. A “FAQ” is Born. Part I.

BLOG UPDATE (22-MAR-10): Readers, please be aware that this blog entry is about the HP Oracle Database Machine (V1).

BLOG UPDATE (01-JUN-09). According to my blog statistics, a good number of new readers find my blog by being referred to this page by google. I’d like to draw new readers’ attention to the sidebar at the right where there are pages dedicated to indexing my Exadata-related posts.  The original blog post follows:

I expected Oracle Exadata Storage Server to make an instant splash, but the blogosphere has really taken off like a rocket with the topic. Unfortunately there is already quite a bit of misinformation out there. I’d like to approach this with routine quasi-frequently asked question posts. When I find misinformation, I’ll make a blog update. So consider this installment number one.

Q. What does the word programmable mean in the product name Exadata Programmable Storage Server?

A. I don’t know, but it certainly has nothing to do with Oracle Exadata Storage Server. I have seen this moniker misapplied to the product. An Exadata Storage Server “Cell”-as we call them-is no more programmable than a Fibre Channel SAN or NAS Filer. Well, it is of course to the Exadata product development organization, but there is nothing programmable for the field. I think, perhaps, someone may have thought that Exadata is a field programmable gate array (FPGA) approach to solving the problem of offloading query intelligence to storage. Exadata is not field-“programmable” and it doesn’t use or need FPGA technology.

Q. How can Exadata be so powerful if there is only a single 1gb path from the storage cells to the switch?

A. I saw this on a blog post today and it is an incorrect assertion. In fact, I saw a blogger state, “1gb/s???? that’s not that good.” I couldn’t agree more. This is just a common notation blunder. There is, in fact, 20 Gb bandwidth between each Cell and each host in the database grid, which is close to 2 gigabytes of bandwidth (maximum theoretical 1850MB/s due to the IB cards though). I should point out that none of the physical plumbing is “secret-sauce.” Exadata leverages commodity components and open standards (e.g., OFED ).

Q. How does Exadata change the SGA caching dynamic?

A. It doesn’t. Everything that is cached today in the SGA will still be cached. Most Exadata reads are buffered in the PGA since the plan is generally a full scan. That is not to say that there is no Exadata value for indexes, because there can be. Exadata scans indexes and tables with the same I/O dynamic.

Q. This Exadata stuff must be based on NAND FLASH Solid State Disk

A. No, it isn’t and I won’t talk about futures. Exadata doesn’t really need Solid State Disk. Let’s think this one through. Large sequential read and write  speed is about the same on FLASH SSD as rotating media, but random I/O is very fast. 12 Hard Disk Drives can saturate the I/O controller so plugging SSD in where the 3.5″ HDDs are would be a waste.

Q. Why mention sequential disk I/O performance since sequential accesses will only occur in rare circumstances (e.g., non-concurrent scans).

A. Yes, and the question is what? No, honestly. I’ll touch on this. Of course concurrent queries attacking the same physical disks will introduce seek times and rotational delays. And the “competition” can somehow magically scan different table extents on the same disks without causing the same drive dynamic? Of course not. If Exadata is servicing concurrent queries that attack different regions of the same drives then, yes, by all means there will be seeks. Those seek, by the way, are followed by 4 sequential 1MB I/O operations so the seek time is essentailly amortized out.

Q. Is Exadata I/O really sequential, ever?

A. I get this one a lot and it generally comes from folks that know Automatic Storage Management (ASM). Exadata leverages ASM normal redundancy mirroring which mirrors and stripes the data. Oh my, doesn’t that entail textbook random I/O? No, not really. ASM will “fill” a disk from the “outside-in. ” This does not create a totally random I/O pattern since this placement doesn’t randomize from the outer edge of the platters to the spindle and back. In general, the “next” read on any given disk involved in a scan will be at a greater offset in the physical device and not that “far” from the previous sectors read. This does not create the pathological seek times that would be associated with a true random I/O profile.

When Exadata is scanning a disk that is part of an ASM normal redundancy disk group and needs to “advance forward” to get the next portion of the table, Exadata directs the drive mechanics to position at the specific offset where it will read an ASM allocation unit of data, and on and on it goes. Head movements of this variety are considered “short seeks.” I know what the competition says about this topic in their positioning papers. Misinformation will be propagated.

Let me see if I can handle this topic in a different manner. If HP Oracle Exadata Storage Server was a totally random I/O train wreck then it wouldn’t likely be able to drive all the disks in the system at ~85MB/s. In the end, I personally think the demonstrated throughput is more interesting than an academic argument one might stumble upon in an anti-Exadata positioning paper.

Well, I think I’ll wrap this up as installment one of an on-going thread of Q&A on HP Oracle Exadata Storage Server and the HP Oracle Database Machine.

Don’t forget to read Ron Weiss’ Oracle Exadata Storage Server Technical Product Whitepaper. Ron is a good guy and it is a very informative piece. Consider it required reading-especially if you are trolling my site in the role of competitive technical marketing. <smiley>

Oracle Exadata Storage Server. Part I.

Brute Force with Brains.
Here is a brief overview of the Oracle Exadata Storage Server key performance attributes:

  • Intelligent Storage. Ship less data due to query intelligence in the storage.
  • Bigger Pipes. Infiniband with Remote Direct Memory Access. 5x Faster than Fibre Channel.
  • More Pipes. Scalable, redundant I/O Fabric.

Yes, it’s called Oracle Exadata Storage Server and it really was worth the wait. I know it is going to take a while for the message to settle in, but I would like to take my first blog post on the topic of Oracle Exadata Storage Server to reiterate the primary value propositions of the solution.

  • Exadata is fully optimized disk I/O. Full stop! For far too long, it has been too difficult to configure ample I/O bandwidth for Oracle, and far too difficult to configure storage so that the physical disk accesses are sequential.
  • Exadata is intelligent storage. For far too long, Oracle Database has had to ingest full blocks of data from disk for query processing, wasting precious host processor cycles to discard the uninteresting data (predicate filtering and column projection).

Oracle Exadata Storage Server is Brute Force. A Brawny solution.
A single rack of the HP Oracle Database Machine (based on Oracle Exadata Storage Server Software) is configured with 14 Oracle Exadata Storage Server “Cells” each with 12 3.5″ hard drives for a total of 168 disks. There are 300GB SAS and 1TB SATA options. The database tier of the single-rack HP Oracle Database Machine consists of 8 Proliant DL360 servers with 2 Xeon 54XX quad-core processors and 32 GB RAM running Oracle Real Application Clusters (RAC). The RAC nodes are interconnected with Infiniband using the very lightweight Reliable Datagram Sockets (RDS) protocol. RDS over Infiniband is also the I/O fabric between the RAC nodes and the Storage Cells. With the SAS storage option, the HP Oracle Database Machine offers roughly 1 terabyte of optimal user addressable space per Storage Cell-14 TB total.

Sequential I/O
Exadata I/O is a blend of random seeks followed by a series of large transfer requests so scanning disk at rates of nearly 85 MB/s per disk drive (1000 MB/s per Storage Cell) is easily achieved. With 14 Exadata Storage Cells, the data-scanning rate is 14 GB/s. Yes, roughly 80 seconds to scan a terabyte-and that is with the base HP Oracle Database Machine configuration. Oracle Exadata Storage Software offers these scan rates on both tables and indexes and partitioning is, of course, fully supported-as is compression.

Comparison to “Old School”
Let me put Oracle Exadata Storage Server performance into perspective by drawing a comparison to Fibre Channel SAN technology. The building block of all native Fibre Channel SAN arrays is the Fibre Channel Arbitrated Loop (FCAL) to which the disk drives are connected. Some arrays support as few as 2 of these “back-end” loops, larger arrays support as many as 64. Most, if not all, current SAN arrays support 4 Gb FCAL back-end loops which are limited to no more than 400MB/s of read bandwidth. The drives connected to the loops have front-end Fibre Channel electronics and-forgetting FC-SATA drives for a moment-the drives themselves are fundamentally the same as SAS drives-given the same capacity and rotational speed. It turns out that SAS and Fibre drives, of the 300GB 15K RPM variety, perform pretty much the same for large sequential I/O. Given the bandwidth of the drives, the task of building a SAN-based system that isn’t loop-bottlenecked requires limiting the number of drives per loop to 5 (or 10 for mirroring overhead). So, to match a single rack configuration of the HP Oracle Database Machine with a SAN solution would require about 35 back-end drive loops! All of this math boils down to one thing: a very, very large high-end SAN array.

Choices, Choices: Either the Largest SAN Array or the Smallest HP Oracle Database Machine
Only the largest of the high-end SAN arrays can match the base HP Oracle Database Machine I/O bandwidth. And this is provided the SAN array processors can actually pass through all the I/O generated from a full complement of back-end FCAL loops. Generally speaking, they just don’t have enough array processor bandwidth to do so.

Comparison to the “New Guys on the Block”
Well, they aren’t really that new. I’m talking about Netezza. Their smallest full rack has 112 Snippet Processing Units (SPU) each with a single SATA disk drive-and onboard processor and FPGA components-for a total user addressable space of 12.5 TB. If the data streamed off the SATA drives at, say, 70 MB/s, the solution offers 7.8 GB/s-42% slower than a single-rack HP Oracle Database Machine.

Big, Efficient Pipes
Oracle Exadata Storage Server delivers I/O results directly into the address space of the Oracle Database Parallel Query Option processes using the Reliable Datagram Sockets (RDS) protocol over Infiniband. As such, each of the Oracle Real Application Clusters nodes are able to ingest a little over a gigabyte of streaming data per second at a CPU cost of less than 5%, which is less than the typical cost of interfacing with Fibre Channel host-bus adaptors via traditional Unix/Linux I/O calls. With Oracle Exadata Storage Server, the Oracle Database host processing power is neither wasted on filtering out uninteresting data, nor plucking out columns from the rows. There would, of course, be no need to project in a colum-oriented database but Oracle Database is still row-oriented.

Oracle Exadata Storage Server is Intelligence Storage. Brainy Software.
Oracle Exadata Storage Server truly is an optimized way to stream data to Oracle Database. However, none of the traditional Oracle Database features (e.g., partitioning, indexing, compression, Backup/Restore, Disaster Protection, etc) are lost when deploying Exadata. Combining data elimination (via partitioning) with compression further exploits the core architectural strengths of Exadata. But what about this intelligence? Well, as we all know, queries don’t join all the columns and few queries ever run without a WHERE predicate for filtration. With Exadata that intelligence is offloaded to storage. Exadata Storage Cells execute intelligent software that understands how to perform filtration as well as column projection. For instance, consider a query that cites 2 columns nestled in the middle of a 100-column  row and the WHERE predicate filters out 50% of the rows. With Exadata, that is exactly what is returned to the Oracle Parallel Query processes.

By this time it should start to make sense why I have blogged in the past the way I do about SAN technology, such as this post about SAN disk/array bottlenecking.  Configuring a high-bandwidth SAN requires a lot of care.

Yes, this is a very short, technically-light blog entry about Oracle Exadata Storage Server, but this is day one. I didn’t touch on any of the other really exciting things Exadata does in the areas of I/O Resource Management, offloaded online backup and offloaded join filters, but I will.


I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 747 other subscribers
Oracle ACE Program Status

Click It

website metrics

Fond Memories


All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: