
SLOB Physical I/O Randomness. How Random Is Random? Random!

I recently read a blog post by Kyle Hailey regarding some lack of randomness he detected in the Orion I/O generator tool. Feel free to read Kyle’s post, but in short, he used DTrace to detect that Orion was obliterating a very dense subset of the 96GB file it was accessing.

I’ve used Orion for many years and, in fact, wrote my first Orion-related blog entry about 8 years ago. I find Orion to be useful for some things, and of course DBAs must use Orion’s cousin CALIBRATE_IO as a part of their job. However, neither of these tools performs database I/O. If you want to see how database I/O behaves on a platform, it’s best to use a database. So, SLOB it is. But wait! Is SLOB just another storage cache-poking, randomness-challenged distraction from your day job? No, it isn’t.

But SLOB Is So Very Difficult To Use

It’s quite simple actually. You can see how simple SLOB is to set up and test by visiting my picture tutorial.

How Random Is Random? Random!

SLOB is utterly random. However, there are some tips I’d like to offer in this post to show how you can achieve even higher levels of randomness in your I/O testing.

Kyle used DTrace and some shell commands to group block visits into buckets. Since I’m analyzing the randomness of SLOB, I’ll use a 10046 trace on the database sessions. First I’ll run a 96-user SLOB test with slob.conf->UPDATE_PCT=0.
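In case you’d like to reproduce the setup, here is a minimal sketch of one way to capture 10046-style trace data for all the SLOB sessions. DBMS_MONITOR is my assumption of convenience (there are other ways to set event 10046), and the runit.sh invocation may differ by SLOB version:

    sqlplus / as sysdba <<'EOF'
    -- enable SQL trace with wait events (the 10046 level 8 equivalent) database-wide
    exec DBMS_MONITOR.DATABASE_TRACE_ENABLE(waits => TRUE, binds => FALSE)
    EOF
    ./runit.sh 96        # the 96-session SLOB run
    sqlplus / as sysdba <<'EOF'
    exec DBMS_MONITOR.DATABASE_TRACE_DISABLE
    EOF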

After the SLOB test was completed I scrambled around to find the trace files and worked out a simple set of sed(1) expressions to spit out the block numbers being visited by each I/O of type db file sequential read:

[Screenshot: sed(1) pipeline extracting block numbers from the 10046 trace files]
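The gist of that pipeline is just a grep and a sed over the trace directory. The path below is the typical default diagnostic destination (yours may differ), and the extraction relies on the standard 10046 wait-line format (nam='db file sequential read' ... block#=NNN):

    # emit one block# per line for every single-block read in the trace files
    grep "db file sequential read" /u01/app/oracle/diag/rdbms/*/*/trace/*.trc | \
      sed 's/.*block#=\([0-9]*\).*/\1/' > 96u.blocks.txt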

I then grouped the blocks being visited into buckets much the same way Kyle did in his post:

[Screenshot: shell commands grouping block visits into buckets]
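For the curious, the bucketing boils down to a one-liner along these lines (a sketch, not necessarily the exact commands I ran):

    # count visits per block, then count blocks per visit count (the buckets)
    sort -n 96u.blocks.txt | uniq -c | awk '{ print $1 }' | \
      sort -n | uniq -c > 96u.buckets.txt
    # each line of 96u.buckets.txt: <number of blocks>  <times each was visited>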

I’ll show some analysis of those buckets later in the post. Yes, SLOB is random, as analysis of 96u.blocks.txt will show, but it can be even more random if one configures a RECYCLE buffer pool. One of the lesser-advertised features of SLOB is that all working tables in the SLOB schemas are created with BUFFER_POOL RECYCLE in the storage clause. The idea behind this is to support caching of index blocks in the SGA buffer pool. When no RECYCLE pool is allocated, there is a battle for footprint in the SGA buffer pool, causing even buffers holding index blocks to be reused for buffering table/index blocks of the active transactions. Naturally, when indexes are not cached there will be slight hot spots from constant physical re-reads of the index blocks. The question becomes: what percentage of the I/O do these hot blocks account for?

To determine how hot the index blocks are, I allocated a RECYCLE buffer pool and ran another two-minute SLOB test. As per the following screenshot, I again grouped block visits into buckets:

[Screenshot: bucketed block visits with the RECYCLE buffer pool configured]
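For reference, allocating the RECYCLE pool is a single parameter change. The 1G value below is purely illustrative; size it to hold your SLOB index blocks:

    sqlplus / as sysdba <<'EOF'
    alter system set db_recycle_cache_size = 1G scope=both;
    EOF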

With both SLOB results in hand (with and without the RECYCLE buffer pool), I performed a bit of text processing to determine how different the access patterns were in the two scenarios; a sketch of that text processing appears after the list. The following shows:

  • The vast majority of blocks are visited 10 or fewer times in both models
  • The RECYCLE pool model clearly flattens out the re-visit rates, as the hottest block is visited only 12 times compared to the 112 visits for the hottest block in the default case
  • If 12 is the gold standard for sparsity (as per the RECYCLE pool test case), then even the default is quite sparse, because the dense buckets accounted for only 84,583 physical reads compared to the nearly 14 million reads of blocks in the sparse buckets

[Screenshot: comparison of block re-visit rates, default versus RECYCLE buffer pool]
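The text processing itself is nothing exotic. Something like the following awk sketch produces the sparse and dense totals, assuming the <blocks> <visits> bucket layout produced earlier:

    # sum physical reads from sparse (<=10 visits) and dense (>10 visits) buckets
    awk '{ reads = $1 * $2 }
         $2 <= 10 { sparse += reads }
         $2 >  10 { dense  += reads }
         END { printf "sparse: %d  dense: %d\n", sparse, dense }' 96u.buckets.txt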

The following table presents the data, including the total I/O operations traced. The sparse visits are those blocks that were accessed 10 or fewer times during the SLOB test. I should point out that there will naturally be more physical I/O during a SLOB test when index accesses are forced to be physical, as is the case with the default buffer pool. That is, the RECYCLE buffer pool case will have a slightly higher logical I/O rate (cache hits) due to index buffer accesses.

[Table: total traced physical reads, with sparse (10 or fewer visits) versus dense block-visit counts for the default and RECYCLE buffer pool cases]

Summary

If you want to know how database I/O performs on a platform, use a database. And if you are using a database to test I/O on a platform, then by all means drive it with SLOB (a database I/O tool).

Regarding randomness, even in the default case SLOB proves itself to be very random. If you want to push for even more randomness, the way forward is to configure db_recycle_cache_size.

Enjoy SLOB! The best place to start with SLOB is the SLOB Resources Page.


Exadata Database Machine X2-2 or X2-8? Sure! Why Not? Part II.

In my recent post entitled Exadata Database Machine X2-2 or X2-8? Sure! Why Not? Part I, I started to address the many questions folks are sending my way about what factors to consider when choosing between Exadata Database Machine X2-8 versus Exadata Database Machine X2-2. This post continues that thread.

As my friend Greg Rahn points out in his recent post about Exadata, the latest Exadata Storage Server is based on Intel Xeon 5600 (Westmere EP) processors. The Exadata Storage Server is the same whether the database grid is X2-2 or X2-8. The X2-2 database hosts are also based on Intel Xeon 5600. On the other hand, the X2-8 database hosts are based on Intel Xeon 7500 (Nehalem EX). This is a relevant distinction when thinking about database encryption.

Transparent Database Encryption

In his recent post, Greg brings up the topic of Oracle Database Transparent Data Encryption (TDE). As Greg points out, the new Exadata Storage Server software is able to leverage Intel Advanced Encryption Standard New Instructions (Intel AES-NI) through the Intel Integrated Performance Primitives (Intel IPP) library, because the processors in the storage servers are Intel Xeon 5600 (Westmere EP). Think of this as “hardware-assist.” However, in the case of the database hosts in the X2-8, there is no hardware-assist for TDE, as Nehalem EX does not offer support for the necessary instructions. Westmere EX will, someday. So what does this mean?

TDE and Compression? Unlikely Cousins?

At first glance one would think there is nothing in common between TDE and compression. However, in an Exadata environment there is storage offload processing, and for that reason roles are important to understand. That is, understanding what gets done is sometimes not as important as understanding who is doing it.

When I speak to people about Exadata I tend to draw the mental picture of an “upper” and “lower” half. While the count of servers in each grid is not split 50/50 by any means, thinking about Exadata in this manner makes understanding certain features a lot simpler. Allow me to explain.

Compression

In the case of compressing data, all work is done by the upper half (the database grid). On the other hand, decompression effort takes place in either the upper or lower half depending on certain criteria, as the following list shows (a syntax sketch for the HCC case follows the list).

  • Upper Half Compression. Always.
  • Lower Half Compression. Never.
  • Lower Half Decompression. Data compressed with Hybrid Columnar Compression (HCC) is decompressed in the Exadata Storage Servers when accessed via Smart Scan. Visit my post about what triggers a Smart Scan for more information.
  • Upper Half Decompression. With all compression types, other than HCC, decompression effort takes place in the upper half. When accessed without Smart Scan, HCC data is also decompressed in the upper half.
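To make the HCC case concrete, here is the sort of table whose data is decompressed in the cells when accessed via Smart Scan. This is a sketch; the table and source names are illustrative:

    sqlplus / as sysdba <<'EOF'
    -- Hybrid Columnar Compression; Smart Scan decompresses this in the storage cells
    CREATE TABLE sales_hcc COMPRESS FOR QUERY HIGH
      AS SELECT * FROM sales;
    EOF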

Encryption

In the case of encryption, the upper/lower half breakout is as follows (again, a syntax sketch follows the list):

  • Upper Half Encryption. Always. Data is always encrypted by code executing in the database grid. If the processors are Intel Xeon 5600 (Westmere EP), as is the case with X2-2, there is hardware assist via the IPP library. The X2-8 is built on Nehalem EX and therefore does not offer hardware-assist encryption.
  • Lower Half Encryption. Never.
  • Lower Half Decryption. Smart Scan only. If data is not being accessed via Smart Scan, the blocks are returned to the database host and buffered in the SGA (see the Seven Fundamentals). Both the X2-2 and X2-8 are attached to Westmere EP-based storage servers. To that end, both of these configurations benefit from hardware-assist decryption via the IPP library. I reiterate, however, that this hardware-assist lower-half decryption only occurs during Smart Scan.
  • Upper Half Decryption. Always in the case of data accessed without Smart Scan. In the case of X2-2, this upper-half decryption benefits from hardware-assist via the IPP library.
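For completeness, TDE tablespace encryption of the sort discussed above looks like the following. The names and sizes are illustrative, and wallet/key setup is omitted:

    sqlplus / as sysdba <<'EOF'
    -- encrypted tablespace; decryption happens in the upper half unless Smart Scan applies
    CREATE TABLESPACE secure_ts
      DATAFILE '+DATA' SIZE 10G
      ENCRYPTION USING 'AES128'
      DEFAULT STORAGE (ENCRYPT);
    EOF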

That pretty much covers it, and now we see the commonality between compression and encryption. The commonality is mostly related to whether or not a query is being serviced via Smart Scan.

That’s Not All

If HCC data is also stored in encrypted form, a Smart Scan is able to filter out vast amounts of encrypted data without even touching it. That is, HCC short-circuits a lot of decryption cost. And even though Exadata is really fast, it is always faster to not do something at all than to shift into high gear and do it as fast as possible.

Oracle Exadata Storage Server Version 1. A “FAQ” is Born. Part I.

BLOG UPDATE (22-MAR-10): Readers, please be aware that this blog entry is about the HP Oracle Database Machine (V1).

BLOG UPDATE (01-JUN-09). According to my blog statistics, a good number of new readers find my blog by being referred to this page by Google. I’d like to draw new readers’ attention to the sidebar at the right, where there are pages dedicated to indexing my Exadata-related posts. The original blog post follows:

I expected Oracle Exadata Storage Server to make an instant splash, but the blogosphere has really taken off like a rocket with the topic. Unfortunately there is already quite a bit of misinformation out there. I’d like to approach this with routine quasi-frequently asked question posts. When I find misinformation, I’ll make a blog update. So consider this installment number one.

Q. What does the word programmable mean in the product name Exadata Programmable Storage Server?

A. I don’t know, but it certainly has nothing to do with Oracle Exadata Storage Server. I have seen this moniker misapplied to the product. An Exadata Storage Server “Cell,” as we call them, is no more programmable than a Fibre Channel SAN or NAS Filer. Well, it is of course to the Exadata product development organization, but there is nothing programmable for the field. I think, perhaps, someone may have thought that Exadata is a field-programmable gate array (FPGA) approach to solving the problem of offloading query intelligence to storage. Exadata is not field-“programmable” and it doesn’t use or need FPGA technology.

Q. How can Exadata be so powerful if there is only a single 1gb path from the storage cells to the switch?

A. I saw this on a blog post today and it is an incorrect assertion. In fact, I saw a blogger state, “1gb/s???? that’s not that good.” I couldn’t agree more. This is just a common notation blunder. There is, in fact, 20 Gb/s of bandwidth between each Cell and each host in the database grid, which is close to 2 gigabytes per second (a maximum theoretical 1850MB/s due to the IB cards, though). I should point out that none of the physical plumbing is “secret sauce.” Exadata leverages commodity components and open standards (e.g., OFED).

Q. How does Exadata change the SGA caching dynamic?

A. It doesn’t. Everything that is cached today in the SGA will still be cached. Most Exadata reads are buffered in the PGA since the plan is generally a full scan. That is not to say that there is no Exadata value for indexes, because there can be. Exadata scans indexes and tables with the same I/O dynamic.

Q. This Exadata stuff must be based on NAND FLASH Solid State Disk

A. No, it isn’t, and I won’t talk about futures. Exadata doesn’t really need Solid State Disk. Let’s think this one through. Large sequential read and write speed is about the same on FLASH SSD as on rotating media, but random I/O is very fast. Twelve Hard Disk Drives can saturate the I/O controller, so plugging SSD in where the 3.5″ HDDs are would be a waste.

Q. Why mention sequential disk I/O performance, since sequential accesses will only occur in rare circumstances (e.g., non-concurrent scans)?

A. Yes, and the question is what? No, honestly, I’ll touch on this. Of course concurrent queries attacking the same physical disks will introduce seek times and rotational delays. And the “competition” can somehow magically scan different table extents on the same disks without causing the same drive dynamic? Of course not. If Exadata is servicing concurrent queries that attack different regions of the same drives then, yes, by all means there will be seeks. Those seeks, by the way, are each followed by 4 sequential 1MB I/O operations, so the seek time is essentially amortized out.

Q. Is Exadata I/O really sequential, ever?

A. I get this one a lot and it generally comes from folks that know Automatic Storage Management (ASM). Exadata leverages ASM normal redundancy mirroring, which mirrors and stripes the data. Oh my, doesn’t that entail textbook random I/O? No, not really. ASM will “fill” a disk from the “outside-in.” This does not create a totally random I/O pattern, since this placement doesn’t randomize from the outer edge of the platters to the spindle and back. In general, the “next” read on any given disk involved in a scan will be at a greater offset in the physical device and not that “far” from the previously read sectors. This does not create the pathological seek times that would be associated with a true random I/O profile.

When Exadata is scanning a disk that is part of an ASM normal redundancy disk group and needs to “advance forward” to get the next portion of the table, Exadata directs the drive mechanics to position at the specific offset where it will read an ASM allocation unit of data, and on and on it goes. Head movements of this variety are considered “short seeks.” I know what the competition says about this topic in their positioning papers. Misinformation will be propagated.

Let me see if I can handle this topic in a different manner. If HP Oracle Exadata Storage Server were a totally random I/O train wreck, then it wouldn’t likely be able to drive all the disks in the system at ~85MB/s. In the end, I personally think the demonstrated throughput is more interesting than an academic argument one might stumble upon in an anti-Exadata positioning paper.

Well, I think I’ll wrap this up as installment one of an on-going thread of Q&A on HP Oracle Exadata Storage Server and the HP Oracle Database Machine.

Don’t forget to read Ron Weiss’ Oracle Exadata Storage Server Technical Product Whitepaper. Ron is a good guy and it is a very informative piece. Consider it required reading, especially if you are trolling my site in the role of competitive technical marketing. :-)


