Archive for the 'Oracle11g' Category

SLOB Physical I/O Randomness. How Random Is Random? Random!

I recently read a blog post by Kyle Hailey regarding some lack of randomness he detected in the Orion I/O generator tool. Feel free to read Kyle’s post but in short he used dtrace to detect Orion was obliterating a very dense subset of the 96GB file Orion was accessing.

I’ve used Orion for many years and, in fact, wrote my first Orion related blog entry about 8 years ago. I find Orion to be useful for some things and of course DBAs must use Orion’s cousin CALIBRATE_IO as a part of their job. However, neither of these tools perform database I/O. If you want to see how database I/O behaves on a platform it’s best to use a database. So, SLOB it is. But wait! Is SLOB is just another storage cache-poking randomness-challenged distraction from your day job? No, it isn’t.

But SLOB Is So Very Difficult To Use

It’s quite simple actually. You can see how simple SLOB is to set up and test by visiting my picture tutorial.

How Random Is Random? Random!

SLOB is utterly random. However, there are some tips I’d like to offer in this post to show you how you can choose even higher levels of randomness in your I/O testing.

Kyle used dtrace and some shell commands to group block visits into buckets. Since I’m analyzing the randomness of SLOB I’ll use a 10046 trace on the database sessions. First I’ll run a 96 user SLOB test with slob.conf->UPDATE_PCT=0.

After the SLOB test was completed I scrambled around to find the trace files and worked out a simple set of sed(1) expressions to spit out the block numbers being visited by each I/O of type db file sequential read:


I then grouped the blocks being visited into buckets much the same way Kyle did in his post:


I’ll show some analysis of the those buckets later in the post.  Yes, SLOB is random as analysis of 96u.blocks.txt will show but it can be even more random if one configures a RECYCLE buffer pool. One of the lesser advertised features of SLOB is the fact that all working tables in the SLOB schemas are created with BUFFER_POOL RECYCLE in the storage clause. The idea behind this is to support the caching of index blocks in the SGA buffer pool. When no RECYCLE pool is allocated there is a battle for footprint in the SGA buffer pool causing even buffers with index blocks to be reused for buffering table/index blocks of the active transactions. Naturally when indexes are not cached there will be slight hot-spots for constant, physical, re-reads of the index blocks. The question becomes what percentage of the I/O do these hot blocks account for?

To determine how hot index blocks are I allocated a recycle buffer pool and ran another 2 minute SLOB test. As per the following screen shot I again grouped block visits into buckets:


After having both SLOB results (with and without RECYCLE buffer pool) I performed a bit of text processing to determine how different the access patterns were in both scenarios. The following shows:

  • The vast majority of blocks are visited 10 or less times in both models
  • The RECYCLE pool model clearly flattens out the re-visit rates as the hotest block is visited only 12 times compared to the 112 visits for the hottest block in the default case
  • If 12 is the golden standard for sparsity (as per the RECYCLE pool test case) then even the default is quite sparse because dense buckets accounted for only 84,583 physical reads compared to the nearly 14 million reads of blocks in the sparse buckets


The following table presents the data including the total I/O operations traced. The number of sparse visits are those blocks that were accessed less than or equal to 10 times during the SLOB test. I should point out that there will naturally be more I/O during a SLOB test when index accesses are forced physical as is the case with the default buffer pools. That is, the RECYCLE buffer pool case will have a slightly higher logical I/O rate (cache hits) due to index buffer accesses.



If you want to know how database I/O performs on a platform use a database. If using a database to test I/O on a platform then by all means drive it with SLOB ( a database I/O tool).

Regarding randomness, even in the default case SLOB proves itself to be very random. If you want to push for even more randomness then the way forward is to configure db_recycle_cache_size.

Enjoy SLOB! The best place to start with SLOB is the SLOB Resources Page.







Oracle’s Timeline, Copious Benchmarks And Internal Deployments Prove Exadata Is The Worlds First (Best?) OLTP Machine – Part 1.5

In Part I of this series about Oracle OLTP/ERP on Exadata versus non-Exadata, I took a moment to point out the inaccuracies of a particular piece of Oracle marketing literature. In a piece aimed at chronicling Oracle Corporation history, the marketing department went way out of line by making the following claim regarding Exadata:

[…] wins benchmarks against key competitors [..]

Please don’t get me wrong, those five words appearing in any random sentence wouldn’t pose any sort of  a problem. However, situated as they are in this particular sentence does create a  problem because the statement is utterly false.  Exadata has not won a single benchmark against any competitor–“key” or otherwise.

Along For The Ride
In Part I of this series I pointed out the fact HP Oracle Exadata Storage Server cells (a.k.a., V1 Exadata) were used in this June 2009 HP BladeSystem 1-TB Scale TPC-H. However, merely involving Exadata hardware can hardly support Oracle’s marketing claim vis a vis winning benchmarks against key competitors.

There is a big difference between being involved in a benchmark and being the technology that contributes to the result.

I made it clear, in Part I, that Exadata storage was used in that 2009 HP TPC-H result but none of the Exadata features contributed to the result. I clarified that assertion by pointing out that the particular benchmark in question was an In-memory Parallel Query result. Since the result establish Oracle database performance achieved through in-memory database processing I didn’t feel compelled to shore up my assertion. I didn’t think anyone would be confused over the fact that in-memory database processing is not improved by storage technology.

I was wrong.

In the comment section of Part I a comment by a blog reader took offense at my audacious claim. Indeed, how could I assert that storage is not a relevant component in achieving good in-memory database processing benchmark results. The reader stated:

You give reference to a TPHC that used Exadata and then say no Exadata features were used. [..] You obviously don’t know what you are talking about

Having seen that I began to suspect there may be other readers confused on the matter so I let the comment through moderation and decided to address the confusion it in this post.

So now it’s time to address the reader’s comment. If Exadata is used in a benchmark, but Exadata Storage Server offload processing is disabled, would one consider that an Exadata benchmark or was Exadata merely along for the ride?

Here is a screenshot of the full disclosure report that shows Exadata storage intelligence (offload processing) features were disabled. For this reason I assert that Exadata has never won a benchmark against “competitors”, neither “key” nor otherwise.

The screenshot:

Oracle Exadata Storage Server Version 1. A “FAQ” is Born. Part I.

BLOG UPDATE (22-MAR-10): Readers, please be aware that this blog entry is about the HP Oracle Database Machine (V1).

BLOG UPDATE (01-JUN-09). According to my blog statistics, a good number of new readers find my blog by being referred to this page by google. I’d like to draw new readers’ attention to the sidebar at the right where there are pages dedicated to indexing my Exadata-related posts.  The original blog post follows:

I expected Oracle Exadata Storage Server to make an instant splash, but the blogosphere has really taken off like a rocket with the topic. Unfortunately there is already quite a bit of misinformation out there. I’d like to approach this with routine quasi-frequently asked question posts. When I find misinformation, I’ll make a blog update. So consider this installment number one.

Q. What does the word programmable mean in the product name Exadata Programmable Storage Server?

A. I don’t know, but it certainly has nothing to do with Oracle Exadata Storage Server. I have seen this moniker misapplied to the product. An Exadata Storage Server “Cell”-as we call them-is no more programmable than a Fibre Channel SAN or NAS Filer. Well, it is of course to the Exadata product development organization, but there is nothing programmable for the field. I think, perhaps, someone may have thought that Exadata is a field programmable gate array (FPGA) approach to solving the problem of offloading query intelligence to storage. Exadata is not field-“programmable” and it doesn’t use or need FPGA technology.

Q. How can Exadata be so powerful if there is only a single 1gb path from the storage cells to the switch?

A. I saw this on a blog post today and it is an incorrect assertion. In fact, I saw a blogger state, “1gb/s???? that’s not that good.” I couldn’t agree more. This is just a common notation blunder. There is, in fact, 20 Gb bandwidth between each Cell and each host in the database grid, which is close to 2 gigabytes of bandwidth (maximum theoretical 1850MB/s due to the IB cards though). I should point out that none of the physical plumbing is “secret-sauce.” Exadata leverages commodity components and open standards (e.g., OFED ).

Q. How does Exadata change the SGA caching dynamic?

A. It doesn’t. Everything that is cached today in the SGA will still be cached. Most Exadata reads are buffered in the PGA since the plan is generally a full scan. That is not to say that there is no Exadata value for indexes, because there can be. Exadata scans indexes and tables with the same I/O dynamic.

Q. This Exadata stuff must be based on NAND FLASH Solid State Disk

A. No, it isn’t and I won’t talk about futures. Exadata doesn’t really need Solid State Disk. Let’s think this one through. Large sequential read and write  speed is about the same on FLASH SSD as rotating media, but random I/O is very fast. 12 Hard Disk Drives can saturate the I/O controller so plugging SSD in where the 3.5″ HDDs are would be a waste.

Q. Why mention sequential disk I/O performance since sequential accesses will only occur in rare circumstances (e.g., non-concurrent scans).

A. Yes, and the question is what? No, honestly. I’ll touch on this. Of course concurrent queries attacking the same physical disks will introduce seek times and rotational delays. And the “competition” can somehow magically scan different table extents on the same disks without causing the same drive dynamic? Of course not. If Exadata is servicing concurrent queries that attack different regions of the same drives then, yes, by all means there will be seeks. Those seek, by the way, are followed by 4 sequential 1MB I/O operations so the seek time is essentailly amortized out.

Q. Is Exadata I/O really sequential, ever?

A. I get this one a lot and it generally comes from folks that know Automatic Storage Management (ASM). Exadata leverages ASM normal redundancy mirroring which mirrors and stripes the data. Oh my, doesn’t that entail textbook random I/O? No, not really. ASM will “fill” a disk from the “outside-in. ” This does not create a totally random I/O pattern since this placement doesn’t randomize from the outer edge of the platters to the spindle and back. In general, the “next” read on any given disk involved in a scan will be at a greater offset in the physical device and not that “far” from the previous sectors read. This does not create the pathological seek times that would be associated with a true random I/O profile.

When Exadata is scanning a disk that is part of an ASM normal redundancy disk group and needs to “advance forward” to get the next portion of the table, Exadata directs the drive mechanics to position at the specific offset where it will read an ASM allocation unit of data, and on and on it goes. Head movements of this variety are considered “short seeks.” I know what the competition says about this topic in their positioning papers. Misinformation will be propagated.

Let me see if I can handle this topic in a different manner. If HP Oracle Exadata Storage Server was a totally random I/O train wreck then it wouldn’t likely be able to drive all the disks in the system at ~85MB/s. In the end, I personally think the demonstrated throughput is more interesting than an academic argument one might stumble upon in an anti-Exadata positioning paper.

Well, I think I’ll wrap this up as installment one of an on-going thread of Q&A on HP Oracle Exadata Storage Server and the HP Oracle Database Machine.

Don’t forget to read Ron Weiss’ Oracle Exadata Storage Server Technical Product Whitepaper. Ron is a good guy and it is a very informative piece. Consider it required reading-especially if you are trolling my site in the role of competitive technical marketing. <smiley>

Oracle11g Automatic Memory Management Part III. Automatically Automatic?

Oracle Database 11g Automatic Memory Management is not automatic. Let me explain. The theory (my interpretation) behind such features as AMM is that omission of the relevant initialization parameters for the feature constitutes an implied disabling of the feature. I’m sure many of you are going to think I’m stupid for not knowing this, and indeed it is likely documented in bold 14 pitch font somewhere, but unless you set MEMORY_TARGET you don’t get AMM. I sort of presumed it would be the other way around. Here is a simple example.

I have a minimal init.ora and am running catalog.sql and catproc.sql only to hit an ORA-04031. Here is the init.ora:


db_block_size = 8192
db_files = 300
processes = 100

db_name = test

And, here is the ORA-04031:

SQL> grant select on ku$_fhtable_view to public
  2  /
grant select on ku$_fhtable_view to public
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 1044 bytes of shared memory ("shared
pool","select value$ from sys.props...","Typecheck","kggfaDoKghAlloc:1")

And here is all I had to add to the init.ora to fix it:


db_block_size = 8192
db_files = 300
processes = 100

db_name = test 


As I pointed out in my blog entry entitled Oracle11g  Automatic Memory Management Part II – Automatically Stupid?, I have been pleasantly surprised by AMM in 11g. I suppose this simple catalog.sql/catproc.sql example is another-albeit very simplistic-example.

Databases are the Contents of Storage. Future Oracle DBAs Can Administer More. Why Would They Want To?

I’ve taken the following quote from this Oracle whitepaper about low cost storage:

A Database Storage Grid does not depend on flawless execution from its component storage arrays. Instead, it is designed to tolerate the failure of individual storage arrays.

In spite of the fact that the Resilient Low-Cost Storage Initiative program was decommissioned along with the Oracle Storage Compatability Program, the concepts discussed in that paper should be treated as a barometer of the future of storage for Oracle databases-with two exceptions: 1) Fibre Channel is not the future and 2) there’s more to “the database” than just the database. What do I mean by point 2? Well, with features like SecureFiles, we aren’t just talking rows and columns any more and I doubt (but I don’t know) that SecureFiles is the end of that trend.

Future Oracle DBAs
Oracle DBAs of the future become even more critical to the enterprise since the current “stove-pipe” style IT organization will invariably change. In today’s IT shop, the application team talks to the DBA team who talks to the Sys Admin team who tlks to the Storage Admin team. All this to get an application to store data on disk through a Oracle database. I think that will be the model that remains for lightly-featured products like MySQL and SQL Server, but Oracle aims for more. Yes, I’m only whetting your appetite but I will flesh out this topic over time. Here’s food for thought: Oracle DBAs should stop thinking their role in the model stops at the contents of the storage.

So while Chen Shapira may be worried that DBAs will get obviated, I’d predict instead that Oracle technology will become more full-featured at the storage level. Unlike the stock market where past performance is no indicator of future performance, Oracle has consistently brought to market features that were once considered too “low-level” to be in the domain of a Database vendor.

The IT industry is going through consolidation. I think we’ll see Enterprise-level IT roles go through some consolidation over time as well. DBAs who can wear more than “one hat” will be more valuable to the enterprise. Instead of thinking about “encroachment” from the low-end database products, think about your increased value proposition with Oracle features that enable this consolidation of IT roles-that is, if I’m reading the tea leaves correctly.

How to Win Friends and Influence People
Believe me, my positions on Fibre Channel have prompted some fairly vile emails in my inbox-especially the posts in my Manly Man SAN series. Folks, I don’t “have it out”, as they say, for the role of Storage Administrators. I just believe that the Oracle DBAs of today are on the cusp of being in control of more of the stack. Like I said, it seems today’s DBA responsibilities stop at the contents of the storage-a role that fits the Fibre Channel paradigm quite well, but a role that makes little sense to me. I think Oracle DBAs are capable of more and will have more success when they have more control. Having said that, I encourage any of you DBAs who would love to be in more control of the storage to look at my my post about the recent SAN-free Oracle Data Warehouse. Read that post and give considerable thought to the model it discusses. And give even more consideration to the cost savings it yields.

The Voices in My Head
Now my alter ego (who is a DBA, whereas I’m not) is asking, “Why would I want more control at the storage level?” I’ll try to answer him in blog posts, but perhaps some of you DBAs can share experiences where performance or availability problems were further exacerbated by finger pointing between you and the Storage Administration group.

Note to Storage Administrators
Please, please, do not fill my email box with vitriolic messages about the harmony today’s typical stove-pipe IT organization creates. I’m not here to start battles.

Let me share a thought that might help this whole thread make more sense. Let’s recall the days when an Oracle DBA and a System Administrator together (yet alone) were able to provide Oracle Database connectivity and processing for thousands of users without ever talking to a “Storage Group.” Do you folks remember when that was? I do. It was the days of Direct Attach Storage (DAS). The problem with that model was that it only took until about the late 1990s to run out of connectivity-enter the Fibre Channel SAN. And since SANs are spokes attached to hubs of storage systems (SAN arrays), we wound up with a level of indirection between the Oracle server and its blocks on disk. Perhaps there are still some power DBAs that remember how life was with large numbers of DAS drives (hundreds). Perhaps they’ll recall the level of control they had back then. On the other hand, perhaps I’m going insane, but riddle me this (and feel free to quote me elsewhere):

Why is it that the industry needed SANs to get more than a few hundred disks attached to a high-end Oracle system in the late 1990s and yet today’s Oracle databases often reside on LUNs comprised of a handful of drives in a SAN?

The very thought of that twist of fate makes me feel like a fish flopping around on a hot sidewalk. Do you remember my post about capacity versus spindles? Oh, right, SAN cache makes that all better. Uh huh.

Am I saying the future is DAS? No. Can I tell you now exactly what model I’m alluding to? Not yet, but I enjoy putting out a little food for thought.

Oracle11g Automatic Memory Management – Part III. A NUMA Issue.

Now I’m glad I did that series about Oracle on Linux, The NUMA Angle. In my post about the the difference between NUMA and SUMA and “Cyclops”, I shared a lot of information about the dynamics of Oracle running with all the SGA allocated from one memory bank on a NUMA system. Déjà vu.

Well, we’re at it again. As I point out in Part I and Part II of this series, Oracle implements Automatic Memory Management in Oracle Database 11g with memory mapped files in /dev/shm. That got me curious.

Since I exclusively install my Oracle bits on NFS mounts, I thought I’d sling my 11g ORACLE_HOME over to a DL385 I have available in my lab setup. Oh boy am I going to miss that lab when I take on my new job September 4th. Sob, sob. See, when you install Oracle on NFS mounts, the installation is portable. I install 32bit Linux ports via 32bit server into an NFS mount and I can take it anywhere. In fact, since the database is on an NFS mount (HP EFS Clustered Gateway NAS) I can take ORACLE_HOME and the database mounts to any system with a RHEL4 OS running-and that includes RHEL4 x86_64 servers even though the ORACLE_HOME is 32bit. That works fine, except 32bit Oracle cannot use libaio on 64bit RHEL4 (unless you invokde everything under the linux32 command environment that is). I don’t care about that since I use either Oracle Disk Manager or, better yet, Oracle11g Direct NFS. Note, running 32bit Oracle on a 64bit Linux OS is not supported for production, but for my case it helps me check certain things out. That brings us back to /dev/shm on AMD Opteron (NUMA) systems. It turns out the only Opteron system I could test 11g AMM on happens to have x86_64 RHEL4 installed-but, again, no matter.

Quick Test

[root@tmr6s5 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 5119 MB
node 0 free: 3585 MB
node 1 size: 4095 MB
node 1 free: 3955 MB
[root@tmr6s5 ~]# dd if=/dev/zero of=/dev/shm/foo bs=1024k count=1024
1024+0 records in
1024+0 records out
[root@tmr6s5 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 5119 MB
node 0 free: 3585 MB
node 1 size: 4095 MB
node 1 free: 2927 MB

Uh, that’s not good. I dumped some zeros into a file on /dev/shm and all the memory was allocated from socket 1. Lest anyone forget from my NUMA series (you did read that didn’t you?), writing memory not connected to your processor is, uh, slower:

[root@tmr6s5 ~]# taskset -pc 0-1 $$
pid 9453's current affinity list: 0,1
pid 9453's new affinity list: 0,1
[root@tmr6s5 ~]# time dd if=/dev/zero of=/dev/shm/foo bs=1024k count=1024 conv=notrunc
1024+0 records in
1024+0 records out

real    0m1.116s
user    0m0.005s
sys     0m1.111s
[root@tmr6s5 ~]# taskset -pc 1-2 $$
pid 9453's current affinity list: 0,1
pid 9453's new affinity list: 1
[root@tmr6s5 ~]# time dd if=/dev/zero of=/dev/shm/foo bs=1024k count=1024 conv=notrunc
1024+0 records in
1024+0 records out

real    0m0.931s
user    0m0.006s
sys     0m0.923s

Yes, 20% slower.

What About Oracle?
So, like I said, I mounted that ORACLE_HOME on this Opteron server. What does an AMM instance look like? Here goes:

SQL> !numactl --hardware
available: 2 nodes (0-1)
node 0 size: 5119 MB
node 0 free: 3587 MB
node 1 size: 4095 MB
node 1 free: 3956 MB
SQL> startup pfile=./amm.ora
ORACLE instance started.

Total System Global Area 2276634624 bytes
Fixed Size                  1300068 bytes
Variable Size             570427804 bytes
Database Buffers         1694498816 bytes
Redo Buffers               10407936 bytes
Database mounted.
Database opened.
SQL> !numactl --hardware
available: 2 nodes (0-1)
node 0 size: 5119 MB
node 0 free: 1331 MB
node 1 size: 4095 MB
node 1 free: 3951 MB

Ick. This means that Oracle11g AMM on Opteron servers is a Cyclops. Odd how this allocation came from memory attached to socket 0 when the file creation with dd(1) landed in socket 1’s memory. Hmm…

What to do? SUMA? Well, it seems as though I should be able to interleave tmpfs memory and use that for /dev/shm-at least according to the tmpfs documentation. And should is the operative word. I have been tweaking for a half hour to get the mpol=interleave mount option (with and without the -o remount technique) to no avail. Bummer!

If AMD can’t get the Barcelona and/or Budapest Quad-core off the ground (and into high-quality servers from HP/IBM/DELL/Verari), none of this will matter. Actually, come to think of it, unless Barcelona is really, really fast, you won’t be sticking it into your existing Socket F motherboards because that doubles your Oracle license fee (unless you are on standard edition which is priced on socket count). That leaves AMD Quad-core adopters waiting for HyperTransport 3.0 as a remedy. I blogged all this AMD Barcelona stuff already.

Given the NUMA characteristics of /dev/shm, I think I’ll test AMM versus MMM on NUMA, and them test again on SUMA-if I can find the time.

If anyone can get /dev/shm mounted with the mpol option, please let me know because, at times, I can be quite a dolt and I’d love this to be one of them.

Oracle11g Automatic Memory Management – Part II. Automatically Stupid?

Oracle Database 10g Automatic Memory Management (AMM) might have been this smart, but I don’t know. I’m playing a bit with Oracle Database 11g AMM and find that so far it is a pretty smart cookie-at least in the sense I’m blogging about in this entry.

Automatic So Long As You Do Everything Correctly 
One thing I always hate is a feature that causes the server to not function at all or at some degraded state if the configuration is not perfect for the feature. In my mind, if something is called automatic it should be automatic. Like I say, don’t make me kneel on peach pits to benefit from an automatic feature. So how about a quick test.

I’ve got 11g x86 on a Proliant DL380 fit with 4GB RAM. As I mention in this blog entry, Oracle uses memory mapped files in /dev/shm for the SGA when you use 11g AMM. On a 4GB system, the default /dev/shm size is about half of physical memory. I want to set up a larger, custom size and see if 11g AMM will try to cram 10lbs of rocks into a 5lb bag.

# umount /dev/shm
# mount -t tmpfs shmfs -o size=3584m /dev/shm
# df /dev/shm
Filesystem           1K-blocks      Used Available Use% Mounted on
shmfs                  3670016         0   3670016   0% /dev/shm

There, I now have 3.5GB of space for /dev/shm. I don’t want to use that much because leaving Linux with .5GB for the kernel will likely cause some chaos. I want to see if 11g AMM is smart enough to allocate just enough to fill the virtual address space of the Oracle processes. So, I set AMM larger than I know will fit in the address space of a 32-bit linux processes:

SQL> !grep MEMORY amm.ora

So what happened? Well, I haven’t relocated the SGA so I shouldn’t expect more than about 2GB. I wouldn’t expect more than about 1.7GB for buffers. Did AMM try to over-allocate? Did it get nervous and under-allocate? Did it tell me to help it be more automatic through some configuration task I need to perform? Let’s see:

SQL> startup pfile=./amm.ora
ORACLE instance started.

Total System Global Area 2276634624 bytes
Fixed Size                  1300068 bytes
Variable Size             570427804 bytes
Database Buffers         1694498816 bytes
Redo Buffers               10407936 bytes
Database mounted.
Database opened.

Nice, AMM was smart enough to pile in about 1.6GB of buffers and the appropriate amount of variable region to go with it. A look at DBWR’s address space shows that the first 16MB /dev/shm granule file (my term) was mapped in at virtual address 512MB. The last 16MB segment fit in at 2688MB. The top of that last granule is 2704MB. If I subtract off the sum of 2276634624 (show sga) + 512MB (the attach address) I’m left with a little over 20MB that are most likley Oracle rounding up for page alignment and other purposes.

# pmap `pgrep -f dbw` | grep 'dev.shm' | head -1
20000000      4K r-xs-  /dev/shm/ora_bench1_1441803_0
# pmap `pgrep -f dbw` | grep 'dev.shm' | tail -1
a8000000  16384K rwxs-  /dev/shm/ora_bench1_1507341_7

I don’t really expect folks to be running Oracle in production on 32bit Linux servers in modern times, but I was pleasantly surprised to see 11g AMM is smart enough to poke around until it filled the address space. I asked for more than it could give me (MEMORY_TARGET=3500M) and instead of failing and suggesting I request less automatic memory, it did the right thing. I like that.

Yet Another Excellent RAC Install Guide

Tim Hall sent me email to point me to a recent step-by-step install tip he produced for Oracle11g with NAS storage (NFS).  In the email he asked me if I had any experience with the new Oracle11g Direct NFS (DNFS) feature. The answer is, yes, I have a lot of DNFS experience as I hinted to with my blog post entitled Manly Men Only Deploy Oracle With Fibre Channel Part VI. Introducing Oracle11g Direct NFS. If I haven’t plugged the “Manly Man” series lately I am doing so again now. I think anyone interested in storage with an Oracle slant would take interest.  The full series of Manly Man posts can be found easily through my index of CFS/NFS/ASM Topics as well as this entry about a recent Oracle 300GB TPC-H result. That TPC-H result is very interesting-especially if you are trying to get out of SAN/DAS/NAS rehab. Yes, that was supposed to be humorous.

Back to the point. Here is the link to Tim’s (very good) step-by-step Oracle Database 11g RAC on Linux setup for NFS environments. I especially liked the mention of Direct NFS since I think it is very important technology as my jointly-authored Oracle Whitepaper on the topic should attest.

Improved Linux Real Application Clusters Clusterware Installation with Oracle Database 11g

Just a quick blog entry. I have installed 11g RAC quite a few times already and just wanted to share with you folks an observation.

Those of you who have installed 10gR2 Clusterware (CRS) know that towards the end of the CRS installation you have to go from node to node and execute $ORA_CRS_HOME/ When you run it on the last node the script will try to set up VIPs (this is why you have to run the CRS as root in an xterm because it is a window-less JAVA app). Oracle1gR2 has had an annoying bug in it that failed the VIP setup because it was very picky about the IP addresses you assigned to VIPs. The workaround for that was to ignore the problem and then invoke $ORA_CRS_HOME/bin/vipca and walk through the setup of VIPs (including GSD and so on). It was a minor problem that was easy to work around.

10g and 11g Clusterware Co-Existence

I have not seen that problem with 11g. In fact, the reason I’m blogging this is because I just walked through an install of 10gR2 Clusterware on my cluster running x86 RHEL4 attached to NAS (NFS). I need a setup where I have both 10g and 11g clusterware installed and I need to be able to “hide” and “expose” either with a few simple commands to test one or the other. After the successful install of 10gR2 CRS, I “hid it” (to include all the residue in /etc) and proceeded to install 11g CRS. Since I just did both 10gR2 CRS and 11g CRS installs back to back I was reminded that 10gR2 CRS has that pesky problem and I did have to hand-invoke vipca to get through it. I was pleasantly reminded, however, that 11g does not have that problem.

For those of you who are used to seeing the complaint about VIPs at the conclusion of the last execution, see the following screen shot from 11g and breathe a sigh of relief.


And a picture speaks a thousand words so here is a shot of my little 11g NAS RAC clusterware setup:


Note to self: Investigate whether 11g CRS works with 10gR2 RAC instances and make a blog entry. It should, so I will.

Oracle11g: Where’s My Alert Log?

Just a short blog entry about Oracle 11g. One of the first things that caught me by surprise with 11g, when I first started in the beta program, was that the default location for the alert log has moved. It is still placed under the traditional OFA structure, but not /u01/app/oracle/admin. There is a new directory called diag that resides in /u01/app/oracle as seen on one of my systems:

 $ pwd
 $ ls -l
 total 144
 drwxr-xr-x 2 oracle dba 4096 Jul 9 21:32 alert
 drwxr-xr-x 3 oracle dba 4096 Jul 8 11:11 cdump
 drwxr-xr-x 2 oracle dba 4096 Jul 9 04:02 hm
 drwxr-xr-x 9 oracle dba 4096 Jul 8 11:11 incident
 drwxr-xr-x 2 oracle dba 4096 Jul 9 04:02 incpkg
 drwxr-xr-x 2 oracle dba 4096 Jun 29 22:00 ir
 drwxr-xr-x 2 oracle dba 4096 Jul 10 08:59 lck
 drwxr-xr-x 2 oracle dba 4096 Jul 10 08:59 metadata
 drwxr-xr-x 2 oracle dba 4096 Jul 8 11:11 stage
 drwxr-xr-x 2 oracle dba 4096 Jul 8 11:11 sweep
 drwxr-xr-x 3 oracle dba 57344 Jul 10 09:02 trace
 $ cd trace
 $ pwd
 $ ls -l alert*
 -rw-r----- 1 oracle dba 1098745 Jul 10 09:00 alert_bench1.log

In this case, my database is called bench and the first instance is bench1. To quickly locate alert logs associated with many different ORACLE_HOMEs simple execute the adrci command and then execute “show alert”

Manly Men Only Deploy Oracle with Fibre Channel – Part VI. Introducing Oracle11g Direct NFS!

Since December 2006, I’ve been testing Oracle11g NAS capabilities with Oracle’s revolutionary Direct NFS feature. This is a fantastic feature. Let me explain. As I’ve laboriously pointed out in the Manly Man Series, NFS makes life much simpler in the commodity computing paradigm. Oracle11g takes the value proposition further with Direct NFS. I co-authored Oracle’s paper on the topic:

Here is a link to the paper.

Here is a link to the joint Oracle/HP news advisory.

What Isn’t Clearly Spelled Out. Windows Too?
Windows has no NFS in spite of stuff like SFU and Hummingbird. That doesn’t stop Oracle. With Oracle11g, you can mount directories from the NAS device as CIFS shares and Oracle will access them with high availability and performance via Direct NFS. No, not CIFS, Direct NFS. The mounts only need to be visible as CIFS shares during instance startup.

Who Cares?
Anyone that likes simplicity and cost savings.

The Worlds Largest Installation of Oracle Databases
…is Oracle’s On Demand hosting datacenter in Austin, Tx. Folks, that is a NAS shop. They aren’t stupid!

Quote Me

The Oracle11g Direct NFS feature is another classic example Oracle implementing features that offer choices in the Enterprise data center. Storage technologies, such as Tiered and Clustered storage (e.g., NetApp OnTAP GX, HP Clustered Gateway), give customers choices—yet Oracle is the only commercial database vendor that has done the heavy lifting to make their product work extremely well with NFS. With Direct NFS we get a single, unified connectivity model for both storage and networking and save the cost associated with Fibre Channel. With built-in multi-path I/O for both performance and availability, we have no worries about I/O bottlenecks. Moreover, Oracle Direct NFS supports running Oracle on Windows servers accessing databases stored in NAS devices—even though Windows has no native support for NFS! Finally, simple, inexpensive storage connectivity and provisioning for all platforms that matter in the Grid Computing era!

Oracle11g Now Exists! Are the Files Secure or Fast?

Yes, July 11 2007 is here and so is Oracle11g. I wonder what that stuff was that I’ve been testing since December 2006? Anyway, this article covers the launch this morning. It’s standard fare news coverage, but I picked something out and I thought I’d see if I could blog it first. The article states:

Oracle Fast Files

The next-generation capability for storing large objects (LOBs) such as images, large text objects, or advanced data types – including XML, medical imaging, and three-dimensional objects – within the database. Oracle Fast Files offers database applications performance fully comparable to file systems. By storing a wider range of enterprise information and retrieving it quickly and easily, enterprises can know more about their business and adapt more rapidly.

Odd. I’ve known that feature as SecureFiles for months now. Looks like a name change.

Is it True?
I don’t know whether LOBs in 11g Fast Files or Secure Files is faster than accessing them from calls to a filesystem. I’ve tested a lot of Oracle11g and that isn’t one of the features I’ve looked at. I did blog on this feature rather pessimistically way back in November 2006 in this blog entry—before I had my hands on Oracle11g.

My Take?
I hope the Secure/Fast Files feature is indeed faster and better than calls out to a filesystem. The more comprehensive Oracle becomes the better! Regular readers of my blog know the topic of unstructured data is a regular rant of mine.

Here is a link to a late-breaking Oracle paper on SecureFiles.

Every Release of Oracle Database is “The Best Ever”, Right? Enter Oracle11g!


I see Eddie Awad has hit the press about the July 11 launch of Oracle11g. From everything I’ve seen in this release there should be no technical reasons holding back customers’ adoption. There are certain folks out there that say every release of Oracle “is the best I’ve seen yet.” A good way to sell books for sure, but I’m not that way. I most certainly didn’t say that about Oracle 7.2! Anyway, I have been thoroughly impressed with my testing—most particularly in the area of stability. And, as I keep hinting, there are features that neither Rich Niemiec, Mike Ault nor Don Burleson have been discussing that I think will be very attractive—especially in the commodity computing space. Unfortunately I have to remind myself of the real world and how long it takes to get applications qualified for a release of the server. Let’s hope E-Biz gets there as soon as possible.

Big Drives Dislike Little Nibbles. Got Unstructured Data? Oracle11g!

I just read a nice piece by Robin Harris over at one of my favorite blogs,, about large sequential I/Os being the future focus of storage. I think he is absolutely correct. Let’s face it; there is only so much data that can take the old familiar row and column form that we Oracle folk spend so much time thinking about. As Robin points out, digitizing the analog world around us is going to be the lion’s share of what consumes space on those little round, brown spinning things. Actually, the future is now.

Drives for OLTP? Where?
They aren’t making drives for OLTP anymore. I’ve been harping on this for quite some time and Robin touched on it in his fresh blog entry so I’ll quote him:

Disk drives: rapidly growing capacity; slowly growing IOPS. Small I/Os are costly. Big sequential I/Os are cheap. Databases have long used techniques to turn small I/Os into larger ones.

He is right. In fact, I blogged recently about DBWR multiblock writes. That is one example of optimizing to perform fewer, larger I/Os. However, there is a limit to how many such optimizations there are to be found in the traditional OLTP I/O profile. More to the point, hard drive manufacturers are not focused on getting us our random 4KB read or write in, say, 8ms or whatever we deem appropriate. OLTP is just not “sexy.” Instead, drive manufacturers are working out how to get large volumes of data crammed onto as few platters as possible. After all, they make money on capacity, not IOPS.

Did Oracle Get The Memo? Where is That Unstructured Data?
Many of you can recall the Relational, OODB and Object Relational wars of the 1990s. That not-so-bloody war left behind a lot of companies that closely resemble dried bones at this point. After all, Oracle had the relational model well under control and there wasn’t a whole lot of user-generated content in those days—the world was all about rows and columns. Over the last 7-10 years or so, folks have been using Oracle procedures to integrate unstructured data into their applications. That is, store the rows and columns in the database and call out to the filesystem for unstructured data such as photos, audio clips, and so on. You don’t think Oracle is going to give up on owning all that unstructured data do you?

The Gag Order
Oracle has put a gag-order on partners where Oracle11g is concerned—as is their prerogative to do. However, that makes it difficult for me to tie Oracle11g into this thread. Well, last week at COLLABORATE ’07, Oracle executives stumped about some Oracle11g features, but none that I can tie in with this blog post—I’m so sad. Hold it, here I find an abstract of a presentation given by an Oracle employee back in February 2007. Conveniently for me, the title of the presentation was Oracle 11g Secure Files: Unifying files in the database—exactly the feature I wanted to tie in. Good, I still haven’t revealed any information about Oracle futures that hasn’t shown up elsewhere on the Web! Whew. The abstract sums up what the Secure Files feature is:

Unifying the storage of files and relational data in the database further allows you to easily and securely manage files alongside the relational data.

Let me interpret. What that means is that Oracle wants you to start stuffing unstructured data inside a database. That is what the Object guys wanted as well. Sure, with such content inside an Oracle database the I/Os will be large and sequential but the point I’m trying to make is that 11g Secure Files puts a cross-hair on folks like Isilon—the company Robin discussed in his blog entry. Isilon specializes in what Robin calls “big file apps.” Having Oracle’s cross-hairs on you is not great fun. Ask one of the former OODB outfits. And lest we forget, Larry Ellison spent money throughout the 1990s to try to get nCUBE’s video on demand stuff working.

We’ve come full circle.

Disks, or Storage?
Today’s databases—Oracle included—make Operating System calls to access disks, or more succinctly, offsets in files. I’ll be blogging soon on Application Aware Storage which is an emerging technology that I find fascinating. Why just send your Operating System after offsets in files (disks) when you can ask the Storage for so much more?

Which Version Supports Oracle Over NFS? Oracle9i? Oracle10g?

Recently, a participant on the oracle-l email list asked the following question:

Per note 359515.1 nfs mounts are supported for datafiles with oracle 10. Does anyone know if the same applies for 9.2 databases?

I’d like to point out a correction. While Metalink note 359515.1 does cover Oracle10g related information about NFS mount options for various platforms, that does not mean Oracle over NFS is limited to Oracle10g. In fact, that couldn’t be further from the truth. But before I get ahead of myself I’d like to dive in to the port-level aspect of this topic.

There are no single set of NFS mount options that work across all Oracle platforms. In spite of that fact, another participant of the oracle-l list replied to the original query with the following:

try :

OK, the problem is that of the 6 platforms that support Oracle over NFS (e.g., Solaris, HP-UX, AIX, Linux x86/x86_64/AI64), the forcedirectio NFS mount option is required only on Solaris and HP-UX. For this reason, I’ll point out that the best references for NFS mount options to use for Oracle10g is Metalink 359515.1 and NAS vendors’ documents for Oracle9i.

Support for Oracle9i on NFS was a little spottier than Oracle10g, but it was there. The now defunct Oracle Storage Compatibility Program (OSCP) was very important in ensuring Oracle9i would work with varying NAS offerings. The Oracle server has evolved nicely to handle Oracle over NFS to such a degree that the OSCP program is no longer even necessary. That means that Oracle10g is sufficiently robust to know whether the NFS mount you are feeding it is valid. That aside, the spotty Oracle9i support I allude to is actually at the port level mostly. That is, from one port to another, Oracle9i may or may not have required patches to operate efficiently and with integrity. One such example is the Oracle9i port to Linux where Oracle Patch number 2448994 was necessary so that Oracle would open files on NFS mounts with the O_DIRECT flag of the open(2) call. But, imagine this, it was not that simple. No, you had to have the following correct:

  • The proper mount options specified by the NAS vendor
  • A version of the Linux kernel that supported O_DIRECT
  • Oracle patch 2448994
  • The Correct setting for the filesystemio_options init.ora parameter

Whew, what a mess. Well, not that bad really. Allow me to explain. Both of the Linux 2.6 Enterprise kernels (RHEL 4, SuSE 9) support open(2)s of NFS files with the O_DIRECT. So there is one requirement taken care of—because I assume nobody is using RHAS 2.1. The patch is simple to get from Metalink and the correct setting of the filesystemio_options parameter is “directIO”. Finally, when it comes to mount options, NAS vendors do pretty well documenting their recommendations. Netapp has an entire website dedicated to the topic of Oracle over NFS. HP EOMs the File Serving Utility for Oracle from PolyServe and documents their mount options in their User Guide as well as in this paper about Oracle on the HP Clustered Gateway NAS.

I’m not aware of any patches for any Oracle10g port to enable Oracle over NFS. I watch the Linux ports closely and I can state that canned, correct support for NFS is built in. If there were any Oracle10g patches required for NFS I think they’d be listed in Metalink 359515.1 which, at this time, does not specify any. As far as the Linux ports go, you simply mount the NFS filesystems correctly and set the init.ora parameter filesystemio_options=setall and you get both Direct I/O and asynchronous I/O.


I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 747 other subscribers
Oracle ACE Program Status

Click It

website metrics

Fond Memories


All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: