Archive for the 'oracle' Category



Proof-Positive: Memory is Faster Than Disk. Don’t Need No Book Learnin’ to Cipher That One.

I’ve been reading a lot of blogosphere content about Data Warehousing these days. I’ve taken a lot of interest in such technologies as Netezza, GreenPlum, DATAllegro and others, and blog reading proves to be an interesting way to augment one’s knowledge. Who’d have thought I’d learn so much about OLTP through this reading.

Memory is Faster than Disk, So Let’s Do a Complete Rewrite
Why, just today I found out that it is time for a total rewrite of commercial RDBMS products. Uh huh. More interestingly, though, I learned:

  • Memory is faster than disk. Really, truly, it is!
  • A dual-core (2.8GHz) server with 4GB memory and 4 250GB SATA drives can perform 51,000 TpmC
  • Disabling transaction logging entirely in a commercial RDBMS will increase throughput (TpmC) about three-fold

I found these pearls of wisdom while reading a Stonebraker paper referred to in this blog post. Yes, I know that blog is basically a store-front for Vertica, but I like to learn about different things that are going on in database technology. Unfortunately this time I was wasting my time. The URL in that blog post points to the VLDB front page, but a little sleuthing found the paper posted here: The End of an Architectural Era (It’s Time for a Complete Rewrite).

Recite after me:

If you get two orders of magnitude performance gain, you are either not doing it or you’ve moved it closer to the processor.

Dang, and I ain’t even got no too pretty good pedigree. Pshaw, I dasn’t fidget ‘mungst the quality!

Central versus De-centralized versus Shared-Nothing
No, it isn’t time for a re-write, especially one that requires a complete shared-nothing database approach. Now don’t get me wrong, I’m all for de-coupling and grid architecture, most particularly where storage is concerned. If I hear of another poor production site that is head-saturated on a $500,000 storage array when driving a measly 15 or so 15K RPM drives, I’ll BAARF. Please see the following post for what I’m talking about:

Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?

My Blog Posts Prove Oracle Doesn’t Support NFS!

In my post called Building a Stretch Real Application Clusters Configuration? Get The CRS Voting Disk Setup Right!, I linked to a paper Oracle maintains that explains how to use an NFS export from a small Unix/Linux server as storage for a third voting disk in a stretch RAC cluster. I pointed out that the paper instructs on how to use the noac mount option for Linux RAC clusters in spite of the many resources that suggest actimeo=0 will do. The authors of the document are standing fast that if you are building a Linux-based RAC stretch cluster and are using an NFS mount as a third voting device, you do indeed need to mount that particular NFS filesystem with noac. That nugget of truth contradicts more documents than I care to list. Instead, I’ll list a resource from Metalink that helps clarify the issue. In fact, I would say that no matter what the sundry Installation Guides or Release Notes say, refer to Metalink 359515.1 when the topic of Oracle Database 10g on NFS filesystems comes up.

Datafiles or CRS Files? The Mount Options Differ.
Metalink 359515.1 is a really helpful note. It spells out the RAC-related mount options for 10gR2 on Solaris, AIX, HP-UX and Linux. Most importantly, it spells out the options for the datafiles and the CRS files in two separate columns. Lo and behold, Metalink 359515.1 clearly spells out that noac is needed for CRS files, but not for datafiles.
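To make the distinction concrete, here is a sketch of what the two kinds of Linux /etc/fstab entries might look like. The hostnames, paths, and exact option strings below are illustrative guesses only; Metalink 359515.1 is the authoritative source for your platform and release, so check it before copying anything.

```
# Hypothetical /etc/fstab entries -- hosts and paths are made up.

# Datafiles on a supported NAS filer (no noac required per 359515.1):
filer:/vol/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768  0 0

# Third CRS voting disk exported from a plain Unix/Linux server (noac required):
nfshost:/votedisk   /u02/vote     nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,noac  0 0
```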

In the comment section of Building a Stretch Real Application Clusters Configuration? Get The CRS Voting Disk Setup Right!, a reader points out that the Third Party Vote on NFS paper has a dead-end URL (http://www.oracle.com/technology/deploy/availability/htdocs/vendors_nfs.html) in the section that aimed to point out that you cannot use some Unix/Linux server NFS exports for any other purpose than this unique third voting disk setup in a stretch cluster scenario. He is right, that URL is a dead end, but I’d rather point folks to the Linux RTCM (RAC Technology Certification Matrix) or the Unix RTCM, both of which clearly spell out a list of supported NFS file servers. Missing from the list is, of course, some plain old Linux or Unix server dishing out NFS exports, because the only supported application of simple Unix/Linux NFS exports is the third voting disk scenario in a stretch cluster.

The reader also added this comment:

Many people say Oracle doesn’t support NFS, we need to verify. Searching oracle.com, “We did not find any search results for: vendors_nfs.html” and the references from google all seem to point at that one mysteriously missing doc.

Gee whiz. Where to start? Yes, for the eleventeenth time, NFS filesystems are supported for Oracle Database (including RAC). Let’s not get so easily confused; NFS is a protocol and the storage is NAS. Let’s all enter the following formula in our decoder rings before reading Oracle documents:

(Some Stupid Little Linux/Unix Server Exporting Filesystems via NFS) != NAS

The only supported application of non-NAS NFS is described in the following paper: Using NFS for a Third CRS Voting Device

Now, for some light reading about Oracle on NFS with 11g, I submit:

Oracle Database 11g on NFS filesystems:

Using Network Attached Storage or NFS File Systems Installation Guide for Microsoft Windows

Using Network Attached Storage or NFS File Systems Installation Guide for HP-UX

Using Network Attached Storage or NFS File Systems Installation Guide for AIX 5L Based Systems (64-Bit)

Using Network Attached Storage or NFS File Systems Installation Guide for Solaris Operating System

Using Network Attached Storage or NFS File Systems Installation Guide for Linux

And, of course: Configuring Direct NFS Storage for Datafiles

But, let’s not forget:

Those Oracle Installs Just Keep Getting More and More Difficult

In my recent rant about Oracle database installation difficulties, I provided a link to a video in which fellow OakTable Network members Morten Egan and Mogens Norgaard captured how difficult the task really is.

Well, they’re at it again. You’ll see Morten “The Nose” Egan start out this new videotaped Oracle installation by configuring a SAN with what looks like the HP Array Configuration Utility, but then my eyes are getting as bad as my blogging frequency. I couldn’t miss the Windows Disk Manager though, not even on fast forward.

I think we should start calling him Morten “The Hair” Egan. The link to the video follows:

Unconventional Oracle Installation Part II

Question: How to Choose From the Last of the Non-NUMA Xeon-based Servers

I thought a comment on one of my recent blog entries deserved handling in a blog entry. A reader posted:

Have you done any comparisons of the HP DL585 with an HP DL580? Is the DL580 a NUMA machine? Which one would you buy today for a RAC cluster?

I’ll answer these out of order. The DL580 is not a NUMA system. It stands to reason, though, that if HP continues the DL580 product line to the point where they bake in the CSI interconnect, the DL580 would at that point become a NUMA system. So, the short answer to whether or not a DL580 is a NUMA system is no, it is not. I think long answers are more fun.

In my series of posts about Oracle on NUMA, I think I must have said it about umpteen times, but I’ll say it again concisely in this post. I’m talking about what “NUMA-aware” software means. I routinely hear that Oracle is NUMA-aware. It is, and it isn’t. The reason I say this is because there are widely varying degrees of NUMA-awareness across hardware platforms and Oracle ports. I made the point in my recent post about Oracle Database 10g 10.2.0.4 that 10.2.0.4 contains NUMA-related fixes, and it does. However, that isn’t to say it is fully NUMA-aware, because it isn’t. But the only question that matters is whether it is sufficiently NUMA-aware for today’s NUMA systems, and I’d have to say that it is.

I’ll give a hint: No Linux Oracle release can be fully NUMA-aware until processes (e.g., shadow processes, PQO slaves, etc.) can quickly and cheaply detect what CPU they are currently executing on and prefer memory resources based on that locality. Way back in 1996 I was in Advanced Oracle Engineering at Sequent and we were in the late stages of producing the first commercial NUMA system. It was my early Oracle work on Sequent’s NUMA that begat the GETENGNO(3SEQ) API, which was an extremely inexpensive call for processes to check what CPU they are executing on.

Let’s fast forward to today. The Linux development folks are considering a corollary to Sequent’s GETENGNO() in the vgetcpu() call. The problem is that the call is very, very slow compared to the 4-6 cycles Sequent required to inform a process what CPU it was executing on. Nonetheless, the point is that until vgetcpu() works, and Oracle exploits it, the pinnacle of NUMA-awareness has not been met. And while that may not matter given today’s AMD situation, it will certainly matter when Intel systems are NUMA (e.g., CSI-based). I guess I shouldn’t equate Linux NUMA with AMD since IBM’s x3950 is a building block for large NUMA systems, and there are others as well. But I was focusing on commodity-level NUMA systems, which the x3950 most certainly is not.
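For the curious, modern glibc does expose a query along these lines via sched_getcpu(3), which on current x86_64 Linux is serviced cheaply from the vDSO. A minimal sketch follows, assuming a Linux system and using Python’s ctypes to call the glibc wrapper (Python’s standard library has no direct equivalent):

```python
import ctypes
import ctypes.util

# Load glibc; this assumes Linux, since sched_getcpu is a GNU extension.
libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)

# sched_getcpu(3) returns the CPU number the calling thread is currently
# executing on -- the same question GETENGNO(3SEQ) answered on Sequent NUMA-Q.
cpu = libc.sched_getcpu()
print(f"currently executing on CPU {cpu}")
```

A NUMA-aware process could use such a call on every allocation decision, which is exactly why the per-call cost matters so much.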

There are a lot of factors in selecting hardware, but since I’m asked about DL585 vs DL580, I’d say the DL580, so long as it is a DL580 G5. I have tested the DL585 and DL580 side by side. However, that was a pretty old DL580 G3 (1066 MHz FSB). I see that the DL580 is now fit with the “Penryn” Xeons (e.g., 5460), which have a front-side bus speed of 1333 MHz. There are G5s that are fit with “Tigerton” Xeons, which are 1066 MHz FSB. I’ve seen benchmark results that suggest there is some 21% to be gained by going with a 5460-based G5 over a 7350-based G5. So, look closely at the specification. Also, I think a shrewd shopper would try to read the crystal ball to see when the DL580 G5 will be fit with the Xeon 5462, which has a 1600 MHz FSB. As always, with Oracle you want big pipes.

Oracle Database 11g Related Posts

I try not to make announcements about past posts, however, I noticed that my index page of 11g related posts was incomplete. Those of you who pick up my posts by RSS will have seen this stuff already. New readers that have been using the indexed pages might care to go to my index of 11g related posts. Parts II and III of the 11g Automatic Memory Management series were not linked. Now they are.

Really Bad Oracle Problems? Who To Call?

I’ve gotten many emails over the last several months that I’ve been blogging, from IT shops inquiring as to whether or not I can consult in their datacenter on Oracle-related performance problems or planning situations. Of course I have to turn such opportunities down since my current gig is with Oracle Server Technologies; besides, I’ve never been an independent. Fortunately I know really good people who are, many of whom are fellow members of the Oaktable Network. One such independent is Kyle Hailey. If you are in the right geography (or perhaps geography doesn’t matter, I don’t know) and need a real heavy hitter, I’d recommend contacting PerfVision.com and having a chat with Kyle.

Some Clarifications about the Oracle Database 11g Direct NFS Feature

The blogging platform I use, WordPress, allows me to see from whence readers are being referred to my blog. I’ve gotten a couple of hits from this post on Jeff Browning’s blog. I’m not sure if that particular post is a critique or a caricature of my paper about Oracle Database 11g Direct NFS. Nonetheless, shortly after I first saw the post I privately informed Jeff of four inaccuracies contained in the post. They were:

1) Jeff’s assertion that Direct NFS will work on any NAS device is incorrect. I know of one NAS device in particular on which DNFS will not function whatsoever. Well, that’s not exactly true. It will function to the point of creating a database, but will corrupt the control file in doing so. I’m not going to name the particular manufacturer, because it is still unclear whether there is a design incompatibility or simply a bug that prevents that particular NAS device from functioning with DNFS. Nonetheless, Jeff still hasn’t corrected the following quote, so I am, here and now. Jeff said:

As such dNFS will work in exactly the same manner, with identical performance benefits, on any NAS device from any vendor.

That is incorrect.

2) Jeff uses incorrect terminology to explain one of the benefits of Direct NFS. That term is “context swapping”, and I quote:

Here is the theory behind dNFS. I/O on a database server occurs in a combination of user space and kernel space. Context swaps between the two spaces are expensive in terms of CPU cost. If you can move a part of that activity from kernel space into user space, you can save CPU cost due to reduced context swapping.

I’m not playing word games. Transitions from user to kernel mode are routinely, and inaccurately, referred to as context switching, and while that is one of my pet peeves, I suspect from the context of Jeff’s post that context switching is the term he actually meant. The problem is that system calls do not result in a context switch.

A context switch is the stopping of one process and switching to another for the sake of process scheduling. If a system call does not block, there is no context switch. An example would be getpid(2). On the other hand, a system call that does block (e.g., synchronous physical I/O) will count as a context switch because the process parked itself on a blocking system call. The scheduler will switch to a runnable process according to such criteria as mode, priority, processor affinity and so on. Context switches fall into two categories: voluntary and involuntary. The former is when a process calls a blocking system call (or performs any other voluntary yield such as sched_yield(2)); the latter is when a user-mode process executes to the end of its time slice at such a time as there is a runnable process for the kernel to switch to.
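The voluntary/involuntary distinction is directly observable through getrusage(2), which exposes both counters. A small sketch (Python on Linux/Unix; the exact counts will vary from run to run):

```python
import os
import resource
import time

def voluntary_switches():
    # ru_nvcsw counts voluntary context switches (the process blocked and
    # yielded the CPU); ru_nivcsw would count involuntary ones (preemption).
    return resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw

base = voluntary_switches()

for _ in range(10_000):
    os.getpid()        # non-blocking system call: enters the kernel and
                       # returns, but the process never parks itself

after_getpid = voluntary_switches()

time.sleep(0.05)       # blocking call: the process voluntarily yields and
                       # the scheduler switches to a runnable process

after_sleep = voluntary_switches()

# The getpid(2) loop typically adds few or no voluntary switches; the
# sleep adds at least one.
print(f"after getpid loop: +{after_getpid - base}")
print(f"after sleep:       +{after_sleep - after_getpid}")
```

The same two counters are what vmstat(1) and friends aggregate system-wide, which is why knowing the definition makes their output more meaningful.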

And yes, I say all this about context switching versus context swapping with total disregard for this patent. In Unix/Linux, the term is context switch and knowing what that really means makes the output of monitoring commands such as vmstat(1) a little more meaningful I should think.

3) He stated 11g x86_64 was not available yet, but at the time of that blog post it was indeed available for download.

4) He misspelled my name in the blog post.

What is This Blog Post Really About?
Now, all of that aside, none of that is really what this blog post is about. This blog post is about a comment on Jeff’s blog. The reader posted a comment as follows:

One would assume that an OS vendor would be better at making NFS and multipathing work rather than a database vendor. Oracle has a disadvantage in that it has to test its client on all the various flavors of OS out there.

One would indeed assume just that, and correctly in fact. Oracle Database 11g doesn’t make NFS better. Direct NFS is a replacement for NFS. While NFS is a great storage presentation model for Oracle (as I’ve said so many times), it does much more than Oracle requires. So DNFS strips all that overhead out. Direct NFS is essentially an RPC shot straight at the NAS device, with no file-related system calls (e.g., open(2), pread(2), io_submit(2), etc.). And, oh, while I’m at it, I’ll point out that open(2), read(2) and io_submit(2) are system calls that do not result in a context switch (unless the read suffers a page cache miss or is O_DIRECT or raw(8)). But that is not what this blog post is about.

Another Blog Recommendation

Some time back I recommended folks give Greg Rahn’s blog a visit. The blog is at structureddata.org and I’ve been reading it for some time. I haven’t known Greg for that long, but I have the utmost respect for the group he is in since it is led by Andrew Holdsworth. That means I have respect for Greg as well!

Greg and I have had some short chats, the latest of which took place at Mogens Norgaard’s impromptu San Francisco office of MiracleAS (Chevy’s across the street from Moscone South), where Mogens (as co-founder of the Oaktable Network) showed his ever-so-typical hospitality to fellow Oaktable members and those stray cats we happened to bring along with us. In my case, that was Greg Rahn and my old friend Randy from EMC (Federal) and Matt Zito of GridApp. As always it was a nice chat, since Anjo Kolk, Doug Burns and Jared Still (Oaktable members) were there at one point or another as well. Of course we talked about Oracle, but I recall we spent more than a few moments talking about Data Warehouse 2.0 technology and Data Warehouse Appliances, a topic I shall blog about in great detail soon enough.

I’m adding Greg to my blogroll.

If You Expect Cache, Make Sure You’re Cached.

I just did a Google blog search for information on the HP Smart Array Controller tool called hpaducli and didn’t find anything, so I’ll make this quick blog entry.

I just finished doing a small performance test on some storage I have access to and realized something was wrong. Granted, I’m doing a fairly wacky test, but something was wrong nonetheless. I am testing read throughput but limiting my dataset to an amount that fits within the HP Smart Array controller cache configured on the SB40c storage blade I am using. The timings I was getting were not coming in as a cached workload. I used the hpaducli command to find out that the cache was disabled. After the following grep(1) command told me I had no Smart Array cache, I used the ACU (cpqacuxe -R) to enable it and verified using the hpaducli -f command.

# grep -i 'Array Accelerator' /tmp/P400_status_before
      Array Accelerator is not configured.
      Array Accelerator is not configured.
      Array Accelerator is not configured.
      Array Accelerator is not configured.
      Array Accelerator is not configured.
      Array Accelerator is not configured.
# hpaducli -f /tmp/P400_status_after
# grep 'Array Accelerator' /tmp/P400_status_after
      Array Accelerator is enabled for this logical drive.
      Array Accelerator is enabled for this logical drive.
      Array Accelerator is enabled for this logical drive.
      Array Accelerator is enabled for this logical drive.
      Array Accelerator is enabled for this logical drive.
      Array Accelerator is enabled for this logical drive.

The moral of the story is when someone hands you some hardware, you never quite know what you are getting. What’s that line from Forrest Gump? You never know, this post might help someone, somewhere, someday.

Off Topic: Peeve of the Day == The FAX Machine.

Instead of FAX communication, why don’t organizations just ask for 80 column punch cards or cuneiform on papyrus reed? These days I get so agitated when, say, a customer service department requires a FAX to clear up an issue. Does anyone else feel the way I do, or am I just (un?)characteristically crabby this morning?

Instead of wandering around looking for a FAX machine, I thought I’d email the following reasonable anti-FAX position to see where it would get me:

Regarding […customer service issue…], do you really want a FAX? That is becoming so outdated these days. How about a scanned PDF sent by email? That would surely take the same amount of human resources (perhaps less, since you won’t be changing paper), cuts down on filing space, saves trees and keeps me from trying to find a FAX machine.

And their brilliant reply? Of course:

Regarding your inquiry, the preferred methods of documentation submission are via postal mail [or] fax to our office at […]

Doesn’t it stand to reason I already knew their preferred method? Oh, hold it, there is new information there after all: I can simply stuff it in an envelope and snail mail it. Brilliant!

A Recommended Blog: Christian Bilien.

I’d like to recommend Christian Bilien’s Blog. He covers Solaris and SAN-related topics (and more) that pertain to Oracle very, very nicely. One more for the blog roll.

Blog A Lot, Or Blog Not. I’m Not Dead.

Well, it’s official. I am a completely delinquent blogger. It isn’t that I don’t have a lot of stuff to blog about, because I do. I just don’t know what is “safe” to blog about. I’m still trying to get a feel for things. I’ve taken a role as a Performance Architect in Oracle’s Systems Technology group, which is a division of the Database Server Technologies organization. I’m working on future products that I cannot talk about whatsoever. That is proving to be a bit of a bummer. Even if I blog about some of the lower-level stuff I’m seeing that is platform generic, it would still tip folks off on the sorts of things we are doing on this project, so as they say, “Mum’s the word.”

I know there are all sorts of folks I’ve wanted to meet up with at OOW. Send me an email at ora_kclosson at yahoo dot com , ok?

I do have a Linux block layer I/O related blog entry that is just about ready. I’ll do my best.

Oracle Database 11g (11.1.0.6) for x86_64

Oops, there was a false alarm about the availability of Oracle Database 11g (11.1.0.6) for x86_64 (AMD/EM64T). I just checked and it is now downloadable on OTN.

Automatic Databases Automatically Detect Storage Capabilities, Don’t They?

Doug Burns has started an interesting blog thread about the Oracle Database 11g PARALLEL_IO_CAP_ENABLED parameter in his blog entry about Parallel Query and Oracle Database 11g. Doug is discussing Oracle’s new concept of built-in I/O subsystem calibration, a concept aimed at more self-tuning database instances. The idea is that Oracle is trying to make PQ more aware of the downwind I/O subsystem capability so that it doesn’t obliterate it with a flood of I/O. Yes, a kinder, gentler PQO.

I have to admit that I haven’t yet calibrated this calibration infrastructure. That is, I aim to measure the difference between what I know a given I/O subsystem is capable of and what DBMS_RESOURCE_MANAGER.CALIBRATE_IO thinks it is capable of. I’ll blog the findings of course.
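For anyone who wants to run the same comparison, here is a sketch of invoking the calibration from SQL*Plus. The num_physical_disks and max_latency inputs below are placeholder guesses for a small array, not a recommendation, and the instance needs asynchronous I/O enabled and timed statistics on for the procedure to run; treat this as illustrative, not authoritative.

```sql
-- Hedged sketch: 11g I/O calibration via DBMS_RESOURCE_MANAGER.
SET SERVEROUTPUT ON
DECLARE
  l_max_iops   PLS_INTEGER;
  l_max_mbps   PLS_INTEGER;
  l_actual_lat PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 8,    -- assumption: an 8-spindle volume
    max_latency        => 20,   -- target latency ceiling in milliseconds
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_lat);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_max_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_max_mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_actual_lat);
END;
/
```

Comparing those reported numbers against what you have measured yourself with a low-level I/O generator is exactly the calibration-of-the-calibration I have in mind.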

In the meantime, I recommend you follow what Doug is up to.

A Really Boring Blog Entry
Nope, this is not just some “look at that other cool blog over there” post. At first glance I would hope that all the regular readers of my blog would wonder what value there is in throttling I/O all the way up in the database itself given the fact that there are several points at which I/O can/does get throttled downwind. For example, if the I/O is asynchronous, all operating systems have a maximum number of asynchronous I/O headers (the kernel structures used to track asynchronous I/Os) and other limiting factors on the number of outstanding asynchronous I/O requests. Likewise, SCSI kernel code is fit with queues of fixed depth and so forth. So why then is Oracle doing this up in the database? The answer is that Oracle can run on a wide variety of I/O subsystem architectures, and not all of these are accessed via traditional I/O system calls. Consider Direct NFS for instance.

With Direct NFS you get disk I/O implemented via the remote procedure call (RPC) interface. Basically, Oracle shoots the NFS commands directly at the NAS device as opposed to using the C library read/write routines on files in an NFS mount, which eventually filters down to the same thing anyway, but with more overhead. Indeed, there is throttling in the kernel for the servicing of RPC calls, as is the case with traditional disk I/O system calls, but I think you see the problem. Oracle is doing the heavy lifting that enables you to take advantage of a wide array of storage options, and not all of them are accessed with age-old traditional I/O libraries. And it’s not just DNFS. There is more coming down the pike, but I can’t talk about that stuff for several months given the gag order. If I could, it would be much easier for you to visualize the importance of DBMS_RESOURCE_MANAGER.CALIBRATE_IO. In the meantime, use your imagination. Think out of the box…way out of the box…

Google AdSense Misfires.

I’ve never seen that one before. I don’t do the Google AdSense thing so it looks like those two retailers got a freebie. What do I get?

NOTE: You may have to right click-> view to get a good look.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.