Archive Page 28

Improved Linux Real Application Clusters Clusterware Installation with Oracle Database 11g

Just a quick blog entry. I have installed 11g RAC quite a few times already and just wanted to share with you folks an observation.

Those of you who have installed 10gR2 Clusterware (CRS) know that towards the end of the CRS installation you have to go from node to node and execute $ORA_CRS_HOME/root.sh. When you run it on the last node, the script tries to set up the VIPs (this is why you have to run the CRS root.sh as root in an xterm, because it is a window-less Java app). Oracle 10gR2 had an annoying bug that failed the VIP setup because it was very picky about the IP addresses you assigned to VIPs. The workaround was to ignore the failure and then invoke $ORA_CRS_HOME/bin/vipca by hand and walk through the setup of the VIPs (including GSD and so on). It was a minor problem that was easy to work around.

10g and 11g Clusterware Co-Existence

I have not seen that problem with 11g. In fact, the reason I'm blogging this is that I just walked through an install of 10gR2 Clusterware on my cluster running x86 RHEL4 attached to NAS (NFS). I need a setup where I have both 10g and 11g clusterware installed, and I need to be able to "hide" and "expose" either with a few simple commands to test one or the other. After the successful install of 10gR2 CRS, I "hid it" (including all the residue in /etc) and proceeded to install 11g CRS. Since I did both the 10gR2 CRS and 11g CRS installs back to back, I was reminded that 10gR2 CRS has that pesky problem, and I did have to hand-invoke vipca to get through it. I was pleasantly reminded, however, that 11g does not have that problem.

For those of you who are used to seeing the complaint about VIPs at the conclusion of the last root.sh execution, see the following screen shot from 11g and breathe a sigh of relief.

[Screenshot: 11g_crs2.jpg]

And a picture is worth a thousand words, so here is a shot of my little 11g NAS RAC clusterware setup:

[Screenshot: 11g_crs3.jpg]

Note to self: Investigate whether 11g CRS works with 10gR2 RAC instances and make a blog entry. It should, so I will.

Oracle Database 11g: The SecureFiles Feature is not “Fast Files”, But Could Be Quite Fast

In my recent post about Oracle Database 11g SecureFiles, I referred to a July 11, 2007 press release that insinuated a name change from SecureFiles to Fast Files.

I just received email from one of the Program Managers (perhaps the 11g Program Manager—I’ll have to ask him) who set me straight that the feature is indeed called SecureFiles.

The press piece I referred to was in error.

As an aside, I’m chomping at the bit to do my own testing on just how much faster it is to store LOBs using the SecureFiles feature than in a traditional file system. As I’ve pointed out, I hope it is better. Having more unstructured data support inside the database is a really good thing.

Manly Men Only Deploy Oracle with 64 Bit Linux – Part I. What About a x86 Port on EM64T/AMD64 Hardware?

In the comment thread of my latest installment in the “Manly Man” series, a reader posted a humorous comment that included a serious question:

[…] what are your thoughts about x86 vs. x86-64 Linux, in relation to Oracle and RAC? I’d appreciate a blog entry if you could.

Tim Hall followed suit:

I would have thought it was obvious. The number 64 is twice the size of 32, so it must be twice as good, making you twice as manly!

Yes, when it comes to bitness, some is good so more must be better, right? Tim then continued to point out the political incorrectness of the term Manly Man by suggesting the following:

PS. I think politically correct names for this series of blog entries are:

Personally Persons only deploy…
Humanly Humans only deploy…

Before I actually touch the 32 versus 64-bit question from the thread, I’ll submit the following Manly Man entertainment:

[Image: irish_spring.jpg]

If you have enjoyed the Manly Man series at all, or simply need a little background, you must see this YouTube video about Manly Men and Irish Spring.

32 or 64 Bit Linux
When people talk about 32 versus 64 bit Oracle with Linux they are actually talking about 3 topics:

  1. Running 32 bit Oracle on 32 bit Linux with native 32 bit hardware (e.g., Pentium IV (Willamette), Xeon MP (Foster MP)).
  2. Running 32 bit Oracle on 32 bit Linux with x86_64 hardware.
  3. Running 64 bit Oracle on 64 bit Linux with x86_64 hardware.

The oddball combination folks don’t talk about is 32 bit Oracle running on 64 bit Linux, because Oracle doesn’t support it. I do test that combination, however, by installing Oracle on a 32 bit server and then NFS mounting that ORACLE_HOME over on a 64 bit Linux system. But since it is unsupported, discussing it any further would be moot.

The most interesting comparison would be between 1 and 2 above, provided both systems have precisely the same core clock speed, L2 cache and I/O subsystem. As such, the comparison would come down to how well the 32-bit optimized code is treated by the larger cache line size. There would, of course, be other factors, since there are several million more transistors in an EM64T processor than in a Pentium IV, along with other fundamental improvements. I have wished I could make that comparison though. The workload of choice would be one that “fits” in a 32 bit environment (e.g., 1GB SGA, 1GB total PGA) and therefore doesn’t necessarily benefit from 64 bitness.

If anyone were to ask me, I’d say go with 3 above. Oracle on x86_64 Linux is not new.

Bitness
In my recent blog entry about old software configurations in an Oracle over NFS situation, I took some well-deserved swipes at pre-2.6 Linux kernels. Perhaps the most frightening one in my experience was 32-bit RHEL 3. That whole 4/4 split kernel thing was a nightmare—unless you like systems that routinely lock up. But all told, I was taking swipes at pre-2.6 kernels without regard for bitness. So long as the 2.6 kernel is on the table, the question of bitness is not necessarily so cut and dried.

In my recent blog entry about Tim Hall’s excellent step-by-step guide for RAC on NFS, a reader shared a very interesting situation he has gone through:

I have a Manly Man question for you. This Manly Man Wanna Be (MMWB) runs a 2-node 10g RAC on Dell 2850s with 2 dual-core Xeon CPUs (total of 4 CPUs). Each server has 16 GB of memory. While MMWB was installing this last year, he struggled mightily with 64-bit RAC on 64-bit Red Hat Linux 4.0. MMWB finally got it working after learning a lot of things about RPMs and such.

However, Boss Of Manly Man Wanna Be (BOMMWB) was nervous about 64-bit being “new,” and all of the difficulties that MMWB had with it, so we reinstalled with 32-bit RAC running on 32-bit Red Hat Linux 4.0.

My naturally petulant reaction would have been to focus on the comment about 64-bit being “new.” I’m glad I didn’t fire off. This topic deserves better treatment.

While I disagree with Boss of Manly Man’s assertion that 64-bit is “new”, I can’t take arms against the fact that this site measured different levels of pain when installing the same release of Oracle on the same release of RHEL4—only varying the bitness. It is unfortunate that this site has committed themselves to a 32 bit database based solely upon their experiences during the installation. Yes, the x86_64 install of 10gR2 requires a bit more massaging of the platform vis a vis RPMs. In fact, I made a blog entry about 32 bit libraries required on 64 bit RHEL4. While there may occasionally be more headaches during an x86_64 install than an x86 install, I would not deploy Oracle on a 32 bit operating system today unless there was a gun held to my head. All is not lost for this site, however. The database they created with 32 bit Oracle is perfectly usable in-place with 64 bit Oracle after a simple dictionary upgrade procedure documented in the Metalink note entitled Changing between 32-bit and 64-bit Word Sizes (ML62290.1).

Has Anyone Ever Tested This Stuff?
I have…a lot! But I really doubt we are talking about running 32 bit Oracle on 32 bit hardware. Nobody even makes a native 32 bit x86 server these days (that I know of). I think the question at hand is more about 32 bit Oracle on 32 bit Linux with x86_64 hardware.

There has always been the big question about what 64 bit software performance is like when the workload possesses no characteristics that would naturally benefit from the larger address space. For instance, what about a small number of users attached to a 1GB SGA with a total PGA footprint of no more than 1GB? That’s a workload that doesn’t need 64 bit. Moreover, what if the comparison is between 32 bit and 64 bit software running on the same server (e.g., AMD Opteron)? In this case, the question gets more interesting. After all, the processor caches are the same, the memory-to-processor bandwidth is constant, and the drivers can all DMA just fine. The answer is an emphatic yes! But yes, what? Yes, there are occasions where 64 bit code will dramatically outperform 32 bit code on dual-personality 64 bit processors (e.g., AMD Opteron). It is all about porting. Let me explain.

The problem with running sophisticated 32 bit code on 64 bit processors is that the 32 bit code was most likely ported with a different processor cache line size in mind. This is important:

Native 32 bit x86 processors use a 32 byte cache line size whereas 64 bit processors (e.g., AMD64, EM64T) use a 64 byte cache line.
That means, in the case of a native 32 bit processor, load/store and coherency operations are performed on a swath of 32 bytes. Yes, there were exceptions like the Sequent NUMA-Q 2000, which had two different cache line sizes, but that was a prior life for me. Understanding cache line size and how it affects coherency operations is key to database throughput. And unlike Microsoft, who never had to do the hard work of porting (IA64 notwithstanding), Oracle pays very close attention to this topic. In the case of x86 Linux Oracle, the porting teams presumed the code was going to run on native 32 bit processors, a reasonable presumption.
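
If you want to check what your own Linux box reports, here is a minimal sketch, assuming Linux with glibc (which exposes the line size through a sysconf(3) extension):

/* cacheline.c: report the L1 data cache line size on Linux/glibc.
   Build with: gcc -o cacheline cacheline.c
   _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension; some kernels leave it
   unpopulated, in which case
   /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
   usually has the answer. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

        if (line > 0)
                printf("L1 data cache line size: %ld bytes\n", line);
        else
                printf("sysconf could not determine the line size\n");
        return 0;
}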

What Aspects of Bitness Really Matter?
The area of the server that this topic impacts the most (by far) is latching. Sure, you use the database server to manage your data, and the accesses to your data seem quite frequent to you (thousands of accesses per second), but that pales in comparison to the rate at which system memory is accessed for latches. These operations occur on the order of millions of times per second. Moreover, accesses to latches are write-intensive and painfully contended across multiple CPUs, which results in a tremendous amount of bandwidth used for cache coherency. Spinlocks (latches) require attention to detail, period. Just sum up the misses and gets on all the latches during a processor-saturated workload sometime and you’ll see what I mean. What’s this have to do with 32 versus 64 bit?

It’s All About The Port
At porting time, Oracle pays close attention to ensure that latch structures fit within cache lines in a manner that eliminates false sharing. Remember, processors don’t really read or write single words. They read/write or invalidate the entire line that a given word resides in, at least when processor-to-memory operations occur. Imagine, therefore, a latch structure that is, say, 120 bytes long and that the actual latch word is the first element of the structure. Next, imagine that there are only 2 latches in our imaginary server and we are running a 32 bit OS on a native 32 bit system such as Pentium IV or Xeon MP (Foster 32 bit), and therefore a 32 byte cache line size. We allocate and initialize our 2 structures at instance startup. These structures will lay out in 240 bytes within a single memory page. Since we were dutiful enough to align our two structures on a page boundary, the first structure resides in the first 120 bytes of the memory page, spanning the first four 32 byte cache lines. But wait, those 4 lines hold 128 bytes, so there are 8 unused bytes left in the 4th cache line. Doesn’t that mean the first 8 bytes of the second latch structure are going to share space in the 4th cache line? Not if you are good at porting. And in our example, we are.

That’s right, we were aware of our cache line size (32 bytes), so we padded the structure by allocating an array of unsigned integers (4 bytes each) two deep as the last element of our structure. Now our latch structure is precisely 128 bytes, or exactly 4 lines. Finally, we have our imaginary 32 bit code optimized for a real 32 bit system (and therefore a 32 byte cache line size). That is, we have optimized our 32 bit software for our presumed platform, which is 32 bit hardware. Now, if half of the CPUs in the box are hammering the first latch, there is no false sharing with the second. What’s this got to do with hardware bitness? The answer is in the fact that Oracle ports the x86 Linux release with a 32 bit system in mind.
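
To make the layout concrete, here is a hedged sketch of that imaginary structure in C. The type and field names are made up for illustration, and the C11 _Static_assert documents the intent; the array of latches itself would also need to start on a line boundary, e.g., via posix_memalign(3):

/* Imaginary latch structure ported for a 32 byte cache line:
   a 4 byte latch word plus 29 payload words = 120 bytes of content,
   padded with 2 unsigned ints (8 bytes) to exactly 4 lines. */
#define LINE32 32

typedef struct latch {
        volatile unsigned int latch_word;  /* the latch itself */
        unsigned int payload[29];          /* the other 116 bytes of content */
        unsigned int pad[2];               /* 8 bytes of padding */
} latch_t;

_Static_assert(sizeof(latch_t) == 4 * LINE32,
               "latch must occupy whole 32 byte lines");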

Running 32 bit Code on a 64 bit CPU
The devil is in the details. Thus far our imaginary 2-latch system is optimized for hardware that operates on a 32 byte line. Since each structure fits within four 32 byte lines, or two 64 byte lines should we execute on an x86_64 system, there would be no false sharing, so we must also be safe for a system with a 64 byte line, no? Well, true, there will be no false sharing between the two structures, since each now occupies two 64 byte lines as opposed to four 32 byte lines, but there is more to it.

Do you think it’s possible that the actual latch word in the structure might be adjacent (same 32 bytes) to anything interesting? Remember, heavily contended latches are constantly being tried for by processes on other CPUs. So if the holder of the latch writes on any other word in the cache line that holds the latch word, the processor coherency subsystem invalidates that line. To the other CPUs with processes spinning on the latch, this invalidation “looks” like the latch itself has been freed (changed from on to off) when in fact the latch is still held; an adjacent word in the same line was merely modified. This sort of madness absolutely thrashes a system. So the dutiful port engineer rearranges the elements of the latch structure so that nothing else that ever gets written lands in the same cache line as the actual latch word. But remember, we ported to a 32 bit system with a 32 byte line. If you run this code on a 64 bit system, and therefore 64 byte lines, all of your effort to isolate the latch word from other write-mostly words was for naught. That is, if the cache line is now 64 bytes, any write by the latch holder in the first 64 bytes of the structure will cause invalidations (cache thrashing) for processes trying to acquire the latch (spinning) on other CPUs. This isn’t a false sharing issue between 2 structures, but it has about the same effect.

Difficult Choices: 32 bit Software Optimized for 64 bit Hardware
What if the porting engineer of our imaginary 2-latch system were to somehow know that the majority of 32 bit Linux servers would someday end up being 64 bit servers compatible with 32 bit software? Well, then he’d surely pad out the structure so that there are no frequently written words in the same 64 bytes in which the latch word resides. If the latch structure we have to begin with is 120 bytes, odds are quite slim that the percentage of read-mostly words will facilitate our need to pack the first 64 bytes with read-mostly objects alongside the latch word. It’s a latch folks, it is not a read-mostly object! So what to do? Vapor!

Let’s say our 120 byte latch structure is a simple set of 30 words, each being 4 bytes (remember, we are dealing with a 32 bit port here). Let’s say further that there are only 4 read-mostly words in the bunch. In our imaginary 2 latch example, we’d have to set up the structure so that the first word is the latch, and the next 16 bytes are the 4 read-mostly elements. Now we have 20 bytes that need protection. To optimize this 32 bit code for a 64 bit processor, we’ll have to pad out to 64 bytes, with junk. So we’ll put an array of unsigned integers 11 deep (44 bytes) immediately after the latch word and our 4 read-mostly words. That fits nicely in 64 bytes, at the cost of wasting 44 bytes of processor cache for every single latch that comes through our processor caches. Think cache buffers chains, folks! We aren’t done though.

We started with 120 bytes (30 4 byte words) and have placed only 5 of those words into their own cache line. We have 25 words, or 100 bytes, left to deal with. Remember, we are the poor porting engineer who is doing an imaginary 32 bit software port optimized for 64 bit servers, since nobody makes 32 bit servers anymore. So we’ll let the first 64 bytes of the remaining 100 fall into their own line. That leaves 36 bytes that we’ll also have to pad out to 64 bytes; there goes another 28 bytes of vapor. All told, we started with 120 bytes and wound up allocating 192 bytes so that our code will perform optimally on a processor that uses a 64 byte cache line. That’s a 60% increase in the footprint we leave on our processor caches, which aren’t very large to start with. That’s how Oracle would have to optimize 32 bit Oracle for a 64 bit processor (x86 code on x86_64 kit). But they don’t, because that would have been crazy. After all, 32 bit Oracle was intended to run on 32 bit hardware.
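
Continuing the same hypothetical sketch in C, this is the 64 byte optimized layout just tallied; the two pad arrays are the 72 bytes of vapor:

/* The same imaginary 120 bytes of latch content, laid out for a
   64 byte cache line. Content is 30 words; vapor is 18 words. */
#define LINE64 64

typedef struct latch64 {
        /* line 1: the latch word and the 4 read-mostly words */
        volatile unsigned int latch_word;
        unsigned int read_mostly[4];       /* 16 bytes */
        unsigned int pad1[11];             /* 44 bytes of vapor */
        /* line 2: 16 of the remaining 25 write-mostly words */
        unsigned int hot_a[16];            /* 64 bytes */
        /* line 3: the last 9 words, padded out */
        unsigned int hot_b[9];             /* 36 bytes */
        unsigned int pad2[7];              /* 28 bytes of vapor */
} latch64_t;

_Static_assert(sizeof(latch64_t) == 3 * LINE64,
               "120 bytes of content now costs 192 bytes of cache");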

Does porting sound easy? Let me throw this one in there. It just so happens that the Oracle latch structure was in fact 120 bytes in Oracle8i on certain ports. Oh, and lest I forget, remember that Oracle keeps track of latch misses. What’s that got to do with this? Uh, that means processes that do not hold the latch increment counters in the latch structure when they miss. Imagine having one of those miss count words in the same line as the latch word itself!

This is tricky stuff.
Who Uses 32 bit Linux for Oracle These Days?
Finally, bugler, sound Taps.

A thread on oracle-l the other day got me thinking. The thread was about the difficulties being endured at a particular Linux RAC site, which prompted the DBA there to audit the RPMs he had on the system. It appears as though everything installed was at a revision “as high or higher” than Oracle’s documented requirements. In his request for information from the list, I noticed uname(1) output suggesting he is using a 32 bit RHEL 4 system.

One place I always check for configuration information is Oracle’s Validated Configurations web page. This page covers Linux recipes for installation success. I just looked there to see if there was any help I could offer that DBA and found that there are no 32 bit validated configurations!

I know there is a lot of 32 bit x86 hardware out there, but I doubt it is even possible to buy a new one today. Except for training or testing purposes, I just can’t muster a reason to even use 32 bit Linux servers for the database tier at this point, and to be honest, running a 32 bit port of Oracle on an x86_64 processor makes very little sense to me as well.

Oracle11g: Where’s My Alert Log?

Just a short blog entry about Oracle 11g. One of the first things that caught me by surprise with 11g, when I first started in the beta program, was that the default location for the alert log had moved. It is still placed under the traditional OFA structure, but not /u01/app/oracle/admin. There is a new directory called diag that resides in /u01/app/oracle, as seen on one of my systems:


 $ pwd
 /u01/app/oracle/diag/rdbms/bench/bench1
 $ ls -l
 total 144
 drwxr-xr-x 2 oracle dba 4096 Jul 9 21:32 alert
 drwxr-xr-x 3 oracle dba 4096 Jul 8 11:11 cdump
 drwxr-xr-x 2 oracle dba 4096 Jul 9 04:02 hm
 drwxr-xr-x 9 oracle dba 4096 Jul 8 11:11 incident
 drwxr-xr-x 2 oracle dba 4096 Jul 9 04:02 incpkg
 drwxr-xr-x 2 oracle dba 4096 Jun 29 22:00 ir
 drwxr-xr-x 2 oracle dba 4096 Jul 10 08:59 lck
 drwxr-xr-x 2 oracle dba 4096 Jul 10 08:59 metadata
 drwxr-xr-x 2 oracle dba 4096 Jul 8 11:11 stage
 drwxr-xr-x 2 oracle dba 4096 Jul 8 11:11 sweep
 drwxr-xr-x 3 oracle dba 57344 Jul 10 09:02 trace
 $ cd trace
 $ pwd
 /u01/app/oracle/diag/rdbms/bench/bench1/trace
 $ ls -l alert*
 -rw-r----- 1 oracle dba 1098745 Jul 10 09:00 alert_bench1.log

In this case, my database is called bench and the first instance is bench1. To quickly locate alert logs associated with many different ORACLE_HOMEs, simply execute the adrci command and then execute “show alert”.
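
A minimal session sketch follows (output varies by system; with multiple homes, adrci prompts you to pick one):

 $ adrci
 adrci> show alert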

Manly Men Only Deploy Oracle with Fibre Channel – Part VII. A Very Helpful Step-by-Step RAC Install Guide for NFS

Tim Hall has stepped up to the plate to document a step-by-step recipe for setting up Oracle10g RAC on NFS mounts. In Tim’s blog entry, he points out that for testing and training purposes it is true that you can simply export some Ext3 filesystem from a Linux server and use it for all things Oracle. Tim only had 2 systems, so what he did was use one of the servers as the NFS server. The NFS server exported a filesystem and both the servers mounted the filesystem. In this model, you have 2 NFS clients and one is acting as both an NFS client and an NFS server.

This is the link to Tim’s excellent step-by-step guide.

How Simple

If you’ve ever had a difficult time getting RAC going, I think you’d be more than happy with how simple it is with NFS; using Tim’s guide and a couple of low-end test servers would prove that out.

Recently I blogged that most RAC difficulties are in fact storage difficulties. That is not the case with NFS/NAS.

Thanks Tim!

Manly Men Only Deploy Oracle with Fibre Channel – Part VI. Introducing Oracle11g Direct NFS!

Since December 2006, I’ve been testing Oracle11g NAS capabilities with Oracle’s revolutionary Direct NFS feature. This is a fantastic feature. Let me explain. As I’ve laboriously pointed out in the Manly Man Series, NFS makes life much simpler in the commodity computing paradigm. Oracle11g takes the value proposition further with Direct NFS. I co-authored Oracle’s paper on the topic:

Here is a link to the paper.

Here is a link to the joint Oracle/HP news advisory.

What Isn’t Clearly Spelled Out: Windows Too?
Windows has no native NFS, in spite of add-ons like SFU and Hummingbird. That doesn’t stop Oracle. With Oracle11g, you can mount directories from the NAS device as CIFS shares and Oracle will access them with high availability and performance via Direct NFS. No, not CIFS, but Direct NFS. The mounts only need to be visible as CIFS shares during instance startup.

Who Cares?
Anyone that likes simplicity and cost savings.

The World’s Largest Installation of Oracle Databases
…is Oracle’s On Demand hosting datacenter in Austin, TX. Folks, that is a NAS shop. They aren’t stupid!

Quote Me

The Oracle11g Direct NFS feature is another classic example of Oracle implementing features that offer choices in the Enterprise data center. Storage technologies, such as Tiered and Clustered storage (e.g., NetApp OnTAP GX, HP Clustered Gateway), give customers choices—yet Oracle is the only commercial database vendor that has done the heavy lifting to make their product work extremely well with NFS. With Direct NFS we get a single, unified connectivity model for both storage and networking and save the cost associated with Fibre Channel. With built-in multi-path I/O for both performance and availability, we have no worries about I/O bottlenecks. Moreover, Oracle Direct NFS supports running Oracle on Windows servers accessing databases stored in NAS devices—even though Windows has no native support for NFS! Finally: simple, inexpensive storage connectivity and provisioning for all platforms that matter in the Grid Computing era!

Oracle11g Now Exists! Are the Files Secure or Fast?

Yes, July 11, 2007 is here and so is Oracle11g. I wonder what that stuff was that I’ve been testing since December 2006? Anyway, this CNNMoney.com article covers the launch this morning. It’s standard fare news coverage, but I picked something out and I thought I’d see if I could blog it first. The article states:

Oracle Fast Files

The next-generation capability for storing large objects (LOBs) such as images, large text objects, or advanced data types – including XML, medical imaging, and three-dimensional objects – within the database. Oracle Fast Files offers database applications performance fully comparable to file systems. By storing a wider range of enterprise information and retrieving it quickly and easily, enterprises can know more about their business and adapt more rapidly.

Odd. I’ve known that feature as SecureFiles for months now. Looks like a name change.

Is it True?
I don’t know whether storing LOBs with the 11g Fast Files/SecureFiles feature is faster than accessing them through calls to a filesystem. I’ve tested a lot of Oracle11g, and that isn’t one of the features I’ve looked at. I did blog on this feature rather pessimistically way back in November 2006 in this blog entry—before I had my hands on Oracle11g.

My Take?
I hope the Secure/Fast Files feature is indeed faster and better than calls out to a filesystem. The more comprehensive Oracle becomes the better! Regular readers of my blog know the topic of unstructured data is a regular rant of mine.

Here is a link to a late-breaking Oracle paper on SecureFiles.

Manly Men Only Deploy Oracle with Fibre Channel – Part V. What About Oracle9i on RHAS 2.1? Yippie!

Due to my Manly Man Fibre Channel Series Part I, Part II, Part III and Part IV, my email box is getting loaded with a lot of questions about various Oracle over NFS combinations. The questions run the gamut from how to best tune Oracle9i on Red Hat AS 2.1 to Oracle10g on Red Hat RHEL 3 (all on NAS/NFS of course). And then it dawned on me: when I say I’m a fan of Oracle over NFS, that is just entirely too generic.

It Ain’t Linux Unless It Is a 2.6 Kernel
Honestly folks, Red Hat 3.0, or worse yet, RHAS 2.1? Sheer madness. I’m more than convinced that there are a lot of solid RHEL 3.0 systems out there running Oracle. To those folks I’d say, “If it isn’t broken, don’t fix it.” But RHAS 2.1? That wasn’t even an operating system, and to be hyper-critically honest, the “franken-kernel” that was RHEL 3.0 wasn’t really that much better, what with that hugemem 4×4 split garbage and all. SuSE SLES8 was vastly more stable than RHEL 3.0. But I digress. Look, if you are running on a pre-2.6 kernel Linux distribution, you’ve simply got to do yourself a favor and plan an upgrade! Now, back to NAS.

What Oracle on NFS?
I’ll be brief: I wouldn’t even think about using Oracle9i on NAS. I know there are a ton of databases out there doing it, but that is just me. The Oracle server code specific to NFS (Operating System Dependent code) has gone through some serious evolution and maturation. I’ve watched the NFS-specific handling mature from 9i through 10g and now into 11g. Simply put, I didn’t like what I saw in Oracle9i, specific to NFS that is. Oracle9i is a perfectly fine release, albeit the port to 64 bit Linux was pretty scary. I guess I wasn’t that brief. So I’ll continue.

So Oracle9i on NAS is a no-go (in my book); what about Oracle10g? There again, I’ll be brief. In my opinion, Oracle10gR1 on NAS was about as elegant as a fish flopping around on a hot sidewalk. Not a pretty picture. Yes, I have my reasons for all this stuff, but this blog entry is purely an assertion of my opinion.

Thus far I’ve discussed the 9i and 10gR1 Linux ports. I cannot speak authoritatively about the Solaris ports of either vis a vis fitness for NFS. If I were a betting man and had two dimes to rub together, I would wager them that even the Solaris releases of 9i and 10g were probably pretty shaky on NAS. That leads us to 10gR2.

Solid
Oracle10gR2 on NAS is solid, at least for Linux clients. I have seen Metalink stories about Legacy Unix ports that have RMAN problems with NFS as a near-line backup target. Again, I cannot speak for all these sundry platforms. They are good platforms, but I don’t deal with them day to day.

11g
Don’t jump the gun…tomorrow AM…

Examples
In this May 5, 2007 post on toasters, a list participant posted the following:

We are about to start testing Oracle 9i (single instance) with NetApp NAS (6070) filers. We currently have Oracle running on Solaris 9 with SAN storage attached and VERITAS.

I wouldn’t touch that project with a 10 foot pole. If that database is stable, I wouldn’t switch out the storage architecture, especially on that old of an Oracle release.

I’ve also had a thread going with Chen Shapira, who has blogged about Oracle troubles on NAS. Her point throughout that blog entry, and the comments that followed, was that they’ve suffered uptime impact that never really solidly indicts the storage, yet there seems to be a lot of fingers pointed that way. Having read of the types of instability her systems have suffered, I suspected old stuff. It came out in the comment section that they are on RHEL 3.0 64-bit. Now, like I’ve said, RHEL 3.0 is carrying a lot of Oracle databases out there, I know, but I wonder how many on NAS? When I say Oracle on NFS, I’m mostly saying Linux Oracle10gR2 releases on Linux 2.6 kernels—and beyond.

I made a blog entry on this topic back in October of last year as well.

Old Operating System Releases
I take criticism (by true believers mostly) when I point out that running Oracle on a Legacy Unix release that is, say, four years old is not a reason for concern. I wish I could say the same thing about the current state of the art in the Linux world. Dating back to my first high-end Linux project (The Tens–A 10 Node, 10TB, 10,000 User Oracle9i Linux Cluster Project in 2002), I’ve been routinely reminded that Linux stands for:

(L)inux (i)s (n)ot (u)ni(x)

Now, that said, you’ll find much less dissatisfaction with Oracle in general on 2.6 Linux kernel based systems, and in my opinion, that goes double for NAS deployments.

Oracle Faces Fierce Competition in the SMB Space!

In my post about Oracle in the Small and Medium Business space, I pointed out that while Oracle can drive large SMPs to TPC results in the 4 million TpmC range, their recent push into SMB is really “where it’s at.” I made that blog entry hot on the heels of a press release about SMB back in May 2007. This blog entry is a follow-up to a June 27 press release where Oracle announced ORACLE 1-CLICK ORDERING and the establishment of an SMB Technology Program Office within Oracle headed by Judson Althoff. I wish Judson and the rest of the folks at Oracle focusing on SMB the absolute best! I’d really like to see Oracle get more traction in the SMB space.

Not My Typical Blog Entry
Yes, it is. I just haven’t gotten to the meat yet. Just days after Oracle’s press release about the SMB Program Office, this piece in eWEEK.com popped up. I got a chuckle out of it because it is basically Microsoft saying that Oracle is too big and powerful to come down to the SMB scale. In the words of Michael Park, corporate vice president of U.S. SMB at Microsoft:

So they are taking their big concrete and trying to whittle down to a smaller scale.

Hmm, that’s Microsoft: the company that entered the “Enterprise Database Market” by licensing Sybase and running on x86 and IA64 machines with only one Operating System to support. I’m a pretty simple guy, so I’ll stick to that analogy. I wonder which is more difficult, whittling something down, or un-whittling something together?

I know I left my PC-XT with DOS 3.0 Around Here Somewhere…
These weren’t just stones being thrown by Microsoft, though; they had help from the heavy hitters. The eWEEK.com article quotes Taylor MacDonald of Sage Software as saying:

Recasting enterprise products isn’t the way to win over small businesses

Sage who? The article points out that Sage Software is still hitting the market with such stalwarts as Peachtree for accounting! That’s just fine because I think I can get you a total solution for that. Retrocomputer.com has this to offer:

Peachtree Complete III the business accounting system – comes on 5.25 inch floppies and 11 manuals. All 10 floppies are here and the disks are in very good shape and have been in the box in a closet for a long time, so they have been protected. The dustcover is thin cardboard and has some wear, but the manuals and disks are in real good shape. This is from 1990 and runs on any XT or AT with 640k of Ram. If you really wanted to, you could actually install this on a machine and run your business with it. You would have to add the tax codes manually, but only once. At one time, this was the accounting package of choice. Anyway, it’s a little heavy, but I can ship it media mail for big shipping savings. $12.00 Shipping Weight 9 lbs.

And if you need to write checks, you can get Bank Account Manager to go with it.

Oracle’s Competition
There you have it. You heard it here first. Oracle’s newest competitor is Peachtree. You better get one of these.

Perspective
The eWEEK.com article further quoted Microsoft’s Park:

To be successful in SMB you have to have great products aimed at the customers in the space, and you have to have great partners to drive solutions to these customers,” he said. “When you are talking about mid-market, these guys have the same business requirements as enterprise but they don’t have hundreds of IT staff and thousands of dollars to spend. They have to be much more practical in their decision making.

Thousands of dollars? Yep, better get that PC-XT and a copy of Peachtree.

YAP – Yet Another RAC Poll

I was talking with someone the other day about Oracle Parallel Server (OPS) and Real Application Clusters. I got to thinking about what percentage of RAC deployments have been done by folks who had prior OPS experience. I wondered if that number was really small.

I remember during the summer of 2000 I was working on the Oracle Disk Manager library at Veritas using a pre-release version of Oracle called Oracle 8.2. That was the code that became the Oracle9i product. The clustered database in Oracle 8.2 was still being called Oracle Parallel Server since that was before the name Real Application Clusters hit the street. Oh well, that is just a little walk down memory lane.

YAP
No, not Anjo Kolk’s YAPP, but Yet Another Poll. Yes, if you can bear it, please visit my poll called “RAC Archeology.” And, yes it is yet another poll about RAC, but I’d like to dial in on a couple of aspects of storage as you can tell by the wording of the questions. Maybe the same 150 folks that participated in Jared Still’s poll (as I discussed in Manly Man Part IV) will be kind enough to stop by this one.

Folks, if you use RAC, please take a second to participate in the RAC Archeology poll. Thanks.

Manly Men Only Deploy Oracle with Fibre Channel – Part IV. SANs are Simple, RAC is Difficult!

Several months back I made a blog entry about the RAC poll put together by Jared Still. The poll can be found here. Thus far there have been about 150 participants through the poll—best I can tell. Some of the things I find interesting about the results are:

1. Availability was cited 46% of the time as the motivating factor for deploying RAC whereas scalability counted for 37%.

2. Some 46% of the participants state that RAC has met between 75% and 100% of their expectations.

3. A slight majority (52%) of participants say they’d stay with RAC given the choice to revert to non-RAC.

4. 52% of the deployments are Linux (42% Red Hat, 6% Oracle Enterprise Linux, 4% SuSE) and 34% are using the major Legacy Unix offerings (Solaris 17%, AIX 11%, HP-UX 6%).

5. 84% of the deployments are using block storage (e.g., FCP, iSCSI) with 42% of all respondents using ASM on block storage. Nearly one quarter of the respondents say they use a CFS. Only 13% use file storage (NAS via NFS).

Surveys often make for tough cipherin’. It sure would be interesting to see which of the 52% that use Linux also state they’d stay with RAC given the choice to revert or re-deploy with a non-RAC setup. Could they all have said they’d stick with RAC? Point 1 above is also interesting because Oracle markets RAC as a prime ingredient for availability as per MAA.

Of course point 5 is very interesting to me.

RAC is Simple…on Simple Storage
We are talking about RAC here, so the 84% from point 5 above get to endure the Storage Buffet. On the other hand, the 24% of the block storage deployments that layered a CFS over the raw partitions didn’t have it as bad, but the rest of them had to piece together the storage aspects of their RAC setup. That is, they had to figure out what to do with the clusterware files, database, Oracle Home and so forth. The problem with CFS is that there is no one CFS that covers all platforms. That war was fought and lost. NFS on the other hand is ubiquitous and works nicely for RAC. On that note, an email came in to my inbox last Friday on this very topic. The author of that email said:

[…] we did quite a lot of tests in the summer last year and figured out that indeed using Oracle/NFS can make a very good combination (many at [COMPANY XYZ] were spectical, I had no opinion as I had never used it, I wanted to see the fact). So I have convinced our management to go the NFS way (performance ok for the workload under question, way simpler management).

[…] The production setup (46 nodes, some very active, some almost idle accessing 6 NAS “heads”) does its job with satisfying performance […]

What do I see in this email? NFS works well enough for this company that they have deployed 46 nodes—but that’s not all. I pay particular attention to the 3 most important words in that quote: “way simpler management.”

Storage Makes or Breaks Many RAC Deployments
I watched intently as Charles Schultz detailed his first foray into RAC. First, I’ll point out that Charles and I had an email sidebar conversation on this topic. He is aware that I intended to weave his RAC experience into a blog entry of my own. So what’s there to blog about? Well, I’ll just come right out and say it—RAC is usually only difficult when difficult storage is used. How can I say that? Let’s consider Charles’ situation.

First, Charles is an Oracle Certified Master who has no small amount of exposure to large Oracle environments. Charles points out on his blog that the environment they were trying to deploy RAC into has some 150 or more databases consuming some 10TB of storage! That means Charles is no slouch. And being the professional he is, Charles points out that he took specialized RAC training to prepare for the task of deploying Oracle in their environment. So why did Charles struggle with setting up a 2-node RAC cluster to the point of making a post to the oracle-l email list for assistance? The answer is simply that the storage wasn’t simple.

It turned out that Charles’ “RAC difficulty” wasn’t even RAC. I assert that the vast majority of what is termed “RAC difficulty” isn’t RAC at all, but the platform or storage instead. By platform I mean Linux RPM dependencies, and by storage I mean SAN madness. Charles’ difficulties boiled down to Linux FCP multipathing issues. Specifically, multipathing was causing ASM to see multiple entries for each LUN. I made the following comment on Charles’ blog:

Hmm, RHEL4 and two nodes. Things should not be that difficult. I think what you have is more on your hands than RAC. I’ve seen OCFS2, and ASM [in Charles’ blog thread]. That means you also have simple raw disks for OCR/CSS and since this is Dell, is my guess right that you have EMC storage with PowerPath?

Lots on your plate. You know me, I’d say NAS…

Ok, I’m sorry for SPAMing your site, Charles, but your situation is precisely what I talk about. You are a Certified Master who has also been to specific RAC training and you are experiencing this much difficulty on a 2 node cluster using a modern Linux distro. Further, most of your problems seem to be storage related. I think that all speaks volumes.

Charles replied with:

[…] I agree whole-heartedly with your statements; my boss made the same observations after we had already sunk over 40 FTE of 2 highly skilled DBAs plunking around with the installation.

If I read that correctly, Charles and a colleague spent a week trying to work this stuff out and Charles is certainly not alone in these types of situations that generally get chalked up as “RAC problems.” There was a lengthy thread on oracle-l about very similar circumstances not that long ago.

Back To The Poll
It has been my experience that most RAC difficulties are storage related—specifically the storage presentation. As point 5 in the poll above shows, some 84% of the respondents had to deal with raw partitions at one time or another. Indeed, even with CFS, you have to get the raw partitions visible and like-named on each node of the cluster before you can create a filesystem. If I hear of one more RAC deployment falling prey to storage difficulties, I’ll…

[Image: gross.jpg]

Ah, forget that. I use the following mount options on Linux RAC NFS clients:

rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
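
For context, that option string drops straight into /etc/fstab. Here is a hypothetical entry; the filer name, export path and mount point are made up:

nas1:/vol/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0  0 0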

I also generally widen up a few kernel tunables when using Oracle over NFS:

net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.ipfrag_high_thresh=524288
net.ipv4.ipfrag_low_thresh=393216
net.ipv4.tcp_rmem=4096 524288 16777216
net.ipv4.tcp_wmem=4096 524288 16777216
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_window_scaling=1
net.core.optmem_max=524287
net.core.netdev_max_backlog=2500
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
net.ipv4.tcp_mem=16384 16384 16384
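
To make those settings survive a reboot, they belong in /etc/sysctl.conf, after which they can be loaded without rebooting:

# sysctl -p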

Once the filesystem(s) is/are mounted, I have 100% of my storage requirements for RAC taken care of. Most important, however, is not to forget Direct I/O when using NFS, so I set the init.ora parameter filesystemio_options as follows:

filesystemio_options=setall
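
A quick sanity check from SQL*Plus after the instance restarts:

SQL> show parameter filesystemio_options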

Life is an unending series of choices. Choosing between simple or difficult storage connectivity and provisioning is one of them. If you overhear someone lamenting about how difficult “RAC” is, ask them how they like their block storage (FCP, iSCSI).

Working on Oracle Technology Brings on a Powerful Hunger.

Too much lab work for blogging these last few days, but I thought I’d put out a blog entry with a photo. I love Oracle and platform technology, but I also fancy myself a bit of a chef.

The following is a photo of a dish I made some time back. It is a Wildfleisch (game) Ragout a la Berghof with chanterelle mushrooms and red cabbage. The plate is garnished with pear halves dressed with Pflaumenmus (plum butter) and dill weed. The only thing missing from the picture is the Spaten Pils, but rest assured, it was on the table. Yes, it was all quite tasty.

[Photo: 742054354_7efe905f42.jpg]

The photo that follows is one of me heading off to “the market” to pick up the Wildfleisch. It was a bit of a hike (elevation 6089′ or 1855m), and I only get over there once a year.

[Photo: 122836313_3fcc98c27c.jpg]

Manly Men Only Deploy Oracle with Fibre Channel – Part III. Did I Hear EMC Say NAS?

And here I thought I came up with it all on my own—the connectivity and presentation model value propositions of NAS that is. I was checking out network cache appliances by Gear6 when I found a reference to a post on Chuck Hollis’ blog over at EMC. Chuck was talking about the benefits of NAS in a VMware context, but I’d like to quote some of the bits I liked the most:

• You get to manage a file system, rather than a collection of LUNs

• You get some modicum of access control through the file system mechanisms

• You get access to advanced NAS features, like thin provisioning, snaps, replication, etc.

And, as a special added bonus, you get to use low-cost ethernet to connect your servers to your storage. Very nice, especially if you’re looking at blades or high-density racks.

Of course I like that last bit with an eye on the commodity computing paradigm with Oracle.

Audited TPC-C Proves SQL Server is Better Than Oracle

Some time back I had a blog thread going about Oracle on AMD’s upcoming quad-core “Barcelona” processor. The thread (found here) took AMD’s published, promotional material that set expectations for TPC-C throughput. At first there were projections from AMD that Barcelona would deliver 70% better throughput than Opteron 2200 systems. Later, as I blogged in this blog entry, AMD suggested we should expect as much as a 40% improvement over the Intel Xeon 5355 “Clovertown” processor. The whole series of Barcelona-related posts was based on my analysis of Oracle licensing cost vis a vis Barcelona, since Oracle licenses per core. In that series I used TPC-C results to make my points, and at one juncture I used a TPC-C result from an Intel Xeon 5355 with SQL Server in one of my calculations. Folks got up in arms about that.

It seems people aren’t aware that all three major database players get about the same TPC-C per core on commodity hardware because the TPC-C workload has been optimized to the maximum. It’s a hardware benchmark folks.

This blog entry is to draw attention to the error of my ways.

My analysis of the potential for Oracle performance on Barcelona was covered in detail in that series of posts, but the TPC-C result I used in some of my calculations, thus raising the ire of some readers, was this 2-socket Xeon 5355 result with SQL Server. Certain readers thought it was absurd to commingle differing database vendors’ TPC-C numbers in my calculations, in spite of how many times I’ve blogged about the fact that TPC-C is a hardware benchmark, not a software benchmark. Oh well, you can’t win them all.

Confession is Good for The Soul
But what does this have to do with the Barcelona thread? Well, the Barcelona thread is just how we got here. So now I admit the error of my ways. During my series on Barcelona, I used this TPC-C result showing that Xeon 5355 can do 30,092 TpmC per core—with SQL Server. I factored that TPC-C number into an Oracle cost-per-core comparison against AMD’s prediction of what Barcelona might do. After all, AMD is predicting they’ll beat out Xeon 5355 by 40%, so I needed a Xeon 5355 number. Egad! Using a TPC-C number, without regard for database product, to compare hardware platforms! Shame on me. It turns out I was wrong. How wrong?

A Lower TPC-C Result Proves Oracle is Better
Yes, it’s true. A lower result can be a better result. It turns out that Oracle did eventually publish a TPC-C result on Xeon 5355-based gear. The result came in at 100,924 TpmC, or 25,231 TpmC per core; the SQL Server result of 30,092 TpmC per core is 19% higher. SQL Server simply must be a better database!

Not even close. Yes, the SQL Server number is 19% better on a per-core basis, but the cost to produce that result was $1.85 per TpmC, whereas Oracle’s result came in at only 42% of that cost, or $.78 per TpmC, and it’s easy to see why. The SQL Server result was obtained with 64GB of main memory whereas Oracle’s result used only 24GB. But that isn’t all. The SQL Server result came from a configuration that included, get this, 552 disk drives, compared to the Oracle result that required only 104 drives.

The Tale of the Tape
You heard it here first! SQL Server is a better database than Oracle. All you have to do is throw in 5.3 fold the disk drives and 2.7 fold the main memory, shake vigorously, and out plops a 19% per-core performance improvement. Not bad for 2.4 fold additional cost!

Summary
SQL Server is not better than Oracle. By the way, AMD Barcelona most likely won’t deliver 40% more throughput measured in TpmC than Intel Xeon 5355.

It Takes Time to Read Long Blog Entries!

I’m sure my last entry was laborious to read. This link will take you to a video record of some poor souls who found themselves waiting in a plane (pushed back from the gate) for over 7 hours. All told, it took them 10 hours to travel from JFK to DFW. That would be plenty of time to read long blog entries!

I think 7 hours takes the cake. The longest I’ve spent suffering that nasty trick was 3 hours. I’ve heard that the airline records and reports that as an on-time departure; the plane is considered departed once it leaves the gate.

