Archive for the 'Real Application Clusters Installation' Category

Will Oracle Ever Release Sun Servers Based On Westmere EP and Nehalem EX Processors? Yes.

Oracle has announced the release of several new x86 servers based on the Westmere EP and Nehalem EX processors. This is a really short blog entry, because the website is so loaded with information I haven’t much to add:

Oracle Real Application Clusters Does Not Scale?
I'd like to single out an interesting benchmark. This Oracle Sun x4470 SAP SD benchmark result highlights Real Application Clusters scalability and, in 6U worth of rack space, beats the recent HP ProLiant DL980 G7 result with Microsoft by 15%. The DL980 G7 is an 8U server.

Yes, It Runs IBM DB2, Too.
Lest we forget, people do deploy IBM DB2 on Solaris. To read more follow the link:

Updated Options for Deploying IBM DB2 on Solaris x86-based Servers

The Most Horribly Botched Oracle Database 10g Real Application Clusters Install Attempt.

Now don’t get me wrong. What I’m blogging about is not really an Oracle Database 10g Real Application Clusters (RAC) problem. All of the problems I will mention in this entry were clearly related to a botched configuration at the OS level. Since Oracle10g will be around for a long time, I suspect that some day someone else might run into this sort of a problem and I aim to make their lives easier. That said, there could be some goodies in here for all the regular readers of this blog.

The Scenario
I took the brand new 4-node RAC cluster I have in the lab and aimed to see how much the oracle-validated-1.0.0-4.el4.x86_64.rpm package assists in setting up the system in preparation for a RAC install. I was excited since this was the first 2-socket Xeon "Clovertown" 5355-based cluster I've had the chance to test. These processors are murderously fast, so I was chomping at the bit to give them a whirl.

The systems were loaded with RHEL4 U4 x86_64. I attempted to install the oracle-validated-1.0.0-4.el4.x86_64.rpm package and here is what it returned:

# rpm -ivh oracle-validated-1.0.0-4.el4.x86_64.rpm
warning: oracle-validated-1.0.0-4.el4.x86_64.rpm: V3 DSA signature: NOKEY, key ID b38a8516
error: Failed dependencies:
        /usr/lib/libc.so is needed by oracle-validated-1.0.0-4.el4.x86_64
        control-center is needed by oracle-validated-1.0.0-4.el4.x86_64
        fontconfig >= 2.2.3-7.0.1 is needed by oracle-validated-1.0.0-4.el4.x86_64
        gnome-libs is needed by oracle-validated-1.0.0-4.el4.x86_64
        libdb.so.3()(64bit) is needed by oracle-validated-1.0.0-4.el4.x86_64
        libstdc++.so.5()(64bit) is needed by oracle-validated-1.0.0-4.el4.x86_64
        xscreensaver is needed by oracle-validated-1.0.0-4.el4.x86_64
    Suggested resolutions:
        compat-db-4.1.25-9.x86_64.rpm
        compat-libstdc++-33-3.2.3-47.3.x86_64.rpm
        control-center-2.8.0-12.rhel4.5.x86_64.rpm
        gnome-libs-1.4.1.2.90-44.1.x86_64.rpm
        xscreensaver-4.18-5.rhel4.11.x86_64.rpm

So I chased down what was recommended and installed those RPMs as well:

 # cat > /tmp/list
        compat-db-4.1.25-9.x86_64.rpm
        compat-libstdc++-33-3.2.3-47.3.x86_64.rpm
        control-center-2.8.0-12.rhel4.5.x86_64.rpm
        gnome-libs-1.4.1.2.90-44.1.x86_64.rpm
        xscreensaver-4.18-5.rhel4.11.x86_64.rpm

# rpm -ivh --nodeps `cat /tmp/list | xargs echo`
warning: compat-db-4.1.25-9.x86_64.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing...                ########################################### [100%]
   1:xscreensaver           ########################################### [ 20%]
   2:compat-db              ########################################### [ 40%]
   3:compat-libstdc++-33    ########################################### [ 60%]
   4:control-center         ########################################### [ 80%]
   5:gnome-libs             ########################################### [100%]

At this point I think I should have been ready to install Oracle Clusterware (CRS). Well, the install failed miserably during the linking phase and, shame on me, I didn't save any of the logs or screen shots. No matter, I can still make a good blog entry out of this, so long as you are willing to believe that the linking phase of the CRS install failed. I decided to clean up the botched installation and use my normal installation process.

The approach I generally take is to look at a comparable Oracle Validated Configuration and pull the list of RPMs specified. And that is precisely what I did:

$ cat > /tmp/list
binutils-2.15.92.0.2-21.x86_64.rpm
        compat-db-4.1.25-9.x86_64.rpm
        compat-libstdc++-33-3.2.3-47.3.x86_64.rpm
        control-center-2.8.0-12.rhel4.5.x86_64.rpm
        gcc-3.4.6-3.x86_64.rpm
        gcc-c++-3.4.6-3.x86_64.rpm
        glibc-2.3.4-2.25.i686.rpm
        glibc-2.3.4-2.25.x86_64.rpm
        glibc-devel-2.3.4-2.25.i386.rpm
        glibc-devel-2.3.4-2.25.x86_64.rpm
        glibc-common-2.3.4-2.25.x86_64.rpm
        glibc-headers-2.3.4-2.25.x86_64.rpm
        glibc-kernheaders-2.4-9.1.98.EL.x86_64.rpm
        gnome-libs-1.4.1.2.90-44.1.x86_64.rpm
        libgcc-3.4.6-3.x86_64.rpm
        libstdc++-3.4.6-3.x86_64.rpm
        libstdc++-devel-3.4.6-3.x86_64.rpm
        libaio-0.3.105-2.x86_64.rpm
        make-3.80-6.EL4.x86_64.rpm
        pdksh-5.2.14-30.3.x86_64.rpm
        sysstat-5.0.5-11.rhel4.x86_64.rpm
        xorg-x11-deprecated-libs-6.8.2-1.EL.13.36.x86_64.rpm
        xorg-x11-deprecated-libs-6.8.2-1.EL.13.36.i386.rpm
        xscreensaver-4.18-5.rhel4.11.x86_64.rpm

# rpm -ivh --nodeps `cat /tmp/list | xargs echo`

[…output deleted…]

# rpm -ivh oracle-validated-1.0.0-4.el4.x86_64.rpm
warning: oracle-validated-1.0.0-4.el4.x86_64.rpm: V3 DSA signature: NOKEY, key ID b38a8516
error: Failed dependencies:
        /usr/lib/libc.so is needed by oracle-validated-1.0.0-4.el4.x86_64
        fontconfig >= 2.2.3-7.0.1 is needed by oracle-validated-1.0.0-4.el4.x86_64

# rpm -ivh fontconfig-2.2.3-7.x86_64.rpm
warning: fontconfig-2.2.3-7.x86_64.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing...                ########################################### [100%]
        package fontconfig-2.2.3-7 is already installed

So, once again I should have been ready to go. That was not the case. I got the following stream of error output when I ran vipca:

# sh ./vipca
PRKH-1010 : Unable to communicate with CRS services.
  [PRKH-1000 : Unable to load the SRVM HAS shared library
  [PRKN-1008 : Unable to load the shared library "srvmhas10"
  or a dependent library, from

LD_LIBRARY_PATH="/opt/oracle/crs/jdk/jre/lib/i386/client:/opt/oracle/crs/jdk/jre/lib/i386:/opt/oracle/crs/jdk/jre/..
/lib/i386:/opt/oracle/crs/lib32:/opt/oracle/crs/srvm/lib32:/opt/oracle/crs/lib:/opt/oracle/crs/srvm/lib:"
  [java.lang.UnsatisfiedLinkError: /opt/oracle/crs/lib32/libsrvmhas10.so: libclntsh.so.10.1: 

cannot open shared object file: No such file or directory]]]
PRKH-1010 : Unable to communicate with CRS services.
  [PRKH-1000 : Unable to load the SRVM HAS shared library
  [PRKN-1008 : Unable to load the shared library "srvmhas10"
  or a dependent library, from

LD_LIBRARY_PATH="/opt/oracle/crs/jdk/jre/lib/i386/client:/opt/oracle/crs/jdk/jre/lib/i386:/opt/oracle/crs/jdk/jre/..
/lib/i386:/opt/oracle/crs/lib32:/opt/oracle/crs/srvm/lib32:/opt/oracle/crs/lib:/opt/oracle/crs/srvm/lib:"
  [java.lang.UnsatisfiedLinkError: /opt/oracle/crs/lib32/libsrvmhas10.so: libclntsh.so.10.1: 

cannot open shared object file: No such file or directory]]]
PRKH-1010 : Unable to communicate with CRS services.
  [PRKH-1000 : Unable to load the SRVM HAS shared library
  [PRKN-1008 : Unable to load the shared library "srvmhas10"
  or a dependent library, from

LD_LIBRARY_PATH="/opt/oracle/crs/jdk/jre/lib/i386/client:/opt/oracle/crs/jdk/jre/lib/i386:/opt/oracle/crs/jdk/jre/..
/lib/i386:/opt/oracle/crs/lib32:/opt/oracle/crs/srvm/lib32:/opt/oracle/crs/lib:/opt/oracle/crs/srvm/lib:"
  [java.lang.UnsatisfiedLinkError: /opt/oracle/crs/lib32/libsrvmhas10.so: libclntsh.so.10.1:

Egad! What I'm about to tell you should prove beyond the shadow of a doubt that I am not a DBA! I don't need VIPs or gsd or srvctl for that matter. My test harnesses do not use any of that, at least not the test harness I was intending to use for this specific battery of testing. So, I ignored the vipca problem and figured I'd just install the database and get to work. I've never before seen what I'm about to show you.

During the installation of the database, OUI detected all nodes of the cluster, so I thought I might be able to sneak this one through. However, partway through, OUI presented a dialogue for database upgrades. Now, this is seriously odd since this was a freshly built cluster. There were no other databases installed, but that isn't the most peculiar aspect of this dialogue. Check it out:

[Screenshot (botched.jpg): the OUI database upgrade dialogue with its cells populated by PRKH-* and LD_LIBRARY_PATH error text]

That is strangely beyond strange! I’m not sure if you can tell what that is but the cells in the dialogue are populated with error output (PRKH-* and LD_LIBRARY_PATH, etc). Wow, that was crazy. What I’m about to tell you should be proof positive that not only am I not a DBA, I do things that are only done by someone who has no idea what a DBA is! I needed the database installed and I didn’t have any databases to upgrade, so I thought I’d ignore this database upgrade mess and push on. Bad idea.

Of course the install was a total failure. I spent some time figuring out what was going on and then it dawned on me: I forgot to install the 32bit glibc-devel RPM. After doing so I cleaned up the mess, walked through the install again and voilà, no problems.

The Moral of the Story
Don't forget the required 32bit glibc-devel library when installing 10g on x86_64 Linux, and pay particular attention to the fact that neither the Validated Configuration list of RPMs nor the oracle-validated-1.0.0-4.el4.x86_64.rpm package made any mention of this library.
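
For anyone who lands here via a search, the fix itself is a one-liner; the RPM file name below is the RHEL4 U4 package referenced earlier in this entry:

# the easy-to-miss 32bit development library
rpm -ivh glibc-devel-2.3.4-2.25.i386.rpm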

I hope it helps someone, someday. Nonetheless, that was a weird screen shot, wasn't it?

Manly Men Only Deploy Oracle with Fibre Channel – Part IV. SANs are Simple, RAC is Difficult!

Several months back I made a blog entry about the RAC poll put together by Jared Still. The poll can be found here. Thus far there have been about 150 participants through the poll—best I can tell. Some of the things I find interesting about the results are:

1. Availability was cited 46% of the time as the motivating factor for deploying RAC whereas scalability counted for 37%.

2. Some 46% of the participants state that RAC has met between 75% and 100% of their expectations.

3. A majority of participants (52%) say they'd stay with RAC given the choice to revert to non-RAC.

4. 52% of the deployments are Linux (42% Red Hat, 6% Oracle Enterprise Linux, 4% SuSE) and 34% are using the major Legacy Unix offerings (Solaris 17%, AIX 11%, HP-UX 6%).

5. 84% of the deployments are using block storage (e.g., FCP, iSCSI) with 42% of all respondents using ASM on block storage. Nearly one quarter of the respondents say they use a CFS. Only 13% use file storage (NAS via NFS).

Surveys often make for tough cipherin’. It sure would be interesting to see which of the 52% that use Linux also state they’d stay with RAC given the choice to revert or re-deploy with a non-RAC setup. Could they all have said they’d stick with RAC? Point 1 above is also interesting because Oracle markets RAC as a prime ingredient for availability as per MAA.

Of course point 5 is very interesting to me.

RAC is Simple…on Simple Storage
We are talking about RAC here, so the 84% from point 5 above get to endure the Storage Buffet. On the other hand, the 24% of the block storage deployments that layered a CFS over the raw partitions didn’t have it as bad, but the rest of them had to piece together the storage aspects of their RAC setup. That is, they had to figure out what to do with the clusterware files, database, Oracle Home and so forth. The problem with CFS is that there is no one CFS that covers all platforms. That war was fought and lost. NFS on the other hand is ubiquitous and works nicely for RAC. On that note, an email came in to my inbox last Friday on this very topic. The author of that email said:

[…] we did quite a lot of tests in the summer last year and figured out that indeed using Oracle/NFS can make a very good combination (many at [COMPANY XYZ] were skeptical, I had no opinion as I had never used it, I wanted to see the facts). So I have convinced our management to go the NFS way (performance ok for the workload under question, way simpler management).

[…] The production setup (46 nodes, some very active, some almost idle accessing 6 NAS “heads”) does its job with satisfying performance […]

What do I see in this email? NFS works well enough for this company that they have deployed 46 nodes—but that’s not all. I pay particular attention to the 3 most important words in that quote: “way simpler management.”

Storage Makes or Breaks Many RAC Deployments
I watched intently as Charles Schultz detailed his first foray into RAC. First, I'll point out that Charles and I had an email side-bar conversation on this topic. He is aware that I intended to weave his RAC experience into a blog entry of my own. So what's there to blog about? Well, I'll just come right out and say it: RAC is usually only difficult when difficult storage is used. How can I say that? Let's consider Charles' situation.

First, Charles is an Oracle Certified Master who has no small amount of exposure to large Oracle environments. Charles points out on his blog that the environment they were trying to deploy RAC into has some 150 or more databases consuming some 10TB of storage! That means Charles is no slouch. And being the professional he is, Charles points out that he took specialized RAC training to prepare for the task of deploying Oracle in their environment. So why did Charles struggle with setting up a 2-node RAC cluster to the point of making a post to the oracle-l email list for assistance? The answer is simply that the storage wasn’t simple.

It turned out that Charles' "RAC difficulty" wasn't even RAC. I assert that the vast majority of what is termed "RAC difficulty" isn't RAC at all, but the platform or storage instead. By platform I mean Linux RPM dependencies and by storage I mean SAN madness. Charles' difficulties boiled down to Linux FCP multipathing issues. Specifically, multipathing was causing ASM to see multiple entries for each LUN. I made the following comment on Charles' blog:

Hmm, RHEL4 and two nodes. Things should not be that difficult. I think what you have is more on your hands than RAC. I’ve seen OCFS2, and ASM [in Charles’ blog thread]. That means you also have simple raw disks for OCR/CSS and since this is Dell, is my guess right that you have EMC storage with PowerPath?

Lots on your plate. You know me, I'd say NAS…

Ok, I’m sorry for SPAMing your site, Charles, but your situation is precisely what I talk about. You are a Certified Master who has also been to specific RAC training and you are experiencing this much difficulty on a 2 node cluster using a modern Linux distro. Further, most of your problems seem to be storage related. I think that all speaks volumes.

Charles replied with:

[…] I agree whole-heartedly with your statements; my boss made the same observations after we had already sunk over 40 FTE of 2 highly skilled DBAs plunking around with the installation.

If I read that correctly, Charles and a colleague spent a week trying to work this stuff out and Charles is certainly not alone in these types of situations that generally get chalked up as “RAC problems.” There was a lengthy thread on oracle-l about very similar circumstances not that long ago.
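
As an aside, the usual way out of the duplicate-path problem Charles hit is to point ASM at the multipath pseudo-devices only, so it never sees the individual /dev/sd* paths underneath. A minimal sketch of the relevant ASM instance parameter, with device-name patterns that are assumptions on my part and not Charles' actual configuration:

# in the ASM instance parameter file; pick the pattern that matches your multipathing layer
asm_diskstring='/dev/emcpower*'        # EMC PowerPath pseudo-devices
# asm_diskstring='/dev/mapper/mpath*'  # Linux device-mapper multipath equivalent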

Back To The Poll
It has been my experience that most RAC difficulties are storage related—specifically the storage presentation. As point 5 in the poll above shows, some 84% of the respondents had to deal with raw partitions at one time or another. Indeed, even with CFS, you have to get the raw partitions visible and like-named on each node of the cluster before you can create a filesystem. If I hear of one more RAC deployment falling prey to storage difficulties, I’ll…

[Image: gross.jpg]

Ah, forget that. I use the following mount options on Linux RAC NFS clients:

rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
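
To put those options in context, a hypothetical /etc/fstab entry might look like the following (the filer name, export path and mount point are made up for illustration):

filer1:/vol/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0  0 0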

I also generally widen up a few kernel tunables when using Oracle over NFS:

net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.ipfrag_high_thresh=524288
net.ipv4.ipfrag_low_thresh=393216
net.ipv4.tcp_rmem=4096 524288 16777216
net.ipv4.tcp_wmem=4096 524288 16777216
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_window_scaling=1
net.core.optmem_max=524287
net.core.netdev_max_backlog=2500
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
net.ipv4.tcp_mem=16384 16384 16384
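
These don't survive a reboot if set only with the sysctl command, so I'd typically append them to /etc/sysctl.conf and load them on the running system, roughly like this:

# after appending the tunables above to /etc/sysctl.conf
sysctl -p
# note: the sunrpc.* entries only appear once the sunrpc module is loaded, which it is on any NFS client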

Once the filesystem(s) is/are mounted, I have 100% of my storage requirements for RAC taken care of. Most important, however, is to not forget Direct I/O when using NFS, so I set the init.ora parameter filesystemio_options as follows:

filesystemio_options=setall
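
For spfile-based instances, the equivalent is a quick sketch like the one below; the parameter is not dynamically modifiable, so it only takes effect after the instances are restarted:

SQL> ALTER SYSTEM SET filesystemio_options=SETALL SCOPE=SPFILE SID='*';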

Life is an unending series of choices. Choosing between simple and difficult storage connectivity and provisioning is one of them. If you overhear someone lamenting about how difficult "RAC" is, ask them how they like their block storage (FCP, iSCSI).

Manly Men Only Deploy Oracle with Fibre Channel – Part II. What’s So Simple and Inexpensive About NFS for Oracle?

The things I routinely hear from DBAs lead me to believe that they often don't understand storage. Likewise, the things I hear from Storage Administrators convince me they don't always know what DBAs and system administrators have to do with those chunks of disk they dole out for Oracle. This is a long blog entry aimed at closing that gap with a particular slant toward Oracle over NFS. Hey, it is my blog after all.

I also want to clear up some confusion about points I made in a recent blog entry. The confusion was rampant as my email box will attest so I clearly need to fix this.

I was catching up on some blog reading the other day when I ran across this post on Nuno Souto's blog dated March 18, 2006. The blog entry was about how Noons' datacenter had just taken on some new SAN gear. The gist of the blog entry is that they did a pretty major migration from one set of SAN gear to the other with very limited impact—largely due to apparent 6-Ps style forethought. Noons speaks highly of the SAN technology they have.

Anyone who participates in the oracle-l email list knows Noons and his important contributions to the list. In short, he knows his stuff—really well. So why am I blogging about this? It dawned on me that my recent post about Manly Men Only Deploy Oracle with Fibre Channel Storage jumped over a lot of ground work. I assure you all that neither Noons nor the tens of thousands of Oracle shops using Oracle on FCP are Manly Men as I depicted in my blog entry. I'm not trying to suggest that people are fools for using Fibre Channel SANs. Indeed, waiting patiently from about 1997 to about 2001 for the stuff to actually work warrants at least some commitment to the technology. OK, ok, I'm being snarky again. But wait, I do have a point to make.

Deploying Oracle on NAS is Simpler and Cheaper, Isn’t It?
In my blog entry about “Manly Man”, I stated matter-of-factly that it is less expensive to deploy Oracle on NAS using NFS than on SANs. Guess what, I’m right, it is. But I didn’t sufficiently qualify what I was talking about. I produced that blog entry presuming readers would have the collective information of my prior blog posts about Oracle over NFS in mind. That was a weak presumption. No, when someone like Noons says his life is easier with SAN he means it. Bear in mind his post was comparing SAN to DAS, but no matter. Yes, Fibre Channel SAN was a life saver for too many sites to count in the late 90s. For instance, sites that bought into the “server consolidation” play of the late 1990s. In those days, people turned off their little mid-range Unix servers with DAS and crammed the workloads into a large SMP. The problem was that eventually the large SMP couldn’t physically attach any more DAS. It turns out that Fibre was needed first and foremost to get large numbers of disks connected to the huge SMPs of the era. That is an entirely different problem to solve than getting large numbers of servers connected to storage.

Put Your Feet in the Concrete
Most people presume that Oracle over NFS must be exponentially slower than Fibre Channel SAN. They presume this because at face value the wires are faster (e.g., 4Gb FCP versus 1Gb Ethernet). True, 4Gb is more bandwidth than 1Gb, but you can have more than one NFS path to storage and the latencies are a wash. I wanted to provide some numbers so I thought I'd use Network Appliance's data that suggested a particular test of 8-way Solaris servers running Oracle OLTP over NFS comes within 21% of what is possible on a SAN. Using someone else's results was mistake number 1. Folks, 21% degradation for NFS compared to SAN is not a number cast in stone. I just wanted to show that it is not a day and night difference and I wanted to use Network Appliance numbers for validity. I would not be happy with 21% either and that is good, because the numbers I typically see are not even in that range to start with. I see more like 10% and that is with 10g. 11g closes the gap nicely.

I’ll be producing data for those results soon enough, but let’s get back to the point. 21% of 8 CPUs worth of Oracle licenses would put quite a cost-savings burden on NAS in order to yield a net gain. That is, unless you accept the fact that we are comparing Oracle on NAS versus Oracle on SAN in which case the Oracle licensing gets cancelled out. And, again, let’s not hang every thought on that 21% of 8 CPUs performance difference because it is by no means a constant.

Snarky Email
After my Manly Man post, a fellow member of the OakTable Network emailed me the viewpoint of their very well-studied Storage Administrator. He calculated the cost of SAN connectivity for a very, very small SAN (using inexpensive 8-port FC switches) and factored in Oracle Enterprise Edition licensing to produce a cost per throughput using the data from that Network Appliance paper—the one with the 21% deficit. That is, he used the numbers at hand (21% degradation), Oracle Enterprise Edition licensing cost and his definition of a SAN (low connectivity requirements) and did the math correctly. Given those inputs, the case for NAS was pretty weak. To my discredit, I lashed back with the following:

…of course he is right that Oracle licensing is the lion’s share of the cost. Resting on those laurels might enable him to end up the last living SAN admin.

Folks, I know that 21% of 8 is 1.7 and that 1.7 Enterprise Edition Licenses can buy a lot of dual-port FCP HBAs and even a midrange Fibre Channel switch, but that is not the point I failed to make. The point I failed to make was that I'm not talking about solving the supposed difficulties of provisioning storage to those one or two remaining refrigerator-sized Legacy Unix boxes you might have. There is no there, there. It is not difficult at all to run a few 4Gb FCP wires to separate 8 or 16 port FC switches and then back to the storage array. Even Manly Man can do that. That is not a problem that needs to be solved because it is neither difficult nor expensive (at least the SAN aspect isn't). As the adage goes, a picture speaks a thousand words. The following is a visual of a problem that doesn't need to be solved—a simple SAN connected to a single server. Ironically, what it depicts is potentially millions of dollars worth of server and storage connected with merely thousands of dollars worth of Fibre Channel connectivity gear. In case the photo isn't vivid enough, I'll point out that on the left is a huge SMP (e.g., HP Superdome) and on the right is an EMC DMX. In the middle is a redundant set of 8-port switches—cheap, and simple. Even providing private and public Ethernet connectivity in such a deployment is a breeze, by the way.

[Image (simplesan.jpg): a huge SMP (e.g., HP Superdome) on the left and an EMC DMX on the right, connected through a redundant pair of 8-port Fibre Channel switches]

I Ain’t Never Doing That Grid Thing.
Simply put, if the only Oracle you have deployed—now and forever—sits in a couple of refrigerator-sized legacy SMP boxes, I'm going to sound like a loon on this topic. I'm talking about provisioning storage to commodity servers—grid computing. Grid may not be where you are today, but it is in fact where you will be someday. Consider the fact that most datacenters are taking their huge machines and chopping them up into little machines with hardware/software virtualization anyway, so we might as well cut to the chase and deploy commodity servers. When we do, we feel the pain of Fibre Channel SAN connectivity and storage provisioning, because connecting large numbers of servers to storage was not exactly a design center for Fibre Channel SAN technology. Just the opposite is true; SANs were originally meant to connect a few servers to a huge number of disks—more than was possible with DAS.

Commodity Computing (Grid) == Huge SAN
Connecting large numbers of servers to a SAN makes the SAN very complex. Not necessarily more disks, but the presentation and connectivity aspects get very difficult to deal with.

If you are unlucky enough to be up to your knees in the storage provisioning, connectivity and cost nightmare associated with even a moderate number of commodity servers in a SAN environment, you know what I'm talking about. In these types of environments, people are deploying and managing director-class Fibre Channel switches where each port can cost up to $5,000 and they are deploying more than one switch for redundancy's sake. That is, each commodity server needs a 2-port FC HBA and two paths to two different switches. Between the HBAs and the FC switch ports, the cost is as much as $10,000-$12,000 just to connect a "pizza box" to the SAN. That's the connectivity story and the provisioning story is not much prettier.

Once the cabling is done, the Storage Administrator has to zone the switches and provision storage (e.g., create LUNs, LUN masking, etc). For RAC, that would be a minimum of 3 masked LUNs for each database. Then the System Administrator has to make sure Oracle has access to those LUNs. That is a lot of management overhead. NAS, on the other hand, uses very inexpensive NICs and switches. Ah, now there is an interesting point. Using NAS means each server only has one type of network connectivity instead of two (e.g., FC and Ethernet). Storage provisioning is also simpler—the database server administrator simply mounts the NFS filesystem and the DBA can go straight to work with RAC or non-RAC Oracle databases. How simple. And yes, the Oracle licensing cost is a constant, so in this paradigm, the only way to recoup cost is on the storage connectivity side. The savings are worth consideration, and the simplicity is very difficult to argue against.

It’s time for another picture. The picture below depicts a small commodity server deployment—38 servers that need storage.

[Image (complexsan.jpg): a deployment of 38 commodity servers connected to storage through redundant director-class Fibre Channel switches]

Let’s consider the total connectivity problem starting with the constant—Ethernet. Yes, every one of these 38 servers needs both Ethernet and Fibre Channel connectivity. For simplicity, let’s say only 8 of these servers are using RAC. The 8 that host RAC will need a minimum of 4 Gigabit Ethernet NICs/cables—2 for the public interfaces and two for a bonded, private network for Oracle Cache Fusion (GCS, GES) for a total of 32. The remaining 30 could conceivably do fine with 2 public networks each for a subtotal of 60. All told, we have 92 Ethernet paths to deal with before we look at storage networking.

On the storage side, we'll need redundant paths for all 38 servers to multiple switches, so we start with 38 dual-port HBAs and 76 front-side Fibre Channel switch ports. Each switch will need a minimum of 2 paths back to storage, but honestly, would anyone try to feed 38 modern commodity servers with two 4Gb paths' worth of storage bandwidth? Likely not. On the other hand, it is unlikely the 30 smaller servers will each need dedicated 4Gb I/O bandwidth to storage, so we'll play zone trickery on the switch and group sets of 2 from the 30, yielding a requirement for 15 back-side I/O paths from each switch for a subtotal of 30 back-side paths. Following suit, the remaining 8 RAC servers will require 4 back-side paths from each of the two switches for a subtotal of 8 back-side paths. To sum it up, we have 76 front-side and 38 back-side paths for a total of 114 storage paths. Yes, I know this can be a lot simpler by limiting the number of switch-to-storage paths. That's a game called Which Servers Should We Starve for I/O and it isn't fun to play. These arrangements are never attempted with small switches. That's why the picture depicts large, expensive director-class switches.
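
A quick back-of-the-envelope check of that tally, using nothing more than shell arithmetic:

echo $(( 8*4 + 30*2 ))   # Ethernet paths: 32 for the RAC nodes + 60 for the rest = 92
echo $(( 38*2 ))         # front-side FC paths: 76
echo $(( 15*2 + 4*2 ))   # back-side FC paths: 30 + 8 = 38
echo $(( 76 + 38 ))      # total storage paths: 114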

Here's our mess. We have 92 Ethernet paths and 114 storage paths. How would NAS make this simpler? Well, Ethernet is the constant here so we simply add more inexpensive Ethernet infrastructure. We still need redundant switches and I/O paths, but Ethernet is cheap and simple and we are down to a single network topology instead of two. Just add some simple NICs and simple Ethernet switches and go. And oh, by the way, the two-network-topology model (e.g., GbE + FCP) generally means two different "owners" since the SAN would generally be owned by the Storage Group and the Ethernet would be owned by the Networking Group. With NAS, all connectivity from the Ethernet switches forward can be owned by the Networking Group, freeing the Storage Group to focus on storage—as opposed to storage networking.

And, yes, Oracle11g has features that make the connectivity requirement on the Ethernet side simpler but 10g environments can benefit from this architecture too.

Not a Sales Pitch
Thus far, this blog entry has been the what. This would make a pretty hollow blog entry if I didn’t at least mention the how. The odds are very slim that your datacenter would be able to do a 100% NAS storage deployment. So Network Appliance handles this by offering multiple protocol storage from their Filers. The devil shall not remain with the details.

Total NAS? Nope. Multi-Protocol Storage.
I'll be brief. You are going to need both FCP and NAS, I know that. If you have SQL Server (ugh) you certainly aren't going to connect those servers to NAS. There are other reasons FCP isn't going to go away soon enough. I accept the fact that both protocols are required in real life. So let's take a look at multi-protocol storage and how it fits into this thread.

Network Appliance Multi-Protocol Support
Network Appliance is an NFS device. If you want to use it for FCP or iSCSI SAN, large files in the Filer’s filesystem (WAFL) are served with either FCP or iSCSI protocol and connectivity. Fine. It works. I don’t like it that much, but it works. In this paradigm, you’d choose to run the simplest connectivity type you deem fit. You could run some FCP to a few huge Legacy SMPs, FCP to some servers running SQL Server (ugh), and most importantly Ethernet for NFS to whatever you choose—including Oracle on commodity servers. Multi-protocol storage in this fashion means total vendor lock-in, but it would allow you to choose between the protocols and it works.

SAN Gateway Multi-Protocol Support
Don’t get rid of your SAN until there is something reasonable to replace it with. How does that statement fit this thread? Well, as I point out in this paper, SAN-NAS gateway devices are worth consideration. Products in this space are the HP Enterprise File Services Clustered Gateway and EMC Celerra. With these devices you leverage your existing SAN by connecting the “NAS Heads” to the SAN using very low-end, simple Fibre Channel SAN connectivity (e.g., small switches, few cables). From there, you can provision NFS mounts to untold numbers of NFS clients—a few, dozens or hundreds. The mental picture here should be a very small amount of the complex, expensive connectivity (Fibre Channel) and a very large amount of the inexpensive, simple connectivity (Ethernet). What a pleasant mental picture that is. So what’s the multi-protocol angle? Well, since there is a down-wind SAN behind the NAS gateway, you can still directly cable your remaining Legacy Unix boxes with FCP. You get native FCP storage (unlike NetApp with the blocks-from-file approach) for the systems that need it and NAS for the ones that don’t.

I'm an Oracle DBA, What's in It for Me?
Excellent question and the answer is simply simplicity! I'm not just talking simplicity, I'm talking simple, simple, simple. I'm not just talking about simplicity in the database tier either. As I've pointed out umpteen times, NFS will support you from top to bottom—not just the database tier, but all your unstructured data such as software installations as well. Steve Chan chimes in on the simplicity of shared software installs in the E-Biz world too. After the NFS filesystem is mounted, you can host everything there: ORACLE_HOME, APPL_TOP, clusterware files (e.g., the OCR and CSS disks), databases, RMAN, imp/exp, SQL*Loader/External Tables, ETL, compiled PL/SQL, UTL_FILE, BFILE, trace/logging, scripts, and on and on. Without NFS, what sort of mix-and-match of raw devices, filesystems and raw+ASM would be required? A complex one—and the really ironic part is you'd probably still end up with some NFS mounts in addition to all that raw disk and non-CFS filesystem space as well!

Whew. That was a long blog entry.

Real Application Clusters on x86_64 RHEL? Don’t Forget Those 32bit Libraries!

The required list of RPMs for installing Oracle10g Release 2 on RHEL4 x86_64 is listed on the web in many places. What I don’t find, however, is ample coverage of the errors one sees if things are not completely in order. I think this blog entry might help out someone that makes it well into the Oracle clusterware install and hits this problem. I think there is enough of this error stack to anchor future googlers.

As one of my co-workers, Sergei, discovered yesterday, if you don't have glibc-devel-2.3.4-2.25.i386.rpm (yes, the 32bit glibc-devel) installed, you will see the following error stack when you run root.sh in $ORA_CRS_HOME:

CSS is active on these nodes.
qar14s21
qar14s22
qar14s23
qar14s24

CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
PRKH-1010 : Unable to communicate with CRS services.
[PRKH-1000 : Unable to load the SRVM HAS shared library
[PRKN-1008 : Unable to load the shared library "srvmhas10" or a dependent library, from LD_LIBRARY_PATH="/u01/app/oracle/product/10.2.0/crs_1/jdk/jre/lib/i386/client:\
/u01/app/oracle/product/10.2.0/crs_1/jdk/jre/lib/i386:\
/u01/app/oracle/product/10.2.0/crs_1/jdk/jre/../lib/i386:\
/u01/app/oracle/product/10.2.0/crs_1/lib32:\
/u01/app/oracle/product/10.2.0/crs_1/srvm/lib32:\
/u01/app/oracle/product/10.2.0/crs_1/lib:\
/u01/app/oracle/product/10.2.0/crs_1/srvm/lib:\
/u01/app/oracle/product/10.2.0/crs_1/lib"
[java.lang.UnsatisfiedLinkError: /u01/app/oracle/product/10.2.0/crs_1/lib32/libsrvmhas10.so: libclntsh.so.10.1: cannot open shared object file: No such file or directory]]]

Not the most intuitive error stack given the origin of the problem, especially since it happens so late in the game. I hope this helps somebody one day. That is a part of what blogging is supposed to be about, I think.
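
A quick sanity check before kicking off root.sh could be something along these lines (a sketch; the two lines after the command are the output you would hope to see):

# rpm -q --queryformat '%{NAME}.%{ARCH}\n' glibc-devel
glibc-devel.x86_64
glibc-devel.i386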

Thanks to Sergei as well.

