BLOG UPDATE: This post has developed an interesting comment thread worth noting.
I currently have a nearly chaotic set of differing configurations to deal with, running the gamut from x86_64 servers attached to 2/4 Gb FCP SANs to others attached to NAS via GbE. So sometimes I miss the mark. I just tried to fire up one of my databases on a DL585 running RHEL4 attached to the Enterprise File Services Clustered Gateway NAS device. In the midst of the chaos I mistakenly mounted the filesystem containing the Oracle Database 10g test database using the wrong mount options, so:
$ tail alert*log
ALTER DATABASE MOUNT
Mon Jun 18 15:04:28 2007
WARNING:NFS file system /mnt mounted with incorrect options
WARNING:Expected NFS mount options: rsize>=32768,wsize>=32768,hard
Mon Jun 18 15:04:28 2007
ORA-00202: control file: '/u01/app/oracle/product/10.2.0/db_1/rw/DATA/cntlbench_1'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 3
Mon Jun 18 15:04:28 2007
I clearly did not mount the filesystems correctly. After remounting with the following options, everything was OK:
rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
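To make the alert log warning concrete, here is a minimal Python sketch that checks a comma-separated option string against the three requirements the warning spells out (rsize>=32768, wsize>=32768, hard). The function name check_nfs_options and its min_io parameter are made up for illustration. This is not how any Oracle port actually does its check, and it only covers what this particular warning lists.

```python
def check_nfs_options(options, min_io=32768):
    """Return the requirements from the alert log warning that `options` fails.

    `options` is a comma-separated NFS mount option string. The function name
    and the min_io parameter are hypothetical, chosen for this illustration.
    """
    opts = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        # Options are either bare flags ("hard") or key=value ("rsize=32768").
        key, _, value = item.partition("=")
        opts[key] = value

    problems = []
    for key in ("rsize", "wsize"):
        value = opts.get(key)
        if value is None or not value.isdigit() or int(value) < min_io:
            problems.append(f"{key}>={min_io}")
    if "hard" not in opts:
        problems.append("hard")
    return problems


good = "rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0"
print(check_nfs_options(good))
print(check_nfs_options("rw,soft,rsize=8192,wsize=32768"))
```

The first call uses the options that worked for me above and reports nothing missing. The second is a made-up bad mount that fails the rsize and hard requirements.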
But then these mount options are port-specific and, as they say in true Clintonian form, “It’s the Port, Stupid.”
It’s All About the Port
The only complaint I have about Oracle over NFS is at the port level. I intend to start blogging about the idiosyncrasies between, say, certain Legacy Unix and Linux ports of Oracle with regard to NAS mount options. I think RMAN has the most issues and, again, these are always port level. For instance, certain ports inspect the mount options of the actual mounted filesystem and others look at the mnttab. And then, in some cases, certain ports do it one way for the instance and another way for functionality such as RMAN. Sometimes when the database or tools don’t like the mount options they return an error message spelling out what is missing, and other times just a generic complaint that the mount options are incorrect. That, too, varies by port and version of Oracle. Recently I found that the HP-UX port of Oracle10g needs the llock mount option, which is apparently not documented very well.
In all cases, issues regarding mount options are the responsibility of the Oracle port team for the release. That is where this functionality is built. The layers above the I/O layer (Operating System Dependent code) have no idea whether there is DAS, SAN or NAS down stream. That abstraction is one of the main reasons Oracle is the best database out there. That porting heritage goes back to Oracle version 4. Anyway, I digress…
Complicated.
Yes, these mount option topics are more complicated than they should be, but this situation is not permanent. As we get closer to July 11, I’ll be blogging more about what that means. Regardless, I stand fast in my view that provisioning storage for Oracle via NFS is simpler, simpler, simpler than SANs and that goes for both RAC and non-RAC databases. Just mount the filesystem and go…
In the meantime, if you have a particular port of Oracle10g that isn’t getting along with your NAS, remember our motto, “It’s the port[…]” so log an SR and Oracle will get you on your way.
Is it as much the Oracle port or the different handling of NAS by the various *n*x flavours?
I get the feeling the Oracle porting folks would have tremendous difficulty in accommodating some form of “middle ground” that could cope with all the current methods of handling/mounting NFS/NAS.
This storage technology is somewhat “green” in terms of everyone singing to the same tune like we have, for example, with SANs. I think in another year or so things will stabilize a lot more and we’ll start seeing one method as the “standard”?
One unrelated question:
Dunno if you bounced into it before with SANs but they have a thing called “Mirrorview” for async and synch block change shipping. Have you ever seen or do you know of anything similar in the NAS/NFS sphere?
Noons,
As always you make a great point. It astounds me that even though NFS is Sun’s stuff and is a standard, every NFS client out there has a potpourri of mount options. But, alas, this is why Oracle has porting teams. Those teams are staffed by experts for the particular platform being ported to. For instance, Sequent put 30 full time engineers on the base and apps. These were Sequent specialists who also specialized in Oracle at the port level. So, if there is a port of Oracle that flubs up the handling of NAS, the cross hairs are squarely on the heads of the porting folks and as I was saying, those are not all Oracle folks.
I think the main thing I’m trying to point out is that if some particular port of Oracle is having trouble with NAS, it isn’t right to say “Oracle doesn’t work with NAS.” Now, it would likely sound like hair-splitting to a production shop having a problem with, say, the AIX 5L port of 10gR2 on BlueArc NAS if I were to make the distinction that the problem they are having is likely localized to a little itsy-bitsy module called skgfifi() and not Oracle in general. However, what I detect in the community at large is two things:
1. Ignorance about the performance of Oracle over NFS
2. A baby-out-with-the-bathwater treatment should a problem be hit
One such problem I’m thinking of is anyone who tried RH 2.1 NFS with Oracle9i to NetApp. Uh, not only would I throw the baby out with the bathwater, I’d also find the baby’s mom and put her in the raft too because that recipe was one for disaster. Oracle10gR2 on NAS is another picture entirely. Yes, with the exception of some port peculiarities.
It seems like in 10g times, Oracle has been more committed to getting ASM ported to every platform, and with higher quality, than to NFS support.
Let’s hope that version 11 times will make NFS supported equally well on all platforms.
Alex,
That will happen and, uh, that isn’t actually shooting for the stars either …
My question pertains to setting up Oracle configurations on high availability NAS configurations. Assuming a basic 2 node NAS cluster, when a fail-over of the NAS occurs, what is the time out window of Oracle on NFS and CIFS to make the transition? Can you set this time out variable and how? I saw in some of the settings a to=300; does this imply a 5 minute allowance?
Thanks so much for your help,
BlueWho
BlueWho,
BlueArc? Gosh, welcome to my blog. CRS can be tuned to accommodate long outages of storage. This is not a big deal. Now, having said that, the NAS I deal with mostly (although I do NetApp on occasion) is the HP EFS Clustered Gateway, which transparently fails over NFS exports in less than 30 seconds (generally about 20).
Oracle Metalink Note 294430.1 is the bible for CRS tolerance for I/O timeouts. Since BlueArc is a former OSCP member, I’m surprised your own docs don’t cover this. They do, don’t they?
Kevin,
Thank you for your response. The OSCP program was ended by Oracle in January 2007. Essentially NAS is considered a mature technology and no longer requires the Oracle stamp of approval for interop with NFS. We have a similar window on fail-over timing, but what I was seeking is what’s best for Oracle. I’m in the process of updating our docs and I’d like to detail how customers can adjust and accommodate various time out periods, or at least set them appropriately and know the defaults. We don’t have Metalink access, which appears to be a customer-based portal. Is there any way you can get me a copy of the Note you referenced?
Thanks, BlueWho
BlueWho,
Yes, I know about OSCP. I would have thought BlueArc would have partner status with Oracle and therefore access to Metalink and docs. Not to be rude, but I can’t really cut and paste out of MetaLink. Sorry.