Archive for the 'oracle trace' Category

Direct I/O Can Crash Dataguard. A Tricky ORA-01031.


BLOG CORRECTION: When explaining the use of the ORASYM environment variable to a reader, I noticed I typed it and assigned it to the wrong value. A double-decker bug! This entry corrects that bug. Please read the comments for context.

If improperly configured as per your platform, that is.

A Tricky ORA-01031 Error Case
I noticed something very odd today in the Linux x86_64 port of Oracle10gR2. If you set filesystemio_options=directIO and start up the instance (without connecting through the listener), subsequent connects as SYSDBA through the listener will open the orapw file with O_DIRECT. Huh? Yes, a normal, non-shared file opened O_DIRECT. I don’t know when that started, or what other platforms do this, but if the orapw file is located on a filesystem that does not support open(2) with O_DIRECT, you get an ORA-01031. This was no big deal really, other than the mystery, since I store my database on PolyServe with the direct I/O mount option. That is, the filesystemio_options=directIO is redundant for the database files and—as it turns out—creates a problem if the orapw file is on PSFS with regular mount options. I could simply omit the init.ora setting. But I’d like to cover this anyway.


It seems this was hit before by someone back in Oracle9iR2 days as per Metalink 3312751. So, as I was saying, I have a couple of simple solutions for my particular case where I hit this, but I thought there might be someone out there that would enjoy seeing the way I figured out it was the orapw file that was causing me trouble. But first, what did this do to Dataguard in my environment?


Mixing Dataguard, O_DIRECT and the orapw File
It may be that nobody on earth will ever hit this problem. I don’t know. But when it happened to me I had shut down my Dataguard primary, set filesystemio_options= directIO (for some test or another) and then went off and did other things. You know the memory is the first thing to go. “All of the sudden”, my Dataguard setup was not working and I could not remember what had changed (test gear). After starting my Dataguard primary back up (with filesystemio_options=directIO), I saw the following little treat in my alert log:

Thu Feb 22 10:12:46 2007
Errors in file /u01/app/oracle/admin/PROD/bdump/prod_arc0_6977.trc:
ORA-01017: invalid username/password; logon denied
ORA-27041: unable to open file
Linux-x86_64 Error: 22: Invalid argument
Additional information: 2
Thu Feb 22 10:12:46 2007
Error 1017 received logging on to the standby
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191
It may be necessary to define the DB_ALLOWED_LOGON_VERSION
initialization parameter to the value “10”. Check the
manual for information on this initialization parameter.

OK, forget for a moment that Metalink note 259142.1 states that DB_ALLOWED_LOGON_VERSION is replaced by a setting in the sqlnet.ora file yet this Oracle10gR2 Dataguard primary is spitting out mention of that as possible remedy for the problem. I want to go over how I figured out what the problem actually was.


Oracle Binary Wrapper
Since the problem at hand was the inability to connect as SYSDBA through the listener, I would not be able to simply run sqlplus under strace (or truss). The listener is the parent of the dedicated server process in that situation. So, I had to implement a wrapper around the Oracle binary. To do this you simply mv $ORACLE_HOME/bin/oracle to something like $ORACLE_HOME/bin/oracle.bin and then create a script like this:


-bash-3.00$ cat $ORACLE_HOME/bin/oracle
export ORASYM=/u01/app/oracle/product/10.2.0/db_1/bin/oracle.bin
if [ -f /tmp/trace ]
strace -f -o /tmp/trace.$$ /u01/app/oracle/product/10.2.0/db_1/bin/oracle.bin


exec /u01/app/oracle/product/10.2.0/db_1/bin/oracle.bin $*


I didn’t create the /tmp/trace file until after I started the instance. This is how you can have background processes executing without single stepping (e.g., strace, truss) all Oracle processes. Instead you target a specific connection to the database by creating the /tmp/trace file immediately before connecting and then removing the file so that other connections are normal. I then ran sqlplus and generated the ORA-01031 error:


-bash-3.00$ sqlplus /nolog
SQL*Plus: Release – Production on Thu Feb 22 12:44:28 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn sys@PROD/test as sysdba
ORA-01031: insufficient privileges
SQL> exit


The shadow process PID was 20209 so I grep(1)ed for orapw from /tmp/trace.20209 and found the culprit:

NOTE: Right click-> view the image



As the strace output revealed, the shadow process created through the listener uses O_DIRECT when opening the orapw file. Trivial pursuit!

The solution is very platform specific as I was saying, but it is not inconceivable to hit this on other platforms. In my case I simply don’t run with filesystemio_options= directIO because I store all my database files in PolyServe(HP) PSFS with the direct I/O mount option anyway. Works just fine, but I thought I’d elaborate on the point by relocating the orapw file to a mount that supports O_DIRECT and then use a symlink to $ORACLE_HOME/dbs followed by a successful connection as follows:


-bash-3.00$ cp /u01/app/oracle/product/10.2.0/db_1/dbs/orapwPROD \ /u01/app/oracle/oradata/PROD/orapwPROD
-bash-3.00$ rm /u01/app/oracle/product/10.2.0/db_1/dbs/orapwPROD
-bash-3.00$ ln -s /u01/app/oracle/oradata/PROD/orapwPROD \ /u01/app/oracle/product/10.2.0/db_1/dbs/orapwPROD
-bash-3.00$ df /u01/app/oracle/oradata/PROD
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/psd/psd2p2 1225017312 3617672 1221399640 1% /u02
-bash-3.00$ mount | grep u02
/dev/psd/psd2p2 on /u02 type psfs (rw,dboptimize,shared,data=ordered)
-bash-3.00$ sqlplus /nolog

SQL*Plus: Release – Production on Thu Feb 22 12:52:01 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn sys@PROD/test as sysdba
SQL> show parameter filesystem

———————————— ———– ——————————
filesystemio_options string directIO


Forward Look
I’ll be making a blog entry quite soon about how O_DIRECT can save memory for non-database activity like archived redo log compression.




I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 744 other subscribers
Oracle ACE Program Status

Click It

website metrics

Fond Memories


All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: