It’s OK To Blame the “Storage.” Oracle Problems When “Storage” is “Just Fine”

“It’s not the storage.” Now that is a famous quote; I hear it all the time. Some bizarre Oracle problem crops up, the DBA team does all the analysis it possibly can, then starts talking to the System Administration folks and, eventually, the Storage Administration folks. Let me tell you a tale…

I had a fully functional 4 node Oracle10gR2 RAC cluster running on RHEL4 x86_64. Everything was just fine, until Wednesday this week. The 30+ hours that ensued were quite interesting—because of how many different teams of people were involved. I’d like to share the strangest set of Oracle failures that turned out to be “storage related.” Why the quotes?

Storage Starts When Oracle Makes an I/O Call
That is my opinion, and it always has been. As far as the Oracle server is concerned, everything “downwind” of a physical I/O (e.g., db file sequential read, direct path write, log file parallel write, db file scattered read, CF read, etc.) is storage related, generically speaking. Some of that is in the I/O library (user mode), some in the kernel I/O stack, and more in the I/O adapter firmware, the storage array firmware, and all the hardware along the way, of course. So why did my database spend about an entire day down when there were “no storage errors”? Ever hear that from the Storage Group in your IT shop? Sure you have. The problems my database had were not because blocks of data were failing to hit their rightful spot on some little round, brown spinning thing somewhere. No, this “storage problem” was upwind of that point.

Diagnosis
Yes, the best way to get to the bottom of a problem is to ask three questions:

  1. Has this ever worked?
  2. When did it stop working?
  3. What changed?

If nothing had changed in my situation, that database would most likely have continued to run happily. But we were changing a lot of things, I admit. We were testing the I/O stack, comparing RHEL4 kernel versions 2.6.9-34 and 2.6.9-42. The following problems are not typical of “I/O related issues”:

Wed Mar 28 19:08:00 2007
Errors in file /u01/app/oracle/admin/PROD/udump/prod1_ora_7140.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [3708], [910], [], [], [], [], [], []
Wed Mar 28 19:08:05 2007
Trace dumping is performing id=[cdmp_20070328190805]
Wed Mar 28 19:08:38 2007
Errors in file /u01/app/oracle/admin/PROD/bdump/prod1_ckpt_7018.trc:
ORA-00600: internal error code, arguments: [2103], [0], [4], [1], [900], [], [], []

Well, OK, a CF enqueue timeout could be I/O related, and ORA-00600 [3708] is not that uncommon, but [2116] is quite rare. The point is that I had no I/O errors anywhere. The problem persisted:

Thu Mar 29 10:36:20 2007
GES: Potential blocker (pid=6870) on resource DM-0x0-0x1;
enqueue info in file /u01/app/oracle/admin/PROD/bdump/prod1_lmd0_6735.trc and DIAG trace file
Thu Mar 29 10:42:06 2007
Errors in file /u01/app/oracle/admin/PROD/udump/prod1_ora_6870.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []

$ more /u01/app/oracle/admin/PROD/udump/prod1_ora_6870.trc
*** 2007-03-29 10:13:11.558
Dumping diagnostic information for CKPT:
OS pid = 6776
loadavg : 1.34 0.97 0.54
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 D oracle 6776 1 0 75 0 - 129285 wait_f 09:51 ? 00:00:00 ora_ckpt_PROD1
(no debugging symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Attaching to program: /proc/6776/exe, process 6776
skgpgpstack: fgets() timed out after 60 seconds
*** 2007-03-29 10:14:45.135
*** 2007-03-29 10:14:55.139
Waited for detached process: CKPT for 310 seconds:

This cluster was quite ill, as should be apparent. But is it apparent that “storage” was the problem?

What is “Storage” Again?
The moral of this story is to remember that I/O starts when Oracle makes a physical I/O call. This problem turned out to be an accidental mix of kernel modules: 2.6.9-34 modules were being used by the 2.6.9-42 kernel. Most things worked just fine, but I sure could not boot my Oracle database.
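
A mix like that is easy to check for once you know to look. Here is a minimal sketch, assuming a Linux system with modinfo in the path; it is illustrative, not the script used during this incident. It flags any loaded module whose “vermagic” (the kernel version the module was built for) differs from the running kernel:

```shell
#!/bin/sh
# Flag loaded kernel modules that were built for a different kernel
# than the one currently running.  /proc/modules lists loaded modules;
# modinfo's "vermagic:" line records the kernel a module was built for.
running=$(uname -r)
echo "running kernel: $running"
while read -r mod _; do
    vermagic=$(modinfo "$mod" 2>/dev/null | awk '/^vermagic:/ {print $2}')
    if [ -n "$vermagic" ] && [ "$vermagic" != "$running" ]; then
        echo "MISMATCH: $mod was built for $vermagic"
    fi
done < /proc/modules
echo "module check complete"
```

A clean system prints nothing between the two status lines; a check along these lines might well have surfaced the stray .34 modules much sooner.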

Once the right kernel modules were sorted out, the instances booted just fine and the database started chugging along:

SQL> @io.sql

     READS     READMB     WRITES    WRITEMB
---------- ---------- ---------- ----------
      6648 41.6289063      97259   403.9375

SQL> list
  1  select sum(PHYRDS) reads, sum(PHYBLKRD * 4)/1024 readMB,
  2         sum(PHYWRTS) writes, sum(PHYBLKWRT * 4)/1024 writeMB
  3  from dba_data_files, gv$filestat
  4* where dba_data_files.file_id = gv$filestat.file#
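
Counters like those are worth cross-checking against the operating system’s own accounting, since “storage” in the sense argued above includes everything below Oracle’s I/O call. A minimal sketch, assuming a Linux /proc/diskstats in the standard 2.6 layout (field 3 is the device name, field 4 is reads completed, field 8 is writes completed); the formatting is illustrative:

```shell
#!/bin/sh
# Dump per-device completed read/write counts from the kernel's block
# layer, for comparison with Oracle's gv$filestat numbers.
# /proc/diskstats: major minor name reads-completed ... writes-completed ...
awk '{printf "%-12s reads=%-12s writes=%s\n", $3, $4, $8}' /proc/diskstats
echo "diskstats dump complete"
```

If Oracle reports I/O waits while these counters barely move, the time is being lost somewhere between the I/O call and the device, which is exactly the upwind territory this post is about.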

The Moral
This blog entry won’t be life changing for anyone, except perhaps someone who is googling for ORA-00600 errors that start popping up “out of nowhere” on a database that has otherwise been quite healthy. Or perhaps someone who suspects they have a “storage problem,” but the blocks of data seem to be getting to disk just fine as far as the Storage Group can tell.

I hope my viewpoint about where “Storage” begins resonates with readers though.

13 Responses to “It’s OK To Blame the “Storage.” Oracle Problems When “Storage” is “Just Fine””


  1. 1 Nathan Dietsch March 31, 2007 at 7:20 am

    Hi Kevin,

    This may be pedantic, but as a system/storage administrator, I would have classified your problem as a configuration management issue (mixing kernel modules of different versions) or an operating system issue rather than a storage issue. I understand your point regarding “storage” being everything downwind of an Oracle IO, but if Oracle was not able to start properly, then I personally wouldn’t classify it as a storage issue.

    As I said before, I get the point about storage being everything downwind of an Oracle IO. Part of the problem with making the storage teams and the system teams separate is that they split the responsibility for what is essentially part of the same code path. As you say, IO starts when Oracle issues its read or write, although I would take it down to issuing of the actual reads or writes at the system call level. Everything below that should be considered part of a stack (filesystem/raw devices, volume manager, multi-pathing software, switches, LUNs, spindles) and should be the responsibility of one group. That way, when a DBA says IO is slow (with some form of empirical evidence), one group can take responsibility.

    Just my $0.02

    Kind Regards,

    Nathan Dietsch

  2. 2 naqi mirza March 31, 2007 at 3:29 pm

    Although not directly storage related (it was cluster filesystem configuration related), I remember when I was involved in converting a single-node 9.2.0.7 database to a 2-node RAC on Sun Cluster using QFS. The conversion itself was smooth: using a standby database we achieved it, including the migration of 1.4TB, in under an hour of downtime (this was a 24×7 shop), but I digress… The system was brought up and then made available to users. Slowly the system began to crawl, at times becoming unresponsive. The I/O, memory, etc. were checked by the sysadmins and initially cleared as not being the problem: the system had always exhibited high I/O load, so even if it was doing that now, that should not be the problem. It was then looked at as an Oracle problem, the conversion to RAC so to speak. Yes, the finger pointing began, as this became a serious issue: money was being lost.
    For me the process of elimination began by comparing statspack reports from before the RAC conversion with reports from after. This was indeed eye popping. I noticed, for example, a few events like:

    controlfile sequential read
    controlfile parallel write

    that were showing wait time increases of about 100x, in some cases 1000x.

    The storage was looked at again, and apparently Sun consultants came across a particular file in the QFS configuration that had been improperly named. Once this was rectified (not sure if anything else was done) the I/O load dropped and the wait event times decreased, in some cases to better than before the conversion.

    Oh well, at least I could finally sleep.

  3. 3 kevinclosson March 31, 2007 at 3:37 pm

    Nathan,

    I agree with the common organizational structure you discuss, and your other points as well. Yes, this was an OS configuration boo-boo. I was, as you perceive, trying to share the idea that I/O starts as soon as the Oracle OSDs call an I/O routine. The point was that there were no I/O failures at the storage end and Oracle was not reporting I/O failures, yet the problem was in the “I/O stack.”

  4. 4 Fahd Mirza April 2, 2007 at 5:32 am

    As a DBA who is about to migrate his database to a 2-node RAC, this post gives me the creeps.

  5. 5 Jeff April 2, 2007 at 5:59 pm

    Storage starts where Oracle ends. If I can reproduce the problem using OS commands and/or a “non oracle” program, it’s an OS or Storage problem.

    Yet, you have to prove it every single time. About two years ago I had an issue with a GBIC going bad on one of my boxes. A particular path took 10 times longer to write than its neighbors. I had to essentially go down to the OS level to prove Oracle wasn’t the problem and then get the storage guys to look at it.

  6. 6 zelihao May 17, 2007 at 11:17 am

    Hello,
    We have Sun Solaris 10, Oracle 9.2.0.8, and Sun Cluster 3.2. After we switched from the old server to the new one we have had problems with high I/O. Our system and Oracle admins could not find any errors at the OS or Oracle level, and neither could the storage vendor. The iostat output shows high disk activity, especially writes to the disk where the redo logs are stored. When we got the truss output I found the following errors, but we were getting the same ones on the old server. Can anyone advise whether these errors are related to the problem? Our application vendor claims there might be a bug in Solaris 10. What do you think?

    /1: kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) Err#22 EINVAL
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN
    /1: kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) Err#22 EINVAL
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN
    /1: kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) Err#22 EINVAL
    /1: semtimedop(7, 0xFFFFFFFF7FFFDF9C, 1, 0xFFFFFFFF7FFFDF88) Err#11 EAGAIN

    Thanks,
    Zeliha

    • 7 Pav May 9, 2010 at 2:39 pm

      Hi Zeliha

      I came across your comment while searching for exactly the same errors you encountered. I have not yet found a viable solution, but I am experiencing the same issues when trying to do an Oracle import. I noticed this post is dated 2007, so I would be grateful if you found a solution and could elaborate on what caused it. I am running Solaris 10 10/09 64-bit on a Sun SPARC M5000, with NFS mounts to the NetApp storage where the DB resides.

      Thanks
      Pav

  7. 8 kevinclosson May 17, 2007 at 3:33 pm

    Zeliha,

    Well, this isn’t intended to be a support site, but I’ll point out quickly that having KAIO failing EINVAL on both the old and the new system is a real problem yet you said your admins can’t find any errors. Looks like you are using a filesystem that doesn’t support kernel async I/O, but I can’t tell you any more than that given the information at hand.

  8. 9 Craig July 11, 2008 at 7:16 pm

    I think a real trick for clustered storage is to understand the I/O to multipathed devices and graph it over time. The Ganglia project is very good for this; search Google for Ganglia.

    I am trying to gather and graph I/O stats for my multipath aliases (to also send to Ganglia).

    I am using Oracle’s latest Linux, which is really:
    [root@wpe01 Ganglia_Distro]# cat /etc/redhat-release
    Enterprise Linux Enterprise Linux Server release 5.1 (Carthage)

    And the linux native multipathd for multipathing. It is working well.

    asm01 (1HITACHI_D60090910032) dm-6 HITACHI,DF600F
    [size=32G][features=0][hwhandler=0]
    \_ round-robin 0 [prio=1][active]
    \_ 0:0:1:32 sdm 8:192 [active][ready]
    \_ round-robin 0 [prio=0][enabled]
    \_ 0:0:0:32 sdb 8:16 [active][ready]

    I am trying to figure out HOW I can get iostat -x information for the aliases. An example is asm01 above. The dm device that it points to can change (and does) between reboots.

    I tried mapping a raw device to it, called asm01, but that doesn’t seem to be something I can run iostat against either.

    Has anyone figured out a way to do this? Really, I/O stats are needed per alias.

    Thanks!
    Craig

  9. 11 kevinclosson July 11, 2008 at 8:46 pm

    sounds like SAN-madness to me

  10. 12 Storage Guy September 29, 2010 at 8:45 pm

    I came across this while looking for something else. Typical DBA speak: “not me” is all they say. I’m not saying it is Oracle, but guess what, the world outside of Oracle is not just one big undifferentiated non-Oracle world. Even in 2007 most enterprises had separate storage and sysadmin teams, and even little shops often have different groups for network support and things like development. Speaking of which, the majority of the performance problems I see end up being most easily solved with code/SQL fixes.

    As people move toward Linux and consolidation of applications, they’re finding the limitations of Linux. I’m looking at an 8-node 10g RAC cluster with around 400 devices (4 paths each) and over 30TB of disk, and it seems the Linux I/O subsystem is having a hard time pushing the storage.

    Anyway, I assume anyone who reads this understands it’s not just Oracle or non-Oracle; there are many areas where a bottleneck can occur. If you’re in a typical environment with at least separate DBA, sysadmin, and storage administration teams, please don’t just go around yelling “not me.” It doesn’t really help solve anything.

  11. 13 maclean November 5, 2010 at 7:15 am

    Just as ORA-07445 always gets blamed on the operating system!



Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
