Analyzing Asynchronous I/O Support with Oracle10g

This is not a post about why someone would want to deploy Oracle with a mix of files with varying support for asynchronous I/O. It is just a peek at how Oracle10g handles it. This blog post is a continuation of yesterday’s topic about analyzing DBWR I/O activity with strace(1).

I’ve said many times before that one of the things Oracle does not get sufficient credit for is how well the database adapts to such a tremendous variety of platforms. Moreover, each platform can be complex. Historically, with Linux for instance, some file systems support asynchronous I/O and others do not. With JFS on AIX there are mount options to consider, as is also the case with Veritas on all platforms. These technologies offer deployment options. That is a good thing.

What happens when the initialization parameter filesystemio_options=asynch is set, yet the database has a mix of files that do and do not support asynchronous I/O? Does Oracle just crash? Does it offline files? Does it pollute the alert log with messages every time it tries an asynchronous I/O to a file that doesn’t support it? The answer is that it does none of that. It simply deals with it. It doesn’t throw the baby out with the bath water either. Much older versions of Oracle would probably have just reduced the whole instance to the lowest common denominator (synchronous I/O).

Not Just A Linux Topic
I think the information in this blog post is useful on all platforms. Sure, you can’t use strace(1) on a system that only offers truss(1), but you can do the same general analysis with either. The system calls will differ, too. Whereas on Linux Oracle has to use the Linux-only libaio routines io_submit(2) and io_getevents(2), all other ports[1] use POSIX asynchronous I/O (e.g., lio_listio, aio_write, etc.) or other proprietary asynchronous I/O library routines.

Oracle Takes Charge
As I was saying, if you have some mix of technology where some files in the database do not support asynchronous I/O, yet you’ve configured the instance to use it, Oracle simply deals with the issue. There are no warnings, though, so it is important to understand this behavior in case you run into it.

Mixing Synchronous with Asynchronous I/O
In the following screen shot, I was viewing strace(1) output of a shadow process performing a tablespace creation. The instance was configured to use asynchronous I/O, yet the CREATE TABLESPACE command I issued created a file in a file system that does not support asynchronous I/O[2]. Performing this test on a platform where I can mix libaio asynchronous I/O and libc synchronous I/O within the same instance makes it easy to show what Oracle is doing. At the first arrow in the screen shot, the OMF datafile is created with open(2) using the O_CREAT flag. The file descriptor returned is 13. The second arrow points to the first asynchronous I/O issued against the datafile. The io_submit(2) call failed with EINVAL, indicating to Oracle that the operation is invalid for this file descriptor.


[Screen shot d2_1: strace(1) of the shadow process during CREATE TABLESPACE]

Now, Oracle could have raised an error and failed the CREATE TABLESPACE statement. It did not. Instead, the shadow process simply proceeded to create the datafile with synchronous I/O. The following screen shot shows the same io_submit(2) call failing at the first arrow. Between that failure and the first synchronous write with pwrite(2), on the same file descriptor, nothing happened except the loading of some shared libraries (the mmap() calls). The file didn’t need to be reopened or anything like that. Oracle simply fires off a synchronous write.

[Screen shot dbw2_2: io_submit(2) failing with EINVAL, followed by pwrite(2) on the same file descriptor]
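To make the fallback concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not Oracle’s code: `HypotheticalAioContext` is an invented stand-in for a libaio context that rejects some descriptors with EINVAL the way io_submit(2) did here, the synchronous path uses Python’s `os.pwrite` (a thin wrapper over pwrite(2)), and remembering the failing descriptor in a set is an assumption about how the decision might be cached.

```python
import errno
import os
import tempfile


class HypotheticalAioContext:
    """Hypothetical stand-in for an io_setup(2)/io_submit(2) context.

    It performs no real I/O. Writes to descriptors in `unsupported_fds`
    fail with EINVAL, mimicking a file system that rejects io_submit(2);
    other writes are only recorded as submitted.
    """

    def __init__(self, unsupported_fds):
        self.unsupported_fds = set(unsupported_fds)
        self.submitted = []

    def submit_write(self, fd, data, offset):
        if fd in self.unsupported_fds:
            raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
        self.submitted.append((fd, offset, len(data)))


def write_block(ctx, sync_only_fds, fd, data, offset):
    """Try an async write; on EINVAL, fall back to pwrite(2) on the same fd.

    `sync_only_fds` records descriptors already found unsuitable, so the
    failing async attempt is made once per descriptor. Returns the path taken.
    """
    if fd not in sync_only_fds:
        try:
            ctx.submit_write(fd, data, offset)
            return "async"
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise  # other errors are real failures, not a capability signal
            sync_only_fds.add(fd)
    # Same descriptor, no reopen: just issue a synchronous positional write.
    os.pwrite(fd, data, offset)
    return "sync"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        ctx = HypotheticalAioContext(unsupported_fds={fd})
        sync_only = set()
        print(write_block(ctx, sync_only, fd, b"x" * 8192, 0))     # sync
        print(write_block(ctx, sync_only, fd, b"y" * 8192, 8192))  # sync
        print(os.fstat(fd).st_size)                                # 16384
```

Note that the descriptor is never reopened: the same `fd` rejected by the asynchronous path is handed straight to `os.pwrite`, which matches what the trace shows.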

What Does This Have To Do With DBWR?
Once the tablespace was created, I set out to create tables in it with CTAS statements. To see how DBWR behaved with this mix of asynchronous I/O support, I once again monitored DBWR with strace(1), sending the trace output to a file called mon.out. The following screen shot shows that the first attempts to flush SGA buffers to the file also failed with EINVAL. All was not lost, however: the screen shot also shows that DBWR continued just fine using synchronous writes to this particular file. Note that DBWR does not have to perform this “discovery” on every flushing operation. Once the file is deemed unsuitable for asynchronous I/O, all subsequent I/O to it will be synchronous. Oracle just continues to work, without alarming the DBA.

[Screen shot dbw2_3: DBWR strace(1) output in mon.out showing EINVAL, then synchronous writes]
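Rather than scrolling through mon.out by eye, you can summarize it per file descriptor. The following Python sketch is not part of the original analysis and rests on assumptions: it expects the output format of a recent strace(1), in which io_submit(2) iocbs are printed with `aio_fildes=N` and positional writes appear as `pwrite64(fd, ...)` (or `pwrite(fd, ...)`). Older strace releases print io_submit(2) arguments differently, so the patterns may need adjusting. The `SAMPLE` lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed output format of a recent strace(1); older releases print
# io_submit(2) iocbs differently, so these patterns may need adjusting.
IO_SUBMIT = re.compile(r"io_submit\(.*\)\s*=\s*(-?\d+)(?:\s+(E[A-Z]+))?")
AIO_FD = re.compile(r"aio_fildes=(\d+)")
PWRITE = re.compile(r"pwrite(?:64)?\((\d+),")


def summarize(lines):
    """Summarize io_submit(2) and pwrite(2) activity per file descriptor."""
    stats = defaultdict(lambda: {"aio_ok": 0, "einval": 0,
                                 "aio_after_einval": 0, "pwrites": 0})
    failed = set()  # descriptors that have seen EINVAL from io_submit(2)
    for line in lines:
        m = IO_SUBMIT.search(line)
        if m:
            result, err = int(m.group(1)), m.group(2)
            fds = [int(fd) for fd in AIO_FD.findall(line)]
            for fd in fds:
                if fd in failed:
                    stats[fd]["aio_after_einval"] += 1  # repeated discovery
                if err == "EINVAL":
                    stats[fd]["einval"] += 1
                    failed.add(fd)
            # io_submit(2) returns how many iocbs (from the front) were queued.
            for fd in fds[:max(result, 0)]:
                stats[fd]["aio_ok"] += 1
            continue
        m = PWRITE.search(line)
        if m:
            stats[int(m.group(1))]["pwrites"] += 1
    return dict(stats)


SAMPLE = [  # illustrative lines only, not taken from the trace in this post
    'io_submit(0x7f00, 1, [{aio_data=0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=24, '
    'aio_buf=0x1000, aio_nbytes=8192, aio_offset=0}]) = -1 EINVAL (Invalid argument)',
    'pwrite64(24, "\\6\\242"..., 8192, 0) = 8192',
    'io_submit(0x7f00, 2, [{aio_data=0, aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=19, '
    'aio_buf=0x2000, aio_nbytes=8192, aio_offset=8192}, {aio_data=0, '
    'aio_lio_opcode=IOCB_CMD_PWRITE, aio_fildes=20, aio_buf=0x4000, '
    'aio_nbytes=8192, aio_offset=16384}]) = 2',
    'pwrite64(24, "\\6\\242"..., 16384, 8192) = 16384',
]

if __name__ == "__main__":
    for fd, s in sorted(summarize(SAMPLE).items()):
        print(fd, s)
```

In a trace like the one described above, a non-asynchronous file should show EINVAL only around its first flushes, with `aio_after_einval` staying small while `pwrites` keeps growing.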

How Would a Single DBWR Process Handle This?

So the next question is: what does it mean to have a single database writer charged with flushing buffers from the SGA to a mix of files where not all files support asynchronous I/O? Nothing good. Now, as I said, Oracle could have just reverted the entire instance to 100% synchronous I/O, but that would not be in the best interest of performance. On the other hand, if Oracle is doing what I’m about to show you, it would be nice if it made one small alert log entry, but it doesn’t. That is why I’m blogging this (actually, it is also because I’m a fan of Oracle at the platform level).

In the following screen shot, I use egrep(1) to pull occurrences from the DBWR strace(1) output file where io_submit(2) and pwrite(2) are intermixed. Again, this is a single DBWR flushing buffers from the SGA to files of varying asynchronous I/O support:

[Screen shot dbw2_4: egrep(1) of intermixed io_submit(2) and pwrite(2) calls in the DBWR trace]

In this particular case, the very first io_submit(2) call flushed 4 buffers, 2 each to file descriptors 19 and 20. Before calling io_getevents(2) to process the completion of those asynchronous I/Os, DBWR proceeds to issue a series of synchronous writes to file descriptor 24 (another of the non-asynchronous I/O files in this database). By the way, notice that most of those writes to file descriptor 24 were multi-block DBWR writes. The problem with having one DBWR process intermix synchronous with asynchronous I/O is that any buffers in the write batch bound for a synchronous I/O file will delay the initiation of writes for any buffers bound for asynchronous I/O files. When DBWR walks an LRU to build a batch, it does not consider whether the OS supports asynchronous I/O on the file a particular buffer will be written to. It just builds a batch based on buffer state and age. In short, synchronous I/O requests delay the initiation of subsequent asynchronous requests.
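A toy timing model shows why the order within a batch matters. This is a deliberate simplification, not a description of DBWR internals: it assumes a single writer walks the batch in order, every synchronous write blocks it for the same made-up service time, and queuing an asynchronous write costs the writer nothing. The `Write` type and `submission_times` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Write:
    fd: int
    async_capable: bool  # does this file accept io_submit(2)?


def submission_times(batch: List[Write], sync_ms: float) -> List[Tuple[int, float]]:
    """Return (fd, time in ms) at which each async write in the batch is queued.

    Toy model: one writer handles the batch in order; a synchronous write
    blocks it for `sync_ms`; queuing an asynchronous write takes no time.
    """
    clock = 0.0
    queued = []
    for w in batch:
        if w.async_capable:
            queued.append((w.fd, clock))  # queued now, completion reaped later
        else:
            clock += sync_ms  # writer waits here until pwrite() returns
    return queued


if __name__ == "__main__":
    # Hypothetical batch in LRU order; fd 24 does not support async I/O.
    batch = [Write(19, True), Write(24, False), Write(24, False),
             Write(20, True), Write(24, False), Write(19, True)]
    print(submission_times(batch, sync_ms=5.0))
    # [(19, 0.0), (20, 10.0), (19, 15.0)]

    # For comparison only: the same writes with the synchronous ones last.
    reordered = sorted(batch, key=lambda w: not w.async_capable)
    print(submission_times(reordered, sync_ms=5.0))
    # [(19, 0.0), (20, 0.0), (19, 0.0)]
```

With the sample batch, the second and third asynchronous writes are queued 10 ms and 15 ms late purely because of the synchronous writes ahead of them. DBWR does not reorder batches this way, since it builds them from buffer state and age; the reordered run is only a point of comparison.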

OK, so this is a two-edged sword. Oracle handles this complexity nicely, and much credit is due. However, it is not entirely inconceivable that some of you out there have databases configured with a mix of asynchronous I/O support across your files; support varies considerably from platform to platform. Please be aware that this is not just a file system topic. It can also be a device driver issue. It is entirely possible to have a file system that generically supports asynchronous I/O created on a device whose driver does not. This scenario will also result in EINVAL on asynchronous I/O calls. Here too, Oracle is likely doing the right thing: dealing with it.

What To Do?
Just use raw partitions. No, of course not. We should be glad that Oracle deals with such complexity so well. If you configure multiple database writers (db_writer_processes, not I/O slaves) on a system that has a mix of asynchronous I/O support, you’ll likely never know the difference. But the topic is at least on your mind.

[1] Except Windows, of course.

[2] The cluster file system in PolyServe’s Database Utility for Oracle uses a mount option to enable both direct I/O and OS asynchronous I/O. However, when using PolyServe’s Oracle Disk Manager (ODM) library, Oracle can perform asynchronous I/O on all mount types. Mount options for direct I/O are quite common, as they are a requirement on UFS and OCFS2 as well.

6 Responses to “Analyzing Asynchronous I/O Support with Oracle10g”


  1. Fairlie Rego, December 6, 2006 at 11:06 pm

    Excellent post, Kevin.
    In screenshot 4 you mention “DBWR proceeds to issue a series of synchronous writes to file descriptor 13 (our non-asynchronous I/O file). By the way, notice that most of those writes to file descriptor 13 were multi-block DBWR writes”
    Shouldn’t the file descriptor be 24 instead? or have I not had enuff coffee yet?

    -Fairlie

  2. kevinclosson, December 6, 2006 at 11:22 pm

    Arrgh, good catch, Fairlie. Yes, it turns out that 24 is one of the other files on the non-async I/O mount. I need to change the text, but here is the deal:

    -bash-3.00$ pwd
    /proc/23126/fd
    -bash-3.00$ ps -Fp 23126
    UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
    oracle 23126 1 0 74388 86024 6 Dec05 ? 00:00:03 ora_dbw0_TEST
    -bash-3.00$ ls -l | egrep '18|19|23|24'
    lrwx------ 1 oracle dba 64 Dec 5 12:32 18 -> /u04/oradata/TEST/datafile/o1_mf_system_2q66o463_.dbf
    lrwx------ 1 oracle dba 64 Dec 5 12:32 19 -> /u04/oradata/TEST/datafile/o1_mf_undotbs1_2q66o497_.dbf
    lrwx------ 1 oracle dba 64 Dec 5 12:32 23 -> /u04/oradata/TEST/datafile/o1_mf_test_2q9hw8w2_.dbf
    lrwx------ 1 oracle dba 64 Dec 5 12:32 24 -> /u01/app/oracle/TEST/TEST/datafile/o1_mf_test1_2qch4mlc_.dbf
    -bash-3.00$ mount | grep u01
    /dev/psd/psd7p2 on /u01 type psfs (rw,logtotty,shared,data=ordered)
    -bash-3.00$ mount | grep u04
    /dev/psd/psd8p1 on /u04 type psfs (rw,logtotty,dboptimize,shared,data=ordered)

    …/u01 is not mounted with the PolyServe dboptimize mount option. I plopped these files there just to create this situation for the sake of blogging it… /u04 is mounted dboptimize (direct I/O, async I/O) and has 65 spindles under it…

  3. John, July 21, 2009 at 4:29 pm

    Kevin
    Isn’t the bigger issue here however that if there is a mismatch on filesystemio_options and the underlying FS setup that

    1) Yes oracle handles mismatch/error internally
    2) It converts to synchronous IO – trying to mimic asynch by spawning multiple pwrite calls

    We lose the capability of direct I/O/async I/O to our dbfiles, and with it the performance.

    So, great: Oracle handles it and doesn’t terminate.
    Bad: unless it is analyzed internally, you’re now taking a hit with your synchronous calls and are totally oblivious.

    So in this case, since this parameter is not system modifiable without a reboot, would it not be better for Oracle to make that initial test on startup and abort the instance if there was a mismatch?

    Or are there cases where we do want synch ??

    ta

    • kevinclosson, July 23, 2009 at 6:46 pm

      Your points are the very reason I blogged it. Folks need to be aware that it will silently fall back to synchronous runtime behavior. I don’t think there is ever any reason anyone would want an instance without async I/O support, at least not from a performance perspective. I would rather it spit out a warning on the issue, but what I want and what reality produces are two totally different things. So, best to check …

  4. John, July 27, 2009 at 1:32 pm

    Thanks – figured that was what the response would be. 🙂

    On a side note – I am having a hard time finding current doco/standards for directIO and ‘setall’ for Oracle 10 on Solaris.

    If you look in the Performance and Administrators reference for 10g there are sections on io options for AIX/HP/Linux etc in detail. But nothing on Solaris ??

    I was specifically thinking that in 10g, similar to other ‘ports’, we no longer need to explicitly mount the filesystems ‘convosync=direct’ (e.g., AIX no longer requires an explicit mount with ‘cio’?) and Oracle will handle the corresponding calls as long as we set filesystemio_options to setall …….

    But again can’t find for Solaris (can for other ports).

    Do you have a reference or can provide detail on this ?

    thanks Kevin

    • kevinclosson, July 27, 2009 at 7:23 pm

      Hi John,

      Sorry, I don’t pay a lot of attention these days to any platform other than Linux, and I very rarely even use file systems at all… well, I am doing a bunch of non-Exadata testing at the moment and happen to be using Ext3 with direct/async I/O, but that is a lark.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
