What Performs Better, Direct I/O or Direct I/O? There is No Such Thing As a Stupid Question!

When I was just starting out in IT, one of my first mentors told me that the only stupid question is the one you don’t ask. I really appreciated that mentality because I asked a lot of them—a lot. I’m not blogging about a stupid question, per se, but one that does get asked so often that it seems it shouldn’t have to be answered ever again. The topic is Oracle performance with direct/raw disk I/O versus buffered filesystems.

In a side-channel discussion with a reader, I was asked the following question:

We have created table with same size of data into three different tablespaces respetively. To our surprise, the query was so quick on normal filesystem, unfortunately, query where the table resides on ASM disk groups, ran very slow comparatively to the normal filesystem.

If an AIX filesystem is using both direct I/O and concurrent I/O, there should be no difference in I/O service times between the filesystem and ASM tests. Note to self: The whole direct I/O plus concurrent I/O deserves another blog entry.

My reply to the question was:

If it was dramatically quicker on the AIX filesystem (JFS?) then you are likely getting cached in the OS buffer cache and comparing to raw transfers with ASM. If the dataset is small enough to stay cached in the OS cache, then it will always be quicker than raw/direct IO (e.g., ASM or a filesystem with direct mount options).

The reader followed up with:

But, Kevin,

People say, sometime is direct IO, i.e. bypassing the OS cache is more faster.
If so, what we are doing with raw device is the same, i.e. direct IO?

And the thread continued with my reply:

Complex topic. “Faster” is not the word. Faster means service times and there is no way an I/O direct from the disk into the address space of an Oracle process can have faster service times because it is raw. In fact, the opposite is true. If the block being read is in the OS page cache, then the service time (from Oracle’s perspective) will be very fast. If it isn’t in the cache, or if the I/O is a write then the story is entirely different. The overhead associated with acquiring an OS buffer, performing the DMA into that buffer from disk and then copied into the SGA/PGA is too costly in processor terms than most systems can afford. Not to mention at that time the buffer is in memory twice…which is not very efficient by any means.

In the end it really depends on what your workload is. If for some reason you have a workload that you just can’t seem to get resident in Oracle’s buffering, then the OS page cache can be helpful.

In the past I’ve taken the rigid stance that direct or raw I/O is the only acceptable deployment option only to be proven wrong by peculiar customer workloads. Over time I started to realize that it is insane for someone like me—or anyone out there for that matter—to tell a customer that direct or raw I/O is the only answer to their problems in spite of what their actual performance is. I’m not saying that it is common for workloads to benefit from OS buffering, but if it does for some particular workload then fine. No religion here.

It turned out that the reader increased his SGA and found parity between the filesystem and ASM test cases as is to be expected. I’ll add, however, that only a filesystem gives you the option of both buffered and unbuffered I/O including mixed on a per-tablespace basis if it helps solve a problem.

My old buddy Glenn Fawcett puts a little extra coverage on the topic from a Solaris perspective here.

The fact remains that there is no such thing as a stupid question.

16 Responses to “What Performs Better, Direct I/O or Direct I/O? There is No Such Thing As a Stupid Question!”

Feed for this Entry Trackback Address

1 Alex Gorbachev February 12, 2007 at 3:20 am

Frankly, Oracle buffer cache is usually way smarter than OS cache for database workloads these days. But, indeed, never say never.

Reply
2 kevinclosson February 12, 2007 at 3:51 am

“Frankly, Oracle buffer cache is usually way smarter than OS cache for database workloads these days.”

..really…you were starting to worry me there, Alex…

Reply
3 Andrey Goryunov February 14, 2007 at 4:55 am

Hello Kevin,

sorry if that question is out of the scope of this article.

But, unfortunately, I still can not understand the behaviour
of CIO mode for Oracle 9.2.0.6 64-bit database on AIX 5300-03 64 bit.

We have file systems for data files and redo logs mounted
with cio and filesystemio_options set to async (by default
for Oracle 9.2.0.6 on AIX).

According to the note “Direct I/O or Concurrent I/O on AIX 5L (272520.1)” under such conditions only asynchronous mode
will be used for the database. And to get CIO mode for
file system filesystemio_options should be set to SETALL.
But using lsof +fg command
(there was a topic on oracle-l forum about CIO on AIX)
I see that datafiles were opened with the following flags
“R,W,DSYN,LG,0x80;CX,BR”. From the post on
http://unix.derkeiler.com/Mailing-Lists/AIX-L/2005-01/0013.html a guy stated that 0x80 flag is the definition
of CIO mode. At the same time using truss -o -p
I can not find any open() subroutine that
uses O_CIO flag.

But based on the document provided from IBM / Oracle International Competency Center
Oracle 9i & 10g on IBM AIX5L:
Tips & Considerations
Version 2.1
September, 2005
table on a page 8 – in case of ASYNC parameter and file system mounted with cio there will be “asynchronous, CIO” mode.

I apologise for such long “comment” and might be you give
some clue and/or document, command to be exactly sure
what mode is used for the database.

Thank you very much for your time and look forward to hearing from you soon.

Kind Regards,
Andrey Goryunov

Reply
4 sunny July 13, 2007 at 1:54 pm

Can anyone conclude which one is the BEST practise.

ASM with filesystem or rawdisk.

Regards
Sunny

Reply
5 kevinclosson July 13, 2007 at 3:10 pm

“Can anyone conclude which one is the BEST practise.

ASM with filesystem or rawdisk.”

…ASM with filesystem? Well, the only filesystem that allows you to use files as “disks” for ASM is NFS. I’ve blogged in the past about layering ASM over NFS and how I can see how it would have benefit under rare circumstances, but rare. Did I misunderstand the question?

Reply
6 Amit October 23, 2008 at 11:12 am

Hi Kevin,

I have read that Oracle always open()s datafiles with “Direct I/O flag” (O_DIRECT, O_DSYNC etc based upon the UNIX/Linux flavour. Please correct me if these flag themselves have different interpretations!). So can I say that essentially,
(1) Oracle will always do Direct I/O (considering that all modern filesystem implentations support it)
(2) But if (1) above is true then we ever bother to talk about Direct I/O
(3) I don’t understand what do you mean by “Direct I/O is concurrent writes” (I am an amateur on these topics). Isn’t it the async I/O that provides concurrent writes by allowing a process to issue multiple unblocked writes and be INTERRUPTed when any write completes?

Thanks
Regards
Amit

Regards
Amit

Reply
7 kevinclosson October 28, 2008 at 2:37 pm

Amit,

I can’t speak about all ports (because I haven’t read them all), but the reason there is a filesystemio_options parameter that allows you to control whether the server should open(,,O_DIRECT) is because it does not in fact open with O_DIRECT by default. BTW, there is no O_DIRECT on Solaris. Direct I/O on that platform comes by pushing the direct I/O quality upon an open file descriptor with an ioctl()…but it does the same thing.

Concurrent I/O means the OS will allow more than one I/O in flight on a file concurrently. Believe it or not that has not always been a given. Further, without concurrent I/O you will not get multiple in-flight via async calls nor multiple in-flight sync I/Os from different processes. The point is that there was a day when direct I/O did not guarantee concurrent I/O thus the emergence of stuff like Veritas Quick I/O. Those days are long gone. I know of no platform that does not support concurrent I/O when the file is open for direct I/O.

Reply
8 Val October 28, 2008 at 7:59 pm

Amit,

Oracle always opens its data files, as well as the transaction log, with the O_DSYNC flag (O_SYNC in Linux) for the obvious reason — to ensure data integrity. It employs “synchronous” writes (not to be confused with asynchronous IO) to make sure that the data is actually on the disk, or at least in the battery backed up RAID cache. Without the O_DSYNC/O_SYNC option, after issuing a WRITE call, the control would return to the caller as soon as the data is in the file system cache.

Funnily enough, Oracle opens the TEMP tablespace files with the same O_DSYNC option for which there is no rational explanation.

Oracle does not use the O_DIRECT option though when opening its files.

Val

Reply
9 Val October 28, 2008 at 8:01 pm

Sorry,

I should have said “Oracle does not use the O_DIRECT option though when opening its files” by default.

Val

Reply
10 kevinclosson October 28, 2008 at 11:55 pm

Val,

What is your point? Of course Oracle doesn’t throw in O_DIRECT by default. Didn’t I just write that in my last comment on this thread?

Reply
11 Val October 29, 2008 at 2:01 pm

Kevin,

I was primarily commenting on ODSYNC and mentioned ODIRECT in passing without reading your response first.

My secondary point was that as far as I know there is no reasonable explanation of why Oracle uses O(D)SYNC with TEMP since the option has certain performance implications.

Reply
12 PC October 6, 2010 at 12:47 pm

Kevin, I am breaking my head on this since last 1 month and hence this query: Until today, enabling DirectIO has always benefitted (atleast in all our deployments). But recently, on a V440 server with Solaris 10 update 2 and Oracle 9.2.0.7 (with 2 dual controller 3320s attached – each with 12 disks RAID10), after enabling DIO, io svc times have increased 6-7 times (on 1 LUN i.e. 1 array it has increased from 9ms to 43ms and on other it has increased from 23ms to 142ms).
Another observation is that DBWR is now consuming very high CPU (19%) as compared to what it was pre-DIO (1.8%). I am unable to understand both of these – why is it happening? Can somebody please help me understand why has DBWR suddenly started eating more CPU?
Thanks.

Reply
- 13 kevinclosson October 7, 2010 at 4:58 pm
  
  PC,
  
  How do you measure “svc times have increased 6-7 times?”
  
  Reply
14 PC October 8, 2010 at 5:58 am

Kevin, pre-DIO iostat -nxce 2 would show 9ms in the asvc_t column whereas post-DIO it shows 43ms in the same column. Pre-DIO, total IO i.e. kr/s+kw/s column has decreased and total IOPS i.e. r/s+w/s has increased marginally.

Thanks.

Reply
- 15 kevinclosson October 8, 2010 at 3:07 pm
  
  PC,
  
  I can only take what you’ve told me and formulate a hypothesis. If you enable DIO you are converting some amount of database I/O from page cache logical reads to physical reads. More hits to the spindles means higher service times. When you say “marginally” I have a hard time putting that into perspective. Sometimes these things can be frail. A “small” or “marginal” increase in physical transfers can hurt. You’ve got 12-disks worth of effective I/O bandwidth (if I understand your setup correctly).
  
  If you change service times under DBWR dramatically it is understandable to see increase in processor utilization as he will spend more time polling for I/O completions. I con’t imagine 1.8->19% though. The problem I have is I don’t know for certain you are collecting processor utilization data for the exact same workload. You are not benchmarking ( I presume) so comparing DBWR cpu between different times is not totally scientific. Now, on the other hand, if you were to normalize DBWR processor utilization to an exact number of DBWR writes/s that would help.
  
  Here is how I’d rather treat this topic. I mean no disrespect. If I had a system of V440 vintage attached to 24 raid 10 disks for 9.2.0.7 and it worked at all I would not fiddle with anything.
  
  PS. Not being a Sol expert I am sort of curious whether your file system and the rest of the stack are giving you both direct I/O and concurrent I/O. If this is a Veritas stack you might not be getting both (mount options, Quick I/O, etc,etc). In short, it’s complex.
  
  Reply
16 PC October 8, 2010 at 10:38 am

Also confirmed from within statspack – log write time has increased substantially.

Thanks

Reply

	kevinclosson on Announcing SLOB 2.5.4
	Hell Dip on Announcing SLOB 2.5.4
	kevinclosson on Introducing SLOB – The S…
	Amey Bobade on Introducing SLOB – The S…
	Amey Bobade on Introducing SLOB – The S…

Kevin Closson's Blog: Platforms, Databases and Storage