When I was just starting out in IT, one of my first mentors told me that the only stupid question is the one you don’t ask. I really appreciated that mentality because I asked a lot of them—a lot. I’m not blogging about a stupid question, per se, but one that does get asked so often that it seems it shouldn’t have to be answered ever again. The topic is Oracle performance with direct/raw disk I/O versus buffered filesystems.
In a side-channel discussion with a reader, I was asked the following question:
We created tables with the same data in three different tablespaces. To our surprise, the query was quick on a normal filesystem; unfortunately, the same query against the table residing in an ASM disk group ran very slow compared to the normal filesystem.
If an AIX filesystem is using both direct I/O and concurrent I/O, there should be no difference in I/O service times between the filesystem and ASM tests. Note to self: the whole direct I/O plus concurrent I/O topic deserves another blog entry.
My reply to the question was:
If it was dramatically quicker on the AIX filesystem (JFS?), then your reads are likely being satisfied from the OS buffer cache and you are comparing cached reads to raw transfers with ASM. If the dataset is small enough to stay cached in the OS cache, then it will always be quicker than raw/direct I/O (e.g., ASM or a filesystem with direct mount options).
The reader followed up with:
People say that direct I/O, i.e., bypassing the OS cache, is sometimes faster.
If so, is what we are doing with a raw device the same thing, i.e., direct I/O?
And the thread continued with my reply:
Complex topic. “Faster” is not the word. Faster means service times, and there is no way an I/O direct from the disk into the address space of an Oracle process can have faster service times simply because it is raw. In fact, the opposite is true. If the block being read is in the OS page cache, then the service time (from Oracle’s perspective) will be very fast. If it isn’t in the cache, or if the I/O is a write, then the story is entirely different. The overhead of acquiring an OS buffer, performing the DMA from disk into that buffer, and then copying the data into the SGA/PGA costs more processor cycles than most systems can afford. Not to mention that, at that point, the block is in memory twice—which is not efficient by any means.
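The two paths are easiest to see at the system-call level. Below is a minimal Python sketch (Linux-specific; the file name and the 4 KiB block size are assumptions for the demo, not anything Oracle does) contrasting a buffered read, which the kernel may satisfy from the page cache, with an O_DIRECT read, which DMAs straight into the caller's aligned buffer:

```python
import mmap
import os

# Hypothetical demo file; 4 KiB is an assumed logical block size.
PATH = "dio_demo.dat"
BLK = 4096

with open(PATH, "wb") as f:
    f.write(b"\xab" * BLK)

# Buffered read: the kernel may satisfy this from the OS page cache,
# which is why a small, fully cached dataset looks "faster" than direct I/O.
fd = os.open(PATH, os.O_RDONLY)
buffered = os.read(fd, BLK)
os.close(fd)

# Direct read: O_DIRECT bypasses the page cache, so the transfer lands
# straight in the caller's buffer -- no second copy, no double buffering,
# but also no cache hits. Buffer, length, and offset must be block-aligned,
# hence the page-aligned anonymous mmap buffer.
direct = None
try:
    fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, BLK)      # anonymous mapping: page-aligned
    os.readv(fd, [buf])           # read directly into the aligned buffer
    os.close(fd)
    direct = bytes(buf)
except (OSError, AttributeError):
    pass  # filesystem (e.g., tmpfs) or platform without O_DIRECT support

# Same bytes either way; only the path through (or around) the cache differs.
assert direct is None or direct == buffered
os.remove(PATH)
```

The alignment requirement is not incidental: it exists precisely because the device DMAs into the user buffer with no intermediate kernel copy, which is the whole point of direct I/O.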
In the end it really depends on what your workload is. If for some reason you have a workload that you just can’t seem to get resident in Oracle’s buffering, then the OS page cache can be helpful.
In the past I’ve taken the rigid stance that direct or raw I/O is the only acceptable deployment option only to be proven wrong by peculiar customer workloads. Over time I started to realize that it is insane for someone like me—or anyone out there for that matter—to tell a customer that direct or raw I/O is the only answer to their problems in spite of what their actual performance is. I’m not saying that it is common for workloads to benefit from OS buffering, but if it does for some particular workload then fine. No religion here.
It turned out that the reader increased his SGA and found parity between the filesystem and ASM test cases, as is to be expected. I’ll add, however, that only a filesystem gives you the option of both buffered and unbuffered I/O, including mixing the two on a per-tablespace basis if that helps solve a problem.
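As a sketch of what that mixing can look like on AIX (the device and mount-point names here are hypothetical), JFS2 lets you choose direct or buffered I/O per mount point, and you place each tablespace’s datafiles accordingly:

```
# Hypothetical AIX JFS2 layout: one filesystem mounted for direct I/O,
# one left buffered; datafiles are placed per tablespace.
mount -o dio /dev/fslv01 /u02/oradata/direct     # direct I/O datafiles
mount        /dev/fslv02 /u02/oradata/buffered   # OS-cached datafiles
```

A tablespace whose working set just won’t fit in the SGA can live on the buffered mount, while everything else gets the unbuffered path.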
My old buddy Glenn Fawcett puts a little extra coverage on the topic from a Solaris perspective here.
The fact remains that there is no such thing as a stupid question.