The blogging platform I use, WordPress, allows me to see from whence readers are being referred to my blog. I’ve gotten a couple of hits from this post on Jeff Browning’s blog. I’m not sure if that particular post is a critique or a caricature of my paper about Oracle Database 11g Direct NFS. Nonetheless, shortly after I first saw the post I privately informed Jeff of four inaccuracies contained in the post. They were:
1) Jeff’s assertion that Direct NFS will work on any NAS device is incorrect. I know of one NAS device in particular that DNFS will not function on whatsoever. Well, that’s not exactly true. It will function to the point of creating a database, but will corrupt the control file in doing so. I’m not going to state the particular manufacturer, because it is still unclear if there is a design incompatibility problem or simply a bug that prevents that particular NAS device from functioning with DNFS. Nonetheless, Jeff still hasn’t corrected the following quote, so I am doing so here and now. Jeff said:
As such dNFS will work in exactly the same manner, with identical performance benefits, on any NAS device from any vendor.
That is incorrect.
2) Jeff uses incorrect terminology to explain one of the benefits of Direct NFS. That term is “context swapping”, and I quote:
Here is the theory behind dNFS. I/O on a database server occurs in a combination of user space and kernel space. Context swaps between the two spaces are expensive in terms of CPU cost. If you can move a part of that activity from kernel space into user space, you can save CPU cost due to reduced context swapping.
I’m not playing word games. Transitions from user to kernel mode are routinely, and inaccurately, referred to as context switching, and while that is one of my pet peeves, I suspect from the context of Jeff’s post that context switching is the term he actually meant. The problem is that system calls do not, in and of themselves, result in a context switch.
A context switch is the stopping of one process and switching to another for the sake of process scheduling. If a system call does not block, there is no context switch. An example would be getpid(2). On the other hand, a system call that does block (e.g., synchronous physical I/O) will count as a context switch because the process parked itself on a blocking system call. The scheduler will switch to a runnable process according to such criteria as mode, priority, processor affinity and so on. Context switches fall into two categories: voluntary and involuntary. The former is when a process calls a blocking system call (or otherwise yields voluntarily, such as with sched_yield(2)); the latter is when a user mode process executes to the end of its time slice at a time when there is a runnable process for the kernel to switch to.
And yes, I say all this about context switching versus context swapping with total disregard for this patent. In Unix/Linux, the term is context switch and knowing what that really means makes the output of monitoring commands such as vmstat(1) a little more meaningful I should think.
3) He stated that 11g x86_64 was not yet available, but at the time of that blog post it was indeed available for download.
4) He misspelled my name in the blog post.
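For anyone who wants to see the voluntary versus involuntary distinction from point 2 for themselves, here is a minimal sketch. It assumes a Unix-like system and Python 3, and it leans on the ru_nvcsw and ru_nivcsw counters that getrusage(2) reports. The helper names switches() and measure() are mine, purely for illustration, and have nothing to do with Oracle or Direct NFS.

```python
import os
import resource
import time


def switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw


def measure(fn, repeat):
    """Call fn() `repeat` times; return the change in (voluntary, involuntary) counts."""
    v0, i0 = switches()
    for _ in range(repeat):
        fn()
    v1, i1 = switches()
    return v1 - v0, i1 - i0


if __name__ == "__main__":
    # getpid(2) enters the kernel but does not block, so it should not
    # add voluntary context switches by itself.
    print("getpid x 10000 (voluntary, involuntary):", measure(os.getpid, 10000))

    # A sleep blocks, so the process gives up the CPU each time it is called.
    print("sleep  x 100   (voluntary, involuntary):",
          measure(lambda: time.sleep(0.001), 100))
```

I would expect the getpid loop to report few or no voluntary switches and the sleep loop to report roughly one per call, but the exact numbers depend on the kernel, the load on the box and the scheduler, so I have not quoted any. Any involuntary switches that show up are simply the scheduler preempting the process at the end of a time slice.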
What is This Blog Post Really About?
Now, all of that aside, none of that is really what this blog post is about. This blog post is about a comment on Jeff’s blog. The reader posted a comment as follows:
One would assume that an OS vendor would be better at making NFS and multipathing work rather than a database vendor. Oracle has a disadvantage in that it has to test its client on all the various flavors of OS out there.
One would indeed assume just that, and correctly in fact. Oracle Database 11g doesn’t make NFS better. Direct NFS is a replacement for NFS. While NFS is a great storage presentation model for Oracle (as I’ve said so many times), it does much more than Oracle requires. So, DNFS strips all that overhead out. Direct NFS is essentially an RPC shot straight at the NAS device, with no file-related system calls (e.g., open(2), pread(2), io_submit(2), etc). And, oh, while I’m at it, I’ll point out that open(2), read(2) and io_submit(2) are system calls that do not result in a context switch (unless the read suffers a page cache miss or is O_DIRECT or raw(8)). But that is not what this blog post is about.
No matter if you call it Context Switching or Context Swapping, sending a NFS (TCP or UDP) packet from user mode will change to kernel mode as well as writing to a block device. (However if you have a user mode NFS Client, you will get another swap/switch, which would be avoided by dNFS).
However I am not sure if zero-copy is easily made portable with network sockets. It can be implemented by the kernel for NFS, however.
I find dNFS very cool, especially on Windows. Would be good if Oracle provides a simple “map into filesystem space” service as well, so you don’t need the additional SMB Connection.
BTW: what about direct iSCSI (for ASM?)
Greetings
Bernd
OK, let me get this right, Bernd. You read this whole post then comment with words to the effect of “No matter if you call it […].” It’s not about what I call it. It’s about what it is called. The difference is important. It is especially sloppy to suggest that dNFS will avoid “another swap/switch[…].” Swapping and switching are two entirely different things.
Also, user mode processes don’t send packets. That happens way down in the transport layer. DNFS sends messages.
We won’t be using this thread to “dispute” these facts though. Please don’t bother posting another comment suggesting that I am playing word games. I’ll moderate it out.
Kevin,
This blog entry motivated me to dive deeper into O_DIRECT (on Linux). During my research, I stumbled across an email posting of Linus Torvalds to the Linux Kernel Mailing List in which he quite harshly expresses his dislike of the O_DIRECT operation: http://lkml.org/lkml/2007/1/11/129
Given your OS background, what’s your opinion on the O_DIRECT method? Is it really as flawed as Linus says?
Hi Chris,
See the following post for more background:
https://kevinclosson.wordpress.com/2007/02/23/oracle-direct-io-brought-to-you-by-deranged-monkeys/
I think O_DIRECT is just fine, but I would also prefer to have a mount option-based alternative available either way.
Hi Kevin,
Thank you for the pointer… excellent content.
We’re currently running RAC on standard edition, which requires us to use ASM. We had to create empty files on top of mount points using NFS protocol and then allocate those files to ASM. Will DNFS help us in any way here?
Daniel,
Remember that ASM does not do I/O. The I/O calls will be the same with ASM or regular files-based tablespaces stored on NFS mounts. So, yes, DNFS will help. There are religious wars I don’t care to get into about the concept of layering ASM on top of files (ala the NFS model), but I won’t go into that. Here is a fact: you can only stripe across filers (forget OnTap GX for a moment) if you use ASM. I wouldn’t expect ASM performance on NFS to be worse than regular files for tablespaces. In fact I blogged the point several times–there should never be a performance difference between ASM and raw disk given the same storage (paths, controllers, slices of disk, etc).