Beware, lot’s of tongue in cheek in this one. If you’re not the least bit interested in storage protocols, saving money or a helpful formula for safely configuring I/O bandwidth for Oracle, don’t read this.
I was reading Pawel Barut’s Log Buffer #48 when the following phrase caught my attention:
For many of Oracle DBAs it might be weird idea: Kevin Closson is proposing to install Oracle over NFS. He states that it’s cheaper, simpler and will be even better with upcoming Oracle 11g.
Yes, I have links to several of my blog entries about Oracle over NFS on my CFS, NFS, ASM page, but that is not what I want to blog about. I’m blogging specifically about Powet’s assertion that “it might be a weird idea”—referring to using NAS via NFS for Oracle database deployments.
I think the most common misconception people have is regarding the performance of such a configuration. True, NFS has a lot of overhead that would surely tax the Oracle server way too much—that is if Oracle didn’t take steps to alleviate the overhead. The primary overhead is in NFS client-side caching. Forget about it. Direct I/O and asynchronous I/O are available to the Oracle server for NFS files with just about every NFS client out there.
Manly Men™ Choose Fibre Channel
I hear it all the time when I’m out in the field or on the phone with prospects. First I see the wheels turning while math is being done in the head. Then, one of those cartoon thought bubbles pops up with the following:
Hold it, that Closson guy must not be a Manly Man™. Did he just say NFS over Gigabit Ethernet? Ugh, I am Manly Man and I must have 4Gb Fibre Channel or my Oracle database will surely starve for I/O!
Yep, I’ve been caught! Gasp, 4Gb has more bandwidth than 1Gb. I have never recommended running a single path to storage though.
Bonding Network Interfaces
Yes, it can be tricky to work out 802.3ad Link Aggregation, but it is more than possible to have double or triple bonded paths to the storage. And yes, scalability of bonded NICs varies, but there is a simplicity and cost savings (e.g., no FCP HBAs or expensive FC switches) with NFS that cannot be overlooked. And, come in closely and don’t tell a soul, you won’t have to think about bonding NICs for Oracle over NFS forever, wink, wink, nudge, nudge.
But, alas, Manly Man doesn’t need simplicity! Ok, ok, I’m just funning around.
No More Wild Guesses
A very safe rule of thumb to keep your Oracle database servers from starving for I/O is:
100Mb I/O per GHz CPU
So, for example, if you wanted to make sure an HP c-Class server blade with 2-socket 2.66 GHz “Cloverdale” Xeon processors had sufficient I/O for Oracle, the math would look like this:
12 * 2.66 * 4 * 2 == 255 MB/s
Since the Xeon 5355 is a quad-core processor and the 480c c-Class blade supports two of them there are 21.28 GHz for the formula. And, 100 Mb is about 12 MB. So if Manly Man configures, say, two 4Gb FC paths (for redundancy) to the same c-Class blade he is allocating about 1000 MB/s bandwidth. Simply put, that is expensive overkill. Why? Well, for starters, the blade would be 100% saturated at the bus level if it did anything with 1000 MB/s so it certainly couldn’t satisfy Oracle performing physical I/O and actually touching the blocks (e.g., filtering, sorting, grouping, etc). But what if Manly Man configured the two 4Gb FCP paths for failover with only 1 path active path (approximately 500 MB/s bandwidth)? That is still overkill.
Now don’t get me wrong. I am well aware that 2 “Cloverdale” Xeons running Parallel Query can scoop up 500MB/s from disk without saturating the server. It turns out that simple light weight scans (e.g., select count(*) ) are about the only Oracle functionality that breaks the rule of 100Mb I/O per GHz CPU. I’ve even proven that countless times such as in this dual processor, single core Opteron 2.8 Ghz proof point. In that test I had IBM LS20 blades configured with dual processor, single-core Opterons clocked at 2.8 GHz. So if I plug that into the formula I’d use 5.6 for the GHz figure which supposedly yields 67 MB/s as the throughput at which those processors should have been saturated. However, on page 16 of this paper I show those two little single-core Opterons scanning disk at the rate of approximately 380MB/s. How is that? The formula must be wrong!
No, it’s not wrong. When Oracle is doing a light weight scan it is doing very, very little with the blocks of data being returned from disk. On the other hand, if you read further in that paper, you’ll see on page 17 that a measly 21MB/s of data loading saturated both processors on a single node-due to the amount of data manipulation required by SQL*Loader. OLTP goes further. Generally, when Oracle is doing OLTP, as few as 3,000 IOps from each processor core will result in total saturation. There is a lot of CPU intensive stuff wrapped around those 3,000 IOps. Yes, it varies, but look at your OLTP workload and take note of the processor utilization when/if the cores are performing on the order of 3,000 IOps each. Yes, I know, most real-world Oracle databases don’t even do 3,000 IOps for an entire server which takes us right back to the point: 100Mb I/O per GHz CPU is a good, safe reference point.
What Does the 800 Pound Gorilla Have To Say?
When it comes to NFS, Network Appliance is the 800lb gorilla. They have worked very hard to get to where they are. See, Network Appliance likely doesn’t care if Manly Man would rather deploy FCP for Oracle instead of NFS since their products do both protocols-and iSCSI too. All told, they may stand to make more money if Manly Man does in fact go with FCP since they may have the opportunity to sell expensive switches too. But, no, Network Appliance dispels the notion that 4Gb (or even 2Gb) FCP for Oracle is a must.
In this NetApp paper about FCP vs iSCSI and NFS, measurements are offered that show equal performance with DSS-style workloads (Figure 4) and only about 21% deficit when comparing OLTP on FCP to NFS. How’s that? The paper points out that the FCP test was fitted with 2Gb Fibre Channel HBAs and the NFS case had two GbE paths to storage yet Manly Man only achieved 21% more OLTP throughput. If NFS was so inherently unfit for Oracle, this test case with bandwidth parity would have surely made the point clear. But that wasn’t the case.
If you look at Figure 2 in that paper, you’ll see that the NFS case (with jumbo frames) spent 31.5% of cycles in kernel mode compared to 22.4% in the FCP case. How interesting. The NFS case lost 28% more CPU to kernel mode overhead and delivered 21% less OLTP throughput. Manly Man must surely see that addressing that 28% extra kernel mode overhead associated with NFS will bring OLTP throughput right in line with FCP and:
– NFS is simpler to configure
– NFS can be used for RAC and non-RAC
– NFS is cheaper since GbE is cheaper (per throughout) than FCP
Now isn’t that weird?
I can’t tell you how and when the 28% additional kernel-mode overhead gets addressed, but, um, it does. So, Manly Man, time to invent the wheel.