Archive Page 29

Manly Men Only Deploy Oracle with Fibre Channel – Part II. What’s So Simple and Inexpensive About NFS for Oracle?

The things I routinely hear from DBAs lead me to believe that they often don’t understand storage. Likewise, the things I hear from Storage Administrators convince me they don’t always know what DBAs and system administrators have to do with those chunks of disk they dole out for Oracle. This is a long blog entry aimed at closing that gap, with a particular slant toward Oracle over NFS. Hey, it is my blog after all.

I also want to clear up some confusion about points I made in a recent blog entry. The confusion was rampant as my email box will attest so I clearly need to fix this.

I was catching up on some blog reading the other day when I ran across this post on Nuno Souto’s blog dated March 18, 2006. The blog entry was about how Noons’ datacenter had just taken on some new SAN gear. The gist of the blog entry is that they did a pretty major migration from one set of SAN gear to the other with very limited impact—largely due to apparent 6-Ps style forethought. Noons speaks highly of the SAN technology they have.

Anyone who participates in the oracle-l email list knows Noons and his important contributions to the list. In short, he knows his stuff—really well. So why am I blogging about this? It dawned on me that my recent post about Manly Men Only Deploy Oracle with Fibre Channel Storage jumped over a lot of groundwork. I assure you all that neither Noons nor the tens of thousands of Oracle shops using Oracle on FCP are Manly Men as I depicted in my blog entry. I’m not trying to suggest that people are fools for using Fibre Channel SANs. Indeed, waiting patiently from about 1997 to about 2001 for the stuff to actually work warrants at least some commitment to the technology. OK, ok, I’m being snarky again. But wait, I do have a point to make.

Deploying Oracle on NAS is Simpler and Cheaper, Isn’t It?
In my blog entry about “Manly Man”, I stated matter-of-factly that it is less expensive to deploy Oracle on NAS using NFS than on SANs. Guess what, I’m right, it is. But I didn’t sufficiently qualify what I was talking about. I produced that blog entry presuming readers would have the collective information of my prior blog posts about Oracle over NFS in mind. That was a weak presumption. No, when someone like Noons says his life is easier with SAN he means it. Bear in mind his post was comparing SAN to DAS, but no matter. Yes, Fibre Channel SAN was a life saver for too many sites to count in the late 90s. For instance, sites that bought into the “server consolidation” play of the late 1990s. In those days, people turned off their little mid-range Unix servers with DAS and crammed the workloads into a large SMP. The problem was that eventually the large SMP couldn’t physically attach any more DAS. It turns out that Fibre was needed first and foremost to get large numbers of disks connected to the huge SMPs of the era. That is an entirely different problem to solve than getting large numbers of servers connected to storage.

Put Your Feet in the Concrete
Most people presume that Oracle over NFS must be exponentially slower than Fibre Channel SAN. They presume this because at face value the wires are faster (e.g., 4Gb FCP versus 1Gb Ethernet). True, 4Gb is more bandwidth than 1Gb, but you can have more than one NFS path to storage and the latencies are a wash. I wanted to provide some numbers so I thought I’d use Network Appliance’s data that suggested a particular test of 8-way Solaris servers running Oracle OLTP over NFS comes within 21% of what is possible on a SAN. Using someone else’s results was mistake number 1. Folks, 21% degradation for NFS compared to SAN is not a number cast in stone. I just wanted to show that it is not a day and night difference and I wanted to use Network Appliance numbers for validity. I would not be happy with 21% either and that is good, because the numbers I typically see are not even in that range to start with. I see more like 10% and that is with 10g. 11g closes the gap nicely.

I’ll be producing data for those results soon enough, but let’s get back to the point. 21% of 8 CPUs worth of Oracle licenses would put quite a cost-savings burden on NAS in order to yield a net gain. That is, unless you accept the fact that we are comparing Oracle on NAS versus Oracle on SAN in which case the Oracle licensing gets cancelled out. And, again, let’s not hang every thought on that 21% of 8 CPUs performance difference because it is by no means a constant.

Snarky Email
After my Manly Man post, a fellow member of the OakTable Network emailed me the viewpoint of their very well-studied Storage Administrator. He calculated the cost of SAN connectivity for a very, very small SAN (using inexpensive 8-port FC switches) and factored in Oracle Enterprise Edition licensing to produce a cost per throughput using the data from that Network Appliance paper—the one with the 21% deficit. That is, he used the numbers at hand (21% degradation), Oracle Enterprise Edition licensing cost and his definition of a SAN (low connectivity requirements) and did the math correctly. Given those inputs, the case for NAS was pretty weak. To my discredit, I lashed back with the following:

…of course he is right that Oracle licensing is the lion’s share of the cost. Resting on those laurels might enable him to end up the last living SAN admin.

Folks, I know that 21% of 8 is 1.7 and that 1.7 Enterprise Edition licenses can buy a lot of dual-port FCP HBAs and even a midrange Fibre Channel switch, but that is not the point I failed to make. The point I failed to make was that I’m not talking about solving the supposed difficulties of provisioning storage to those one or two remaining refrigerator-sized Legacy Unix boxes you might have. There is no there, there. It is not difficult at all to run a few 4Gb FCP wires to separate 8- or 16-port FC switches and then back to the storage array. Even Manly Man can do that. That is not a problem that needs to be solved, because it is neither difficult nor expensive (at least the SAN aspect isn’t). As the adage goes, a picture speaks a thousand words. The following is a visual of a problem that doesn’t need to be solved—a simple SAN connected to a single server. Ironically, what it depicts is potentially millions of dollars worth of server and storage connected with merely thousands of dollars worth of Fibre Channel connectivity gear. In case the photo isn’t vivid enough, I’ll point out that on the left is a huge SMP (e.g., HP Superdome) and on the right is an EMC DMX. In the middle is a redundant set of 8-port switches—cheap and simple. Even providing private and public Ethernet connectivity in such a deployment is a breeze, by the way.

[Image: simplesan.jpg: a large SMP (e.g., HP Superdome) and an EMC DMX connected through a redundant pair of 8-port FC switches]

I Ain’t Never Doing That Grid Thing.
Simply put, if the only Oracle you have deployed—now and forever—sits in a couple of refrigerator-sized legacy SMP boxes, I’m going to sound like a loon on this topic. I’m talking about provisioning storage to commodity servers—grid computing. Grid may not be where you are today, but it is in fact where you will be someday. Consider the fact that most datacenters are taking their huge machines and chopping them up into little machines with hardware/software virtualization anyway, so we might as well cut to the chase and deploy commodity servers. When we do, we feel the pain of Fibre Channel SAN connectivity and storage provisioning, because connecting large numbers of servers to storage was not exactly a design center for Fibre Channel SAN technology. Just the opposite is true; SANs were originally meant to connect a few servers to a huge number of disks—more than was possible with DAS.

Commodity Computing (Grid) == Huge SAN
Large numbers of servers connected to a SAN makes the SAN very complex. Not necessarily more disks, but the presentation and connectivity aspects get very difficult to deal with.

If you are unlucky enough to be up to your knees in the storage provisioning, connectivity and cost nightmare associated with even a moderate number of commodity servers in a SAN environment, you know what I’m talking about. In these types of environments, people are deploying and managing director-class Fibre Channel switches where each port can cost up to $5,000, and they are deploying more than one switch for redundancy’s sake. That is, each commodity server needs a 2-port FC HBA and 2 paths to two different switches. Between the HBAs and the FC switch ports, the cost is as much as $10,000-$12,000 just to connect a “pizza box” to the SAN. That’s the connectivity story and the provisioning story is not much prettier.

Once the cabling is done, the Storage Administrator has to zone the switches and provision storage (e.g., create LUNs, LUN masking, etc). For RAC, that would be a minimum of 3 masked LUNs for each database. Then the System Administrator has to make sure Oracle has access to those LUNs. That is a lot of management overhead. NAS, on the other hand, uses very inexpensive NICs and switches. Ah, now there is an interesting point. Using NAS means each server only has one type of network connectivity instead of two (e.g., FC and Ethernet). Storage provisioning is also simpler—the database server administrator simply mounts the NFS filesystem and the DBA can go straight to work with RAC or non-RAC Oracle databases. How simple. And yes, the Oracle licensing cost is a constant, so in this paradigm the only way to recoup cost is on the storage connectivity side. The savings are worth consideration, and the simplicity is very difficult to argue with.

It’s time for another picture. The picture below depicts a small commodity server deployment—38 servers that need storage.

[Image: complexsan.jpg: 38 commodity servers attached to storage through redundant director-class Fibre Channel switches]

Let’s consider the total connectivity problem starting with the constant—Ethernet. Yes, every one of these 38 servers needs both Ethernet and Fibre Channel connectivity. For simplicity, let’s say only 8 of these servers are using RAC. The 8 that host RAC will need a minimum of 4 Gigabit Ethernet NICs/cables—two for the public interfaces and two for a bonded, private network for Oracle Cache Fusion (GCS, GES)—for a total of 32. The remaining 30 could conceivably do fine with two public network paths each for a subtotal of 60. All told, we have 92 Ethernet paths to deal with before we look at storage networking.

On the storage side, we’ll need redundant paths for all 38 servers to multiple switches, so we start with 38 dual-port HBAs and 76 front-side Fibre Channel switch ports. Each switch will need a minimum of 2 paths back to storage, but honestly, would anyone try to feed 38 modern commodity servers with two 4Gb paths worth of storage bandwidth? Likely not. On the other hand, it is unlikely the 30 smaller servers will each need dedicated 4Gb I/O bandwidth to storage, so we’ll play zone trickery on the switch and group sets of 2 from the 30, yielding a requirement for 15 back-side I/O paths from each switch for a subtotal of 30 back-side paths. Following suit, the remaining 8 RAC servers will require 4 back-side paths from each of the two switches for a subtotal of 8 back-side paths. To sum it up, we have 76 front-side and 38 back-side paths for a total of 114 storage paths. Yes, I know this can be a lot simpler by limiting the number of switch-to-storage paths. That’s a game called Which Servers Should We Starve for I/O and it isn’t fun to play. These arrangements are never attempted with small switches. That’s why the picture depicts large, expensive director-class switches.
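If the arithmetic is easier to follow as code than as prose, here is a quick back-of-the-envelope tally of the counts above; it is only a sketch of this specific 38-server example, not a sizing tool:

#!/bin/bash
# Path tally for the hypothetical 38-server farm described above.
RAC_NODES=8
NON_RAC_NODES=30

# Ethernet: RAC nodes need 2 public + 2 private paths; the rest need 2 public.
ENET_PATHS=$(( RAC_NODES * 4 + NON_RAC_NODES * 2 ))

# Front-side FC: every server has a dual-port HBA cabled to two switches.
FRONT_SIDE=$(( (RAC_NODES + NON_RAC_NODES) * 2 ))

# Back-side FC: non-RAC nodes zoned in pairs (15 paths per switch, 2 switches)
# plus 4 paths per switch for the RAC group.
BACK_SIDE=$(( (NON_RAC_NODES / 2) * 2 + 4 * 2 ))

echo "Ethernet paths: ${ENET_PATHS}"                      # 92
echo "Storage paths:  $(( FRONT_SIDE + BACK_SIDE ))"      # 76 + 38 = 114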

Here’s our mess. We have 92 Ethernet paths and 114 storage paths. How would NAS make this simpler? Well, Ethernet is the constant here, so we simply add more inexpensive Ethernet infrastructure. We still need redundant switches and I/O paths, but Ethernet is cheap and simple and we are down to a single network topology instead of two. Just add some simple NICs and simple Ethernet switches and go. And oh, by the way, the two-network-topologies model (e.g., GbE + FCP) generally means two different “owners,” since the SAN would generally be owned by the Storage Group and the Ethernet would be owned by the Networking Group. With NAS, all connectivity from the Ethernet switches forward can be owned by the Networking Group, freeing the Storage Group to focus on storage—as opposed to storage networking.

And, yes, Oracle11g has features that make the connectivity requirement on the Ethernet side simpler but 10g environments can benefit from this architecture too.

Not a Sales Pitch
Thus far, this blog entry has been the what. This would make a pretty hollow blog entry if I didn’t at least mention the how. The odds are very slim that your datacenter would be able to do a 100% NAS storage deployment. So Network Appliance handles this by offering multiple protocol storage from their Filers. The devil shall not remain with the details.

Total NAS? Nope. Multi-Protocol Storage.
I’ll be brief. You are going to need both FCP and NAS, I know that. If you have SQL Server (ugh) you certainly aren’t going to connect those servers to NAS. There are other reasons FCP isn’t going to go away soon enough. I accept the fact that both protocols are required in real life. So let’s take a look at multi-protocol storage and how it fits into this thread.

Network Appliance Multi-Protocol Support
Network Appliance is an NFS device. If you want to use it for FCP or iSCSI SAN, large files in the Filer’s filesystem (WAFL) are served with either FCP or iSCSI protocol and connectivity. Fine. It works. I don’t like it that much, but it works. In this paradigm, you’d choose to run the simplest connectivity type you deem fit. You could run some FCP to a few huge Legacy SMPs, FCP to some servers running SQL Server (ugh), and most importantly Ethernet for NFS to whatever you choose—including Oracle on commodity servers. Multi-protocol storage in this fashion means total vendor lock-in, but it would allow you to choose between the protocols and it works.

SAN Gateway Multi-Protocol Support
Don’t get rid of your SAN until there is something reasonable to replace it with. How does that statement fit this thread? Well, as I point out in this paper, SAN-NAS gateway devices are worth consideration. Products in this space are the HP Enterprise File Services Clustered Gateway and EMC Celerra. With these devices you leverage your existing SAN by connecting the “NAS Heads” to the SAN using very low-end, simple Fibre Channel SAN connectivity (e.g., small switches, few cables). From there, you can provision NFS mounts to untold numbers of NFS clients—a few, dozens or hundreds. The mental picture here should be a very small amount of the complex, expensive connectivity (Fibre Channel) and a very large amount of the inexpensive, simple connectivity (Ethernet). What a pleasant mental picture that is. So what’s the multi-protocol angle? Well, since there is a down-wind SAN behind the NAS gateway, you can still directly cable your remaining Legacy Unix boxes with FCP. You get native FCP storage (unlike NetApp with the blocks-from-file approach) for the systems that need it and NAS for the ones that don’t.

I’m an Oracle DBA, What’s in It for Me?
Excellent question and the answer is simply simplicity! I’m not just talking simplicity, I’m talking simple, simple, simple. I’m not just talking about simplicity in the database tier either. As I’ve pointed out umpteen times, NFS will support you from top to bottom—not just the database tier, but all your unstructured data such as software installations as well. Steve Chan chimes in on the simplicity of shared software installs in the E-Biz world too. After the NFS filesystem is mounted, you can do everything from ORACLE_HOME, APPL_TOP, clusterware files (e.g., the OCR and CSS disks), databases, RMAN, imp/exp, SQL*Loader/External Tables, ETL, compiled PL/SQL, UTL_FILE, BFILE, trace/logging, scripts, and on and on. Without NFS, what sort of mix-and-match of raw, filesystem and raw+ASM combinations would be required? A complex one—and the really ironic part is you’d probably still end up with some NFS mounts in addition to all that raw disk and non-CFS filesystem space as well!
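To make that concrete, here is a minimal sketch of a single NFS mount carrying all of it; the filer name, export and directory layout are hypothetical, and the exact mount options (covered in the next post) vary by port:

# One hypothetical filer export, mounted identically on every database server.
mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3 filer1:/vol/oracle /u01

# Everything lives under the one mount point, for example:
#   /u01/app/oracle/product/10.2.0/db_1     shared ORACLE_HOME
#   /u01/oradata/PROD                       datafiles, control files, redo logs
#   /u01/crs                                OCR and CSS voting files for RAC
#   /u01/rman /u01/etl /u01/scripts         backups, external tables, scripts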

Whew. That was a long blog entry.

Mount Options for Oracle over NFS. It’s All About the Port.

BLOG UPDATE: This post has developed an interesting comment thread worth noting.

I currently have a nearly chaotic set of differing configurations to deal with that run the gamut from x86_64 servers attached to 2/4Gb FCP SANs to others attached to NAS via GbE. So sometimes I miss the mark. I just tried to fire up one of my databases on a DL585 running RHEL4 attached to the Enterprise File Services Clustered Gateway NAS device. In the midst of the chaos I mistakenly mounted the filesystem containing the Oracle Database 10g test database using the wrong mount options, so:

$ tail alert*log
ALTER DATABASE MOUNT
Mon Jun 18 15:04:28 2007
WARNING:NFS file system /mnt mounted with incorrect options
WARNING:Expected NFS mount options: rsize>=32768,wsize>=32768,hard
Mon Jun 18 15:04:28 2007
ORA-00202: control file: '/u01/app/oracle/product/10.2.0/db_1/rw/DATA/cntlbench_1'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 3
Mon Jun 18 15:04:28 2007

I clearly did not mount the filesystems correctly. After remounting with the following options, everything was OK:

rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
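For the record, the remount amounted to nothing more than this; the filer name and export path below are placeholders, and the right option string varies by platform as discussed next:

# Remount the datafile filesystem with the options Oracle expects on this port.
umount /mnt
mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 \
      nas1:/vol/oradata /mnt

# Or make it stick with an /etc/fstab entry:
# nas1:/vol/oradata /mnt nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 0 0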

But then these mount options are port-specific and, as they say in true Clintonian form, “It’s the Port, Stupid.”

It’s All About the Port
The only complaint I have about Oracle over NFS is at the port level. I intend to start blogging about the idiosyncrasies between, say, certain Legacy Unix and Linux ports of Oracle with regard to NAS mount options. I think RMAN has the most issues and, again, these are always port level. For instance, certain ports inspect the mount options of the actual mounted filesystem and others will look at the mnttab. And then, in some cases, certain ports do it one way for the instance and then another for functionality such as RMAN. Sometimes when the database or tools don’t like the mount options they return an error message spelling out what is missing and other times just a generic complaint that the mount options are incorrect—and that too varies by port and version of Oracle as well. Recently I found that the HP-UX port of Oracle10g needs the llock mount option which is apparently not documented very well.

In all cases, issues regarding mount options are the responsibility of the Oracle port team for the release. That is where this functionality is built. The layers above the I/O layer (Operating System Dependent code) have no idea whether there is DAS, SAN or NAS downstream. That abstraction is one of the main reasons Oracle is the best database out there. That porting heritage goes back to Oracle version 4. Anyway, I digress…

Complicated.
Yes, these mount option topics are more complicated than they should be, but this situation is not permanent. As we get closer to July 11, I’ll be blogging more about what that means. Regardless, I stand fast in my view that provisioning storage for Oracle via NFS is simpler, simpler, simpler than SANs, and that goes for both RAC and non-RAC databases. Just mount the filesystem and go…

In the meantime, if you have a particular port of Oracle10g that isn’t getting along with your NAS, remember our motto, “It’s the port[…]” so log an SR and Oracle will get you on your way.

10 Years to Replace 64 CPU Systems with a Single Socket.

I’ve been wondering how much longer I’d have to wait for this to happen…

Back in 1998 I was a part of the team at Sequent Computer Systems that delivered the first non-clustered TPC-C result of 100,000 TpmC. Well, OK, I exaggerated, it was 93,901, but the prior Oracle record was held by HP with the V2200 (a Convex system actually) at 40,794 TpmC. Non-clustered TPC-C results do not usually get doubled within a year’s time, so it was quite the accomplishment at the time. Even more so given the detractors and skeptics of the day who couldn’t imagine a bunch of those “little Intel processors” churning out a record TPC-C result. Well, we did. It was a huge, expensive system with 64 processors, 64GB RAM, and hundreds upon hundreds of disk drives. The system cost for that big system was $131.67 per TpmC.

That was then, this is now. On June 8, Hewlett-Packard and Oracle joined to produce a 100,926 TpmC result with a single processor—a Xeon 5355 “Clovertown”. While it has 4 cores, it is technically a single processor. So they beat our 1998 number with 1/64th the number of CPUs and only 24GB of memory as opposed to the 64GB we used. Interesting as well is that they used 100 disks in MSA-1000 enclosures. And the cost? Only $.78 per TpmC!

Mistaken Identity?
The benchmark was executed with Oracle Enterprise Linux, Oracle Database 10g Standard Edition One and no ASM. I can’t find anything in the full disclosure report suggesting the datafiles were on raw partitions, but I’d be very surprised to find they used Ext3. Besides, there was no value assigned to filesystemio_options that would have enabled Direct I/O, and I would bet dollars to doughnuts that there was no filesystem caching overhead on a TPC-C run.

On the humorous side of things, you might get a chuckle to find that on page 10 of the Full Disclosure Report, the auditors missed the subtle distinction between Oracle Enterprise Linux and Red Hat Linux 4 as the following quote will attest:

Overview
This report documents the methodology and results of the TPC Benchmark C test conducted on the hp ProLiant ML350 G5. The operating system used for the benchmark was Red Hat Enterprise Linux 4. The DBMS used was Oracle Database 10g Standard Edition One.

This was a phenomenal result actually. I was quite glad to see it. I plan to share a couple other interesting things I noticed in the FDR as well in another blog entry.

Change. Rapid, Very Rapid, Change.

If you are a xenophobe or an ostrich with your head in the sand I encourage you not to watch this video.

A New Blog to Follow.

The young’n has caught the blogging itch. Fellow OakTable Network member Tanel Poder has started blogging at:

http://blog.tanelpoder.com/

I’m looking forward to good stuff there.

Note, the “young’n” bit is an inside joke. Tanel is brilliant and his blog should be a good read.

Manly Men Only Deploy Oracle with Fibre Channel – Part 1. Oracle Over NFS is Weird.

Beware, lots of tongue-in-cheek in this one. If you’re not the least bit interested in storage protocols, saving money or a helpful formula for safely configuring I/O bandwidth for Oracle, don’t read this.

I was reading Pawel Barut’s Log Buffer #48 when the following phrase caught my attention:

For many of Oracle DBAs it might be weird idea: Kevin Closson is proposing to install Oracle over NFS. He states that it’s cheaper, simpler and will be even better with upcoming Oracle 11g.

Yes, I have links to several of my blog entries about Oracle over NFS on my CFS, NFS, ASM page, but that is not what I want to blog about. I’m blogging specifically about Pawel’s assertion that “it might be a weird idea”—referring to using NAS via NFS for Oracle database deployments.

Weird
I think the most common misconception people have is regarding the performance of such a configuration. True, NFS has a lot of overhead that would surely tax the Oracle server way too much—that is if Oracle didn’t take steps to alleviate the overhead. The primary overhead is in NFS client-side caching. Forget about it. Direct I/O and asynchronous I/O are available to the Oracle server for NFS files with just about every NFS client out there.
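For example, on the Linux ports a single initialization parameter asks Oracle to open NFS-resident datafiles with both direct and asynchronous I/O, taking the NFS client cache out of the picture. A minimal sketch, assuming a working spfile-based instance (bounce required):

# Have Oracle bypass the NFS client cache and use async I/O for its files.
sqlplus -s / as sysdba <<'EOF'
alter system set filesystemio_options = SETALL scope=spfile;
EOF
# Restart the instance for the change to take effect.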

Manly Men™ Choose Fibre Channel
I hear it all the time when I’m out in the field or on the phone with prospects. First I see the wheels turning while math is being done in the head. Then, one of those cartoon thought bubbles pops up with the following:

Hold it, that Closson guy must not be a Manly Man™. Did he just say NFS over Gigabit Ethernet? Ugh, I am Manly Man and I must have 4Gb Fibre Channel or my Oracle database will surely starve for I/O!

Yep, I’ve been caught! Gasp, 4Gb has more bandwidth than 1Gb. I have never recommended running a single path to storage though.

Bonding Network Interfaces
Yes, it can be tricky to work out 802.3ad Link Aggregation, but it is more than possible to have double or triple bonded paths to the storage. And yes, scalability of bonded NICs varies, but there is a simplicity and cost savings (e.g., no FCP HBAs or expensive FC switches) with NFS that cannot be overlooked. And, come in closely and don’t tell a soul, you won’t have to think about bonding NICs for Oracle over NFS forever, wink, wink, nudge, nudge.
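For the curious, a bonded pair on a RHEL4-era Linux host looks roughly like the following; the device names, addresses and the chosen bonding mode are illustrative only (802.3ad also needs switch-side support):

# /etc/modprobe.conf -- load the bonding driver
alias bond0 bonding
options bond0 mode=802.3ad miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth2 (ifcfg-eth3 is the same but for DEVICE)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none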

But, alas, Manly Man doesn’t need simplicity! Ok, ok, I’m just funning around.

No More Wild Guesses
A very safe rule of thumb to keep your Oracle database servers from starving for I/O is:
100Mb I/O per GHz CPU

So, for example, if you wanted to make sure an HP c-Class server blade with 2-socket 2.66 GHz “Clovertown” Xeon processors had sufficient I/O for Oracle, the math would look like this:

12 * 2.66 * 4 * 2 == 255 MB/s

Since the Xeon 5355 is a quad-core processor and the BL480c c-Class blade supports two of them, there are 21.28 GHz for the formula. And 100 Mb is about 12 MB. So if Manly Man configures, say, two 4Gb FC paths (for redundancy) to the same c-Class blade he is allocating about 1000 MB/s of bandwidth. Simply put, that is expensive overkill. Why? Well, for starters, the blade would be 100% saturated at the bus level if it did anything with 1000 MB/s, so it certainly couldn’t satisfy Oracle performing physical I/O and actually touching the blocks (e.g., filtering, sorting, grouping, etc). But what if Manly Man configured the two 4Gb FCP paths for failover with only 1 active path (approximately 500 MB/s of bandwidth)? That is still overkill.
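The rule of thumb reduces to a trivial calculator; this little sketch (script name and defaults are mine) uses 12 MB/s as the rough equivalent of 100Mb:

#!/bin/bash
# iorule.sh -- 100Mb of I/O per GHz of CPU, expressed in MB/s.
# Usage: ./iorule.sh <sockets> <cores_per_socket> <GHz>
SOCKETS=${1:-2}; CORES=${2:-4}; GHZ=${3:-2.66}
echo "$SOCKETS $CORES $GHZ" | awk '{
  total_ghz = $1 * $2 * $3
  printf("Total GHz: %.2f  ->  roughly %.0f MB/s before CPU saturation\n",
         total_ghz, total_ghz * 12) }'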

Now don’t get me wrong. I am well aware that 2 “Clovertown” Xeons running Parallel Query can scoop up 500MB/s from disk without saturating the server. It turns out that simple lightweight scans (e.g., select count(*)) are about the only Oracle functionality that breaks the rule of 100Mb I/O per GHz CPU. I’ve even proven that countless times, such as in this dual-processor, single-core 2.8 GHz Opteron proof point. In that test I had IBM LS20 blades configured with dual-processor, single-core Opterons clocked at 2.8 GHz. So if I plug that into the formula I’d use 5.6 for the GHz figure, which supposedly yields 67 MB/s as the throughput at which those processors should have been saturated. However, on page 16 of this paper I show those two little single-core Opterons scanning disk at the rate of approximately 380MB/s. How is that? The formula must be wrong!

No, it’s not wrong. When Oracle is doing a lightweight scan it is doing very, very little with the blocks of data being returned from disk. On the other hand, if you read further in that paper, you’ll see on page 17 that a measly 21MB/s of data loading saturated both processors on a single node, due to the amount of data manipulation required by SQL*Loader. OLTP goes further. Generally, when Oracle is doing OLTP, as few as 3,000 IOps from each processor core will result in total saturation. There is a lot of CPU-intensive stuff wrapped around those 3,000 IOps. Yes, it varies, but look at your OLTP workload and take note of the processor utilization when/if the cores are performing on the order of 3,000 IOps each. Yes, I know, most real-world Oracle databases don’t even do 3,000 IOps for an entire server, which takes us right back to the point: 100Mb I/O per GHz CPU is a good, safe reference point.

What Does the 800 Pound Gorilla Have To Say?
When it comes to NFS, Network Appliance is the 800lb gorilla. They have worked very hard to get to where they are. See, Network Appliance likely doesn’t care if Manly Man would rather deploy FCP for Oracle instead of NFS, since their products do both protocols, and iSCSI too. All told, they may stand to make more money if Manly Man does in fact go with FCP since they may have the opportunity to sell expensive switches too. But, no, Network Appliance dispels the notion that 4Gb (or even 2Gb) FCP for Oracle is a must.

In this NetApp paper about FCP vs iSCSI and NFS, measurements are offered that show equal performance with DSS-style workloads (Figure 4) and only about a 21% deficit for NFS when comparing OLTP throughput with FCP. How’s that? The paper points out that the FCP test was fitted with 2Gb Fibre Channel HBAs and the NFS case had two GbE paths to storage, yet Manly Man only achieved 21% more OLTP throughput. If NFS were so inherently unfit for Oracle, this test case with bandwidth parity would surely have made the point clear. But that wasn’t the case.

If you look at Figure 2 in that paper, you’ll see that the NFS case (with jumbo frames) spent 31.5% of cycles in kernel mode compared to 22.4% in the FCP case. How interesting. The NFS case lost 28% more CPU to kernel mode overhead and delivered 21% less OLTP throughput. Manly Man must surely see that addressing that 28% extra kernel mode overhead associated with NFS will bring OLTP throughput right in line with FCP and:

– NFS is simpler to configure

– NFS can be used for RAC and non-RAC

– NFS is cheaper since GbE is cheaper (per throughput) than FCP

Now isn’t that weird?

The 28%.

I can’t tell you how and when the 28% additional kernel-mode overhead gets addressed, but, um, it does. So, Manly Man, time to invent the wheel.

A Good Blog Post About Monitoring Oracle Over NFS

I’d like to give a shout out to a very good blog post about monitoring Oracle on NFS by Jeremy Schneider.

Gun Violence in Oracle IT Shops

I won’t blog at all about the actual array they are doing this to because I have never tested one. In the video a bullet is fired through the SAN array and it is portrayed to continue operations. The part of the video I like is the disclaimer at the end.

Whew, close call for the goldfish. Can you imagine the anxiety it felt not knowing it was to be scooped off to safety? No worries, there is treatment. Perhaps preventive treatment would be more effective?



1.2 Transactions Per Second! Enterprise Software is Infinitely Partitionable

I read a post on blogs.zdnet.com about MySQL that I found interesting. In the post, Dana Blankenhorn posits that MySQL is “enterprise class,” using the Booking.com deployment as a case in point.

What is “Enterprise Class?”
The post got me thinking. What is “Enterprise Class” anyway? Is it any software used in any enterprise datacenter? I tend to think of an enterprise class database server as one that can vertically scale to exploit the largest servers in support of a single, large application. Using those criteria leaves MySQL out I should think. Or am I behind the times on that? Are there any single MySQL databases running on a 64CPU Superdome for instance? It appears as though MySQL is supported on Itanium HP-UX for 2-processor systems.

Enterprise MySQL
In this computerworlduk.com article, it looks as though Booking.com uses something like 20 MySQL database servers to handle “tens of thousands” of bookings for 30,000 hotels spanning some 8,000 destinations. Let’s say for the sake of argument that it is 20 database servers and “tens of thousands” is 100,000. I admit I don’t know anything about the richness of this application, but I don’t see anything too brutal here. These sorts of applications lend themselves to partitioning naturally. It wouldn’t surprise any of us Oracle types to find out that they partition based upon hotel. That seems like a natural line to partition on. If that is the case, I get 1,500 hotels per database server, with the servers sharing a total of roughly 1.2 transactions per second (100,000 bookings / 86,400 seconds in a day). I know these things are not that simple, but folks, we are talking about 20 database servers. Even if they are 2-socket/dual-core systems you’ve got some 80 cores to work with! At first glance it just doesn’t seem as though these systems would be working that hard. And MySQL? Well, it doesn’t have to work that hard at all since the workload is partitionable. Who knows, maybe all workloads are partitionable and we Oracle-types are just missing the ball. Anyway, I can’t seem to find what storage engine is being used at Booking.com. And speaking of MySQL storage engines…

A 3-legged Pink Elephant
If you’re interested in 3-legged pink elephants, I’ve got one for you since we are on the topic of MySQL. In this computerworlduk.com article we find that MySQL announced support for MySQL on IBM System i (yes, OS/400) with DB2 as the storage engine. Wow, that would be weird. Or so it seems at least.

What’s this Really Have to do with Oracle?
Oracle Database can do everything MySQL can do. The opposite is not true. ‘Nuff said. Oh, did I mention that Oracle Corporation is not a “Database Company” anymore? They’ve got the database; now they are getting everything else.

 

SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!

I was catching up on my mojo reading when I caught a little snippet I’d like to blog about. Oh, by the way, have I mentioned recently that StorageMojo is one of my favorite blogs?

In Robin Harris’ latest installment about ZFS action at Apple, he let out a glimpse of one of his other apparent morbid curiosities—flash. Just joking, I don’t think ZFS on Mac or flash technology are morbid, it just sounded catchy. Anyway, he says:

I’ve been delving deep into flash disks. Can you say “weird”? My take now is that flash drives are to disk drives what quantum mechanics is to Newtonian physics. I’m planning to have something out next week.

I look forward to what he has to say. I too have a great interest in flash.

Now, folks, just because we are Oracle types and Jim Gray was/is a Microsoft researcher, you cannot overlook what sorts of things Jim was/is interested in. Jim’s work has had a huge impact on technology over the years and it turns out that Jim took/takes an interest in flash technology with servers in mind. Just the abstract of that paper makes it a natural must-read for Oracle performance-minded individuals. Why? Because it states (with emphasis added by me):

Executive summary: Future flash-based disks could provide breakthroughs in IOps, power, reliability, and volumetric capacity when compared to conventional disks.

 

Yes, IOps! Nothing else really matters where the Oracle database is concerned. How can I say that? Folks, round-brown spinning things do sequential I/O just fine—naturally. What they don’t do is random I/O. To make it worse, most SAN array controllers (you know, that late-1990s technology) pile on overhead that further chokes off random I/O performance. Combine all that with the standard IT blunder of allocating space for Oracle on a pure capacity basis and you get the classic OakTable Network response:

Attention DBAs, it’s time for some déjà vu. I’ll state with belligerent repetition, redundantly, over and over, monotonously reiterating this one very important recurrent bit of advice: Do everything you can to get spindles from your storage group—not just capacity.

 

Flash
Yes, that’s right, it won’t be long (in relative terms) until you see flash memory storage fit for Oracle databases. The aspect of this likely future trend that I can’t predict, however, is what impact such technology would have on the entrenched SAN array providers. Will it make it more difficult to keep the margins at the levels they demand, or will flash be the final straw that commoditizes enterprise storage? Then again, as Jim Gray points out in that paper, flash density isn’t even being driven by the PC—and most certainly not the enterprise storage—ecosystem. The density is being driven by consumer and mobile applications. Hey, I want my MTV. Um, like all of it, crammed into my credit-card sized mpeg player too.

When?
When it gets cheaper and higher capacity, of course. Well, it’s not exactly that simple. I went spelunking for that Samsung 1.8″ 32GB SSD and found two providers with a street price of roughly USD $700.00 for 32GB here and here. In fact, upon further investigation, Ritek may soon offer a 32GB device at some $8 per GB. But let’s stick with current product for the moment. At $22 per GB, we’re not exactly talking SATA, which runs more on the order of $.35 per GB. But then we are talking enterprise applications here, so a better comparison would be to Fibre drives, which go for about $3-$4 per GB.
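The price-per-gigabyte comparisons are just division, but for the spreadsheet-averse here is the arithmetic using the street prices quoted above:

#!/bin/bash
# Rough $/GB math from the prices mentioned above.
awk 'BEGIN {
  printf("Samsung 32GB SSD at $700     : $%.2f per GB\n", 700 / 32)    # ~$22
  printf("Fibre Channel drives         : $3-$4 per GB (typical)\n")
  printf("SATA                         : $0.35 per GB (typical)\n")
  printf("A 512GB SSD at the same $700 : $%.2f per GB\n", 700 / 512)   # ~$1.37
}'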

Now that is interesting, since Jim Gray pointed out that, in spite of some industry predictions setting the stage for NAND to double every year, NAND had in fact gained 16-fold in 4 years, off by a year. If that pace continues, could we really expect 512GB 1.8″ SSD devices in the next 4 years? And would the price stay relatively constant, yielding a cost of something like $1.35 per GB? Remember, even the current state of the art (e.g., the Samsung 1.8″ 32GB SSD) delivers on the order of 130,000 random single-sector IOps, which is approximately 7usec latency for a random I/O. At least that is what Samsung’s literature claims. Jim’s paper, on the other hand, reports grim current-art performance when measured with DskSpd.exe:

The story for random IOs is more complex – and disappointing. For the typical 4-deep 8KB random request, read performance is a spectacular 2,800 requests/second but write performance is a disappointing 27 requests/second.

The technology is young and technically superior, but there is work to do in getting the most out of NSSD, as the paper reports. Jim suspects that short-term quick fixes could be made to bring the random I/O performance for 8KB transfers on today’s NSSD technology up to about 1,700 IOps split evenly between read and write. Consider, however, that real-world applications seldom exhibit a read:write ratio of 50:50. Jim generalized on the TPC-C workload as a case in point. It seems that with “some re-engineering” (Jim’s words) even today’s SSD would be a great replacement for hard drives for typical Oracle OLTP workloads, since you’ll see more like 70:30 read:write ratios in the real world. And what about sequential writes? Well, there again, even today’s technology can handle some 35MB/s of sequential writes, so direct path writes (e.g., sort spills) and redo log writes would be well taken care of. But alas, the $$/GB is still off. Time will fix that problem and when it does, NSSD will be a great fit for databases.

Don’t think for a moment Oracle Corporation is going to pass up enabling customers to exploit that sort of performance, with or without the major SAN vendors.

But flash burns out, right? Well, yes and no. The thing that matters is how long the device lasts: the sum of its parts. MTBF numbers are crude, but Samsung sticks a 1,000,000-hour MTBF on this little jewel. How cool.
Well, I’ve got the cart well ahead of the horse here for sure because it is still too expensive, but put it on the back burner, because we aren’t using Betamax now and I expect we’ll be using fewer round-brown spinning things in the span of our careers.

More Linux Distributions Please! Balkanization Improves Oracle IT!

Sometimes small problems really upset me. I have a test harness that nests shell scripts deeply. Why not—that is why fork() and exec() exist after all. I was getting odd failures deep down in the bowels of the harness that were a bit tricky to track down.

This particular test harness was failing with “Segmentation fault” on SuSE SLES 9 and “Memory fault” on Red Hat Enterprise Linux RHEL4. Same problem, just different error text. When I found the problem I started to look into it on another test system that happened to be running Fedora Core 5. I was frustrated to find that the problem did not happen on Fedora, but did on the 32- and 64-bit releases of both RHEL4 and SLES9. Ugh.

The problem rested in the first 2 lines of a simple shell script. What was the offending code? Get this:

#!/bin/bash
time

Wow, frightening stuff! In the first line of some stupid shell script I happened to execute the shell built-in time without timing anything, which was a goof on my part.  But really, bash seg faults because the time built-in doesn’t have an arg and only when that is the first line of the script? As long as time is the first command in the script it causes bash to freak out—regardless of what follows. Notice in the following screen shot how simply putting the shell built-in : as the first line of code—after forcing bash—alleviates the problem. Notice also that the problem didn’t happen on Fedora.
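Since screen shots don’t always survive syndication, here is the trivial repro and the workaround in plain text; the script names are made up, and whether the bare time built-in still misbehaves depends on your bash build:

$ cat boom.sh
#!/bin/bash
time                    # bare time built-in as the first command: bash dumps core
echo "never reached on RHEL4/SLES9"

$ cat ok.sh
#!/bin/bash
:                       # harmless no-op built-in as the first line of code
time                    # the same bare time no longer crashes bash
echo "fine"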

Balkanization
Oh how I hate problems like this—simple, stupid little shell scripts behaving differently on one Linux distro than on another. Yes, I know it isn’t the shell script that is technically behaving differently—it’s the implementation of bash. And if there were symbols in /bin/bash (it’s stripped) I’d tunnel-rat my way through gdb to see why it is freaking out; however, with limited information I can only determine the offending routine, which is execute_command_internal().

I also know this is not technically a Linux issue since bash is just GNU stuff. To that end, some googling reveals that this bug has been reported in the Solaris camp in Open Solaris Bug 6328339 which was opened back in 2005. Even though this is technically a GNU issue I still get varying behavior from different Linux distributions which I believe is a by-product of balkanization.

I thought Linux was going to save us from all the balkanization we supposedly suffered under Unix rule. This is just a simple example of balkanization. We endured much worse side effects of Linux balkanization with the pre-2.6 kernel virtual memory fiasco. Some may recall the ill-fated clash of the virtual memory titans featuring the Archangelites opposing the Van Rielians—a war with a large number of customer casualties. Going back further, I recall plenty of workloads (most notably The Tens) that would completely crush one Linux distribution but function just fine with the other. Indeed, during that project we tested both RH2.1 and SLES7 and stayed with the one that could hold up under pressure—and if you read the paper you can envision the pressure!

How many Linux distributions are there now? Does Linux balkanization make life better for the average Oracle IT shop?

Every Release of Oracle Database is “The Best Ever”, Right? Enter Oracle11g!

 

I see Eddie Awad has hit the press about the July 11 launch of Oracle11g. From everything I’ve seen in this release there should be no technical reasons holding back customers’ adoption. There are certain folks out there that say every release of Oracle “is the best I’ve seen yet.” A good way to sell books for sure, but I’m not that way. I most certainly didn’t say that about Oracle 7.2! Anyway, I have been thoroughly impressed with my testing—most particularly in the area of stability. And, as I keep hinting, there are features that neither Rich Niemiec, Mike Ault nor Don Burleson have been discussing that I think will be very attractive—especially in the commodity computing space. Unfortunately I have to remind myself of the real world and how long it takes to get applications qualified for a release of the server. Let’s hope E-Biz gets there as soon as possible.

Blogging or Bashing Oracle for Fun, not Profit. Got Cheezburger?

I stood on the sidelines of this thread too long. Today, Justin Kestelyn made another post about the Oracle blogging community. This thread goes back to these original posts where Justin posed that although there is a lot of blogging activity around Oracle, there doesn’t seem to be the same Web 2.0 buzz that someone like Robert Scoble would take notice of. Fellow OakTable Network member Doug Burns stepped in with this post. So what’s my take?

Aristocracy or Meritocracy?
I started this blog in October 2006 and at one point found a reference to my blog on blogs.oracle.com. Not that it generated any traffic, but I thought that was interesting because I didn’t ask to get a reference there. But the fact that it generated no appreciable traffic to my site is what I think Scoble is talking about. When I think about it, it seems my blog more than deserves at least a link from blogs.oracle.com; the bigger question is what criteria go into that blogroll. Is it aristocracy, or meritocracy?

These days when I find myself sitting with Vice Presidents or members of the technical staff in Oracle Server Technologies Division (ST) or doing something like writing a jointly produced whitepaper with ST (as I am right now on a cool Oracle11g feature), I wonder why there are those small circles of relative late-comers to this Oracle stuff that mistake me for being an “Oracle-basher.” Folks, I spent an entire decade as a member of a small team of platform engineers optimizing Oracle at the port level for improved SMP, and later, NUMA. I also participated in the most important benchmarks that made Oracle money in the 1990s—customer-defined benchmarks where bake-offs between, say, Informix PDQ and IBM SP2 or Teradata or Sybase were at stake. I spent so much time in building 400 (Server Technologies) of Oracle’s HQ that I maintained a fully furnished apartment right down the street on Marine Parkway. I’m an Oracle basher? No.

Let’s say I were to state matter-of-factly that Microsoft Windows 3.1 was a complete pile of garbage. Does that actually make me a Microsoft basher? No, it simply means that there was an offering from Microsoft that I didn’t like. Big deal. I wasn’t much of a fan of Oracle SQL*Calc either—and neither was anyone else, so Oracle discontinued it. I bet there will be no more than about 42 readers of this blog who even remember SQL*Calc.

Automatic Storage Management. Bashing?
I have taken a position that in its current form, Automatic Storage Management (ASM) is often times over-positioned. Let me be clear about this. I have never taken a stand against any Oracle revenue-generating product. It turns out that ASM is optional software that eases storage management pains most common to SAN environments that are also devoid of any optional software such as clustered filesystems. Indeed, install Oracle10g sometime and pay close attention to the fact that the default DBCA placement for a Real Application Clusters database is in fact cluster filesystem. You have to cursor down to select ASM. What is my point? My point is that either way the customer using DBCA with RAC has already paid Oracle the same amount of money regardless of where they put their database—whether in the default locale of cluster filesystem or ASM.

ASM is routinely referred to as a “replacement for filesystems and volume managers.” That is incorrect. You still have to install Oracle and do things like imp/exp, SQL*Loader, BFILE/UTL_FILE, logging, trace, scripts, etc, etc, etc. Until such time as ASM is a part of a fully-baked general purpose filesystem—which anyone skilled in the reading of tea leaves should easily be able to foresee—I prefer NFS. And, get this, Oracle makes the same amount of money when you deploy on NAS (NFS) as they do if you choose iSCSI or FCP with CFS or ASM. These choices don’t affect Oracle’s bottom line. I make my points about my preference for NFS over block protocols (and therefore ASM) in this set of postings. No, I am not an Oracle basher. But do my views fit in the aristocracy that blogs.oracle.com seems to be? It doesn’t seem so.

More on ASM
I am excited about where ASM will make a showing in the future. It is a component of bigger and better things that I cannot discuss openly. In those future technology offerings, ASM will perform wonderfully and provide vital functionality—and I’m not just talking about some passé short-term market vision like displacing Veritas VxVM from Oracle implementations. There are much bigger and better things ahead for ASM…’nuff said. In the meantime, choose success, go with NFS. Ok, I’m off that soapbox. Back to this Web 2.0 mystery.

The Popularity Contest
So if Justin is honestly concerned about building a vibrant Web 2.0 community, it seems blogs.oracle.com should be more in tune with the readership. Let’s consult Technorati.

In defense of Oracle’s Web 2.0 community, I saw Justin mention blogs such as Steve Chan’s blog and Doug Burns’ blog. I think I’ll walk through Technorati for those and, of course, the ueberblog (where’s my umlaut?): Jonathan Lewis’ blog. To spice things up, how about comparing to one of my favorite blogs, StorageMojo.com. Of course I’ll have to include Scobleizer to show Web 2.0 weight. Be aware that of the following blogs, Jonathan Lewis’ and mine are the youngest blogs—by a long shot.

[Image: pop2.jpg: Technorati authority and rank comparison of the blogs mentioned above]

Proper Perspective
Now, just to put things into perspective, consider a true Web 2.0 phenomenon: I CAN HAS CHEEZBURGER. This site—which has a Technorati authority of 5,025 and rank of 98—is proof positive that the Internet and social networking are as mainstream as Pet Trusts (for real), designer pet supplies, fluffy with a sniffle, pets with stress and of course Barbi with a scooper.

 

Oracle on Opteron with Linux – The NUMA Angle (Part VII).

This installment in my series about Oracle on Linux with NUMA hardware is very, very late. I started this series at the end of last year and it just kept getting put off—mostly because the hardware I needed to use was being used for other projects (my own projects). This is the seventh in the series and it’s time to show some Oracle numbers. Previously, I laid groundwork about such topics as SUMA/NUMA, NUMA API and so forth. To make those points I relied on microbenchmarks such as the Silly Little Benchmark. The previous installments can be found here.

To bring home the point that Oracle should be run on AMD boxes in NUMA mode (as opposed to SUMA), I decided to pick an Oracle workload that is very easy to understand as well as processor intensive. After all, the difference between SUMA and NUMA is higher memory latency, so testing at any level below processor saturation actually provides the same throughput, albeit with the SUMA result coming at a higher processor cost. To that end, measuring SUMA and NUMA at processor saturation is the best way to see the difference.

The workload I’ll use for this testing is what my friend Anjo Kolk refers to as the Jonathan Lewis Oracle Computing Index workload. The workload comes in script form and is very straightforward. The important thing about the workload is that it hammers memory which, of course, is the best way to see the NUMA effect. Jonathan Lewis needs no introduction of course.

The test was set up to execute 4, 8, 16 and 32 concurrent invocations of the JL Comp script. The only difference in the test setup was that in one case I booted the server in SUMA mode and in the other I booted in NUMA mode and allocated hugepages. As I point out in this post about SUMA, hugepages are allocated in a NUMA fashion and booting an SGA into this memory offers at least crude fairness in the placement of the SGA pages—certainly much better than a Cyclops. In short, what is being tested is one case where memory is allocated at boot time in a completely round-robin fashion versus another where the SGA is quasi-round-robin yet page tables, kernel-side process-related structures and heap are all NUMA-optimized. Remember, this is no more difficult than a system boot option. Let’s get to the numbers.
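Before the numbers, here is roughly what the two setups look like from the operating system side; the hugepage count is illustrative and would be sized to the SGA:

# SUMA: enable node (memory) interleaving in the BIOS so the OS sees one
# flat, round-robin memory image. Nothing else to do.

# NUMA: leave the BIOS in NUMA mode and reserve hugepages for the SGA at boot.
echo "vm.nr_hugepages = 4096" >> /etc/sysctl.conf    # e.g., ~8GB of 2MB pages
reboot

# After the reboot, sanity-check the memory layout and the hugepage pool.
numactl --hardware          # per-node memory sizes on the Opteron box
grep Huge /proc/meminfo     # HugePages_Total / HugePages_Free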

[Image: jlcomp.jpg: JL Comp workload completion times, SUMA vs. NUMA, at 4, 8, 16 and 32 concurrent sessions]

I have also rolled up all the statspack reports into a Word document (as required by WordPress). The document is numa-statspack.doc and it consists of 8 statspack reports, each prefaced by the name of the specific test. If you pattern-search for REPORT NAME you will see each entry. Since this is a simple memory latency improvement, you might not be surprised how uninteresting the stats are, except of course for the vast improvement in the number of logical reads per second the NUMA tests were able to push through the system.

SUMA or NUMA
A picture speaks a thousand words. This simple test combined with this simple graph covers it all pretty well. The job completion times were about 12 to 15 percent better with NUMA at each of the concurrent session counts. While 12 to 15% isn’t astounding, remember this workload is completely processor bound. How do you usually recoup 12-15% from a totally processor-bound workload without changing even a single line of code? Besides, this is only one workload, and the fact remains that the more your particular workload does outside the SGA (e.g., sorting, etc.) the more likely you are to see improvement. But by all means, do not run Oracle with Cyclops memory.

The Moral of the Story

Processors are going to get more cores and slower clock rates and memory topologies will look a lot more NUMA than SUMA as time progresses. I think it is important to understand NUMA.

What is Oracle Doing About It?
Well, I’ve blogged about the fact that the Linux ports of 10g do not integrate with libnuma. That means it is not NUMA-aware. What I’ve tried to show in this series is that the world of NUMA is not binary. There is more to it than SUMA or NUMA-aware. In the middle is booting the server and database in a fashion that at least allows benefit from the OS-side NUMA-awareness. The next step is Oracle NUMA-awareness.

Just recently I was sitting in a developer’s office in bldg 400 of Oracle HQ talking about NUMA. It was a good conversation. He stated that Oracle actually has NUMA awareness in it and I said, “I know.” I don’t think Sequent was on his mind and I can’t blame him—that was a long time ago. The vestiges of NUMA awareness in Oracle 10g trace back to the high-end proprietary NUMA implementations of the 1990s.  So if “it’s in there” what’s missing? We both said vgetcpu() at the same time. You see, you can’t have Oracle making runtime decisions about local versus remote memory if a process doesn’t know what CPU it is currently executing on (detection with less than a handful of instructions).  Things like vgetcpu() seem to be coming along. That means once these APIs are fully baked, I think we’ll see Oracle resurrect intrinsic NUMA awareness in the Linux port of Oracle Database akin to those wildcat ports of the late 90s…and that should be a good thing.

Oracle Over NFS. I Need More Monitoring Tools? A Bonded NIC Roller Coaster.

As you can tell by my NFS-related blog entries, I am an advocate of Oracle over NFS. Forget those expensive FC switches and HBAs in every Oracle Database server. That is just a waste. Oracle11g will make that point even more clearly soon enough. I’ll start sharing how and why as soon as I am permitted. In the meantime…

Oracle over NFS requires bonded NICs for redundant data paths and performance. That is an unfortunate requirement that Oracle10g is saddled with. And, no, I’m not going to blog about such terms as IEEE 802.3ad, PAgP, LACP, balance-tlb, balance-alb or even balance-rr. The days are numbered for those terms, at least in the Oracle database world. I’m not going to hint any further about that though.

Monitoring Oracle over NFS
If you are using Oracle over NFS, there are a few network monitoring tools out there. I don’t like any of them. Let’s see, there’s akk@da and Andrisoft WanGuard. But don’t forget Anue and Aurora, Aware, BasicState, CommandCenter NOC, David, Dummmynet, GFI LANguard, Gomez, GroundWork, Hyperic HQ, IMMonitor, Jiploo, Monolith, moods, Network Weathermap, OidView, Pandetix, Pingdom, Pingwy, skipole-monitor, SMARTHawk, Smarts, WAPT, WFilter, XRate1, arping, Axence NetVision, BBMonitor, Cacti, CSchmidt collection, Cymphonix Network Composer, Darkstat, Etherape, EZ-NOC, Eye-on Bandwidth, Gigamon University, IPTraf, Jnettop, LITHIUM, mrtg-ping-probe, NetMRG, NetworkActiv Scanner, NimTech, NPAD, Nsauditor, Nuttcp, OpenSMART, Pandora FMS, PIAFCTM, Plab, PolyMon, PSentry, Rider, RSP, Pktstat, SecureMyCompany, SftpDrive, SNM, SpeedTest, SpiceWorks, Sysmon, TruePath, Unbrowse, Unsniff, WatchMouse, Webalizer, Web Server Stress Tool, Zenoss, Advanced HostMonitor, Alvias, Airwave, AppMonitor, BitTorrent, bulk, BWCTL, Caligare Flow Inspector, Cittio, ClearSight, Distinct Network Monitor, EM7, EZMgt, GigaMon, Host Grapher II, HPN-SSH, Javvin Packet Analyzer, Just-ping, LinkRank, MoSSHe, mturoute, N-able OnDemand, Netcool, netdisco, Netflow Monitor, NetQoS, Pathneck, OWAMP, PingER, RANCID, Scamper, SCAMPI, Simple Infrastructure Capacity Monitor, Spirent, SiteMonitor, STC, SwitchMonitor, SysUpTime, TansuTCP, thrulay, Torrus, Tstat, VSS Monitoring, WebWatchBot, WildPackets, WWW Resources for Communications & Networking Technologies, ZoneRanger, ABwE, ActivXpets, AdventNet Web NMS, Analyse It, Argus, Big Sister, CyberGauge, eGInnovations, Internet Detective, Intellipool Network Monitor, JFF Network Management System, LANsurveyor, LANWatch, LoriotPro, MonitorIT, Nagios, NetIntercept, NetMon, NetStatus, Network Diagnostic Tool, Network Performance Advisor, NimBUS, NPS, Network Probe, NetworksA-OK, NetStat Live, Open NerveCenter, OPENXTRA, Packeteer, PacketStorm, Packetyzer, PathChirp, Integrien, Sniff’em, Spong, StableNet PME, TBIT, Tcptraceroute, Tping, Trafd, Trafshow, TrapBlaster, Traceroute-nanog, Ultra Network Sniffer, Vivere Networks, ANL Web100 Network Configuration Tester, Anritsu, aslookup, AlertCenter, Alertra, AlertSite, Analyse-it, bbcp, BestFit, Bro, Chariot, CommView, Crypto-PAn, elkMonitor, DotCom, Easy Service Monitor, Etherpeek, Fidelia, Finisar, Fpinger, GDChart, HipLinkXS, ipMonitor, LANExplorer, LinkFerret, LogisoftAR, MGEN, Netarx, NetCrunch, NetDetector, NetGeo, NEPM, NetReality, NIST Net, NLANR AAD, NMIS, OpenNMS PageREnterprise, PastMon, Pathprobe, remstats, RIPmon, RFT, ROMmon, RUDE, Silverback, SmokePing, Snuffle, SysOrb, Telchemy, TCPTune, TCPurify, UDPmon, WebAttack, Zabbix, AdventNet SNMP API, Alchemy Network Monitor, Anasil analyzer, Argent, Autobuf, Bing, Clink, DSLReports, Firehose, GeoBoy, PacketBoy, Internet Control Portal, Internet Periscope, ISDNwatch, Metrica/NPR, Mon, NetPredict, NetTest, Nettimer, Net-One-1, Pathrate, RouteView, sFlow, Shunra, Third Watch, Traceping, Trellian, HighTower, WCAT, What’s Up Gold, WS_FTP, Zinger, Analyzer, bbftp, Big Brother, Bronc, Cricket, EdgeScape, Ethereal (now renamed Wireshark), gen_send/gen_recv, GSIFTP, Gtrace, Holistix, InMon, NcFTP, Natas, NetAlly, NetScout, Network Simulator, Ntop, PingGraph, PingPlotter, Pipechar, RRD, Sniffer, Snoop, StatScope, Synack, View2000, VisualPulse, WinPcap, WU-FTPD, WWW performance monitoring, Xplot, Cheops, Ganymede, hping2, Iperf, JetMon, MeasureNet, MatLab, MTR, NeoTrace, Netflow, NetLogger, Network 
health, NextPoint, Nmap, Pchar, Qcheck, SAA, SafeTP, Sniffit, SNMP from UCSD, Sting, ResponseNetworks, Tcpshow, Tcptrace WinTDS, INS Net Perf Mgmt survey, tcpspray, Mapnet, Keynote, prtraceroute clflowd flstats, fping, tcpdpriv, NetMedic Pathchar, CAIDA Measurement Tool Taxonomy, bprobe & cprobe, mrtg, NetNow, NetraMet, Network Probe Daemon, InterMapper, Lachesis, Optimal Networks and last but not least, Digex.

Simplicity Please
The networking aspect of Oracle over NFS is the simplest type of networking imaginable. The database server issues I/O to NFS filesystems being exported over simple, age old Class C private networks (192.168.N.N). We have Oracle statspack to monitor what Oracle is asking of the filesystem. However, if the NFS traffic is being sent over a bonded NIC, monitoring the flow of data is important as well. That is also a simple feat on Linux since /proc tracks all that on a per-NIC basis.

I hacked out a very simple little script to monitor eth2 and eth3 on my system. It isn’t anything special, but it shows some interesting behavior with bonded NICs. The following screen shot shows the simple script executing. A few seconds after starting the script, I executed an Oracle full table scan with Parallel Query in another window. Notice how the /proc data shows that the throughput has peaks and valleys on a per-second basis. The values being reported are megabytes per second, so it is apparent that the bursts of I/O are achieving the full bandwidth of the GbE network storage paths, but what’s up with the pulsating action? Is that Oracle, or the network? I can’t tell you just yet. Here is the screen shot nonetheless:

[Image: ntput.jpg: per-second output of the script for eth2 and eth3 during a Parallel Query full table scan, showing bursts to full GbE bandwidth with peaks and valleys]

For what it is worth, here is a listing of that silly little script. Its accuracy compensates for its lack of elegance. Cut and paste this on a Linux server and tailor eth[23] to whatever you happen to have.

$ cat ntput.sh
#!/bin/bash
# Report per-second throughput for two NICs using the receive byte
# counters in /proc/net/dev.
function get_data() {
cat /proc/net/dev | egrep "${token1}|${token2}" \
| sed 's/^.*://g' | awk '{ print $1 }' | xargs echo
}

token1=eth2
token2=eth3
INTVL=1

while true
do

# Snapshot the byte counters for both interfaces.
set -- `get_data`
b_if1=$1
b_if2=$2

sleep $INTVL

set -- `get_data`
a_if1=$1
a_if2=$2

# Convert the per-interval byte deltas to MB/s.
echo $a_if1 $b_if1 | awk -v intvl=$INTVL '{
printf("%7.3f\t", (($1 - $2) / 1048576) / intvl) }'
echo $a_if2 $b_if2 | awk -v intvl=$INTVL '{
printf("%7.3f\n", (($1 - $2) / 1048576) / intvl) }'

done



