Archive for the 'Shared APPL_TOP' Category

Manly Men Only Deploy Oracle with Fibre Channel – Part II. What’s So Simple and Inexpensive About NFS for Oracle?

The things I routinely hear from DBAs leads me to believe that they often don’t understand storage. Likewise, the things I hear from Storage Administrators convinces me they don’t always know what DBAs and system administrators have to do with those chunks of disk they dole out for Oracle. This is a long blog entry aimed at closing that gap with a particular slant to Oracle over NFS. Hey, it is my blog after all.

I also want to clear up some confusion about points I made in a recent blog entry. The confusion was rampant as my email box will attest so I clearly need to fix this.

I was catching up on some blog reading the other day when I ran across this post on Nuno Souto’s blog dated March 18, 2006. The blog entry was about how Noon’s datacenter had just taken on some new SAN gear. The gist of the blog entry is that they did a pretty major migration from one set of SAN gear to the other with very limited impact—largely due to apparent 6-Ps style forethought. Noons speaks highly of the SAN technology they have.

Anyone that participates in the oracle-l email list knows Noons and his important contributions to the list. In short, he knows his stuff—really well. So why am I blogging about this? It dawned on me that my recent post about Manly Men Only Deploy Oracle with Fibre Channel Storage jumped over a lot of ground work. I assure you all that neither Noons nor the tens of thousands of Oracle shops using Oracle on FCP are Manly Men as I depicted in my blog entry. I’m not trying to suggest that people are fools for using Fibre Channel SANs. Indeed, after waiting patiently from about 1997 to about 2001 for the stuff to actually work warrants at least some commitment to the technology. OK, ok, I’m being snarky again. But wait, I do have a point to make.

Deploying Oracle on NAS is Simpler and Cheaper, Isn’t It?
In my blog entry about “Manly Man”, I stated matter-of-factly that it is less expensive to deploy Oracle on NAS using NFS than on SANs. Guess what, I’m right, it is. But I didn’t sufficiently qualify what I was talking about. I produced that blog entry presuming readers would have the collective information of my prior blog posts about Oracle over NFS in mind. That was a weak presumption. No, when someone like Noons says his life is easier with SAN he means it. Bear in mind his post was comparing SAN to DAS, but no matter. Yes, Fibre Channel SAN was a life saver for too many sites to count in the late 90s. For instance, sites that bought into the “server consolidation” play of the late 1990s. In those days, people turned off their little mid-range Unix servers with DAS and crammed the workloads into a large SMP. The problem was that eventually the large SMP couldn’t physically attach any more DAS. It turns out that Fibre was needed first and foremost to get large numbers of disks connected to the huge SMPs of the era. That is an entirely different problem to solve than getting large numbers of servers connected to storage.

Put Your Feet in the Concrete
Most people presume that Oracle over NFS must be exponentially slower than Fibre Channel SAN. They presume this because at face value the wires are faster (e.g., 4Gb FCP versus 1Gb Ethernet). True, 4Gb is more bandwidth than 1Gb, but you can have more than one NFS path to storage and the latencies are a wash. I wanted to provide some numbers so I thought I’d use Network Appliance’s data that suggested a particular test of 8-way Solaris servers running Oracle OLTP over NFS comes within 21% of what is possible on a SAN. Using someone else’s results was mistake number 1. Folks, 21% degredation for NFS compared to SAN is not a number cast in stone. I just wanted to show that it is not a day and night difference and I wanted to use Network Appliance numbers for validity. I would not be happy with 21% either and that is good, because the numbers I typically see are not even in that range to start with. I see more like 10% and that is with 10g. 11g closes the gap nicely.

I’ll be producing data for those results soon enough, but let’s get back to the point. 21% of 8 CPUs worth of Oracle licenses would put quite a cost-savings burden on NAS in order to yield a net gain. That is, unless you accept the fact that we are comparing Oracle on NAS versus Oracle on SAN in which case the Oracle licensing gets cancelled out. And, again, let’s not hang every thought on that 21% of 8 CPUs performance difference because it is by no means a constant.

Snarky Email
After my Manly Man post, a fellow member of the OakTable Network emailed me the viewpoint of their very well-studied Storage Administrator. He calculated the cost of SAN connectivity for a very, very small SAN (using inexpensive 8-port FC switches) and factored in Oracle Enterprise Edition licensing to produce a cost per throughput using the data from that Network Appliance paper—the one with the 21% deficit. That is, he used the numbers at hand (21% degradation), Oracle Enterprise Edition licensing cost and his definition of a SAN (low connectivity requirements) and did the math correctly. Given those inputs, the case for NAS was pretty weak. To my discredit, I lashed back with the following:

…of course he is right that Oracle licensing is the lion’s share of the cost. Resting on those laurels might enable him to end up the last living SAN admin.

Folks, I know that 21% of 8 is 1.7 and that 1.7 Enterprise Edition Licenses can buy a lot of dual-port FCP HBAs and even a midrange Fibre Channel switch, but that is not the point I failed to make. The point I failed to make was that I’m not talking about solving the supposed difficulties of provisioning storage to those one or two remaining refrigerator-sized Legacy Unix boxes you might have. There is no there, there. It is not difficult at all to run a few 4Gb FCP wires to separate 8 or 16 port FC switches and then back to the storage array. Even Manly Man can do that. That is not a problem that needs solved because that is neither difficult nor is it expensive (at least the SAN aspect isn’t). As the adage goes, a picture speaks a thousand words. The following is a visual of a problem that doesn’t need to be solved—a simple SAN connected to a single server. Ironically, what it depicts is potentially millions of dollars worth of server and storage connected with merely thousands of dollars worth of Fibre Channel connectivity gear. In case the photo isn’t vivid enough, I’ll point out that on the left is a huge SMP (e.g., HP Superdome) and on the right is an EMC DMX. In the middle is a redundant set of 8-port switches—cheap, and simple. Even providing private and public Ethernet connectivity in such a deployment is a breeze by the way.

simplesan.jpg

I Ain’t Never Doing That Grid Thing.
Simply put, if the only Oracle you have deployed—now and forever—sits in a couple of refrigerator-sized legacy SMP boxes, I’m going to sound like a loon on this topic. I’m talking about provisioning storage to commodity servers—grid computing. Grid may not be where you are today, but it is in fact where you will be someday. Consider the fact that most datacenters are taking their huge machines and chopping them up into little machines with hardware/software virtualization anyway so we might as well just get to the punch and deploy commodity servers. When we do, we feel the pain of Fibre Channel SAN connectivity and storage provisioning. Because connecting large numbers of servers to storage was not exactly a design center for Fibre Channel SAN technology. Just the opposite is true; SANs were originally meant to connect a few servers to a huge number of disks—more than was possible with DAS.

Commodity Computing (Grid) == Huge SAN
Large numbers of servers connected to a SAN makes the SAN very complex. Not necessarily more disks, but the presentation and connectivity aspects get very difficult to deal with.

If you are unlucky enough to be up to your knees in the storage provisioning, connectivity and cost nightmare associated with even a moderate number of commodity servers in a SAN environment you know what I’m talking about. In these types of environments, people are deploying and managing director-class Fibre Channel switches where each port can cost up to $5,000 and they are deploying more than one switch for redundancy sake. That is, each commodity server needs a 2 port FC HBA and 2 paths to two different switches. Between the HBAs and the FC switch ports, the cost is as much as $10,000-$12,000 just to connect a “pizza box” to the SAN. That’s the connectivity story and the provisioning story is not much prettier.

Once the cabling is done, the Storage Administrator has to zone the switches and provision storage (e.g., create LUNs, LUN masking, etc). For RAC, that would be a minimum of 3 masked LUNs for each database. Then the System Administrator has to make sure Oracle has access to those LUNs. That is a lot of management overhead. NAS on the other hand uses very inexpensive NICs and switches. Ah, now there is an interesting point. Using NAS means each server only has one type of network connectivity instead of two (e.g., FC and Ethernet). Storage provisioning is also simpler—the database server administrator simply mounts the NFS filesystem and the DBA can go straight to work with RAC or non-RAC Oracle databases. How simple. And yes, the Oracle licensing cost is a constant, so in this paradigm, the only way to recuperate cost is in the storage connectivity side. The savings are worth consideration, and the simplicity is very difficult to argue.

It’s time for another picture. The picture below depicts a small commodity server deployment—38 servers that need storage.

complexsan.jpg

Let’s consider the total connectivity problem starting with the constant—Ethernet. Yes, every one of these 38 servers needs both Ethernet and Fibre Channel connectivity. For simplicity, let’s say only 8 of these servers are using RAC. The 8 that host RAC will need a minimum of 4 Gigabit Ethernet NICs/cables—2 for the public interfaces and two for a bonded, private network for Oracle Cache Fusion (GCS, GES) for a total of 32. The remaining 30 could conceivably do fine with 2 public networks each for a subtotal of 60. All told, we have 92 Ethernet paths to deal with before we look at storage networking.

On the storage side, we’ll need redundant paths for all 38 server to multiple switches so we start with 38 dual-port HBAs and 76 front-side Fibre Channel switch ports. Each switch will need a minimum of 2 paths back to storage, but honestly, would anyone try to feed 38 modern commodity servers with 2 4Gb paths worth of storage bandwidth? Likely not. On the other hand, it is unlikely the 30 smaller servers will each need dedicated 4Gb I/O bandwidth to storage so we’ll play zone trickery on the switch and group sets of 2 from the 30 yielding a requirement for 15 back-side I/O paths from each switch for a subtotal of 30 back-side paths. Following in suit, the remaining 8 RAC servers will require 4 back-side paths from each of the two switches for a subtotal of 8 back-side paths. To sum it up, we have 76 front-side and 38 back-side paths for a total of 114 storage paths. Yes, I know this can be a lot simpler by limiting the number of switch-to-storage paths. That’s a game called Which Servers Should We Starve for I/O and it isn’t fun to play. These arrangements are never attempted with small switches. That’s why the picture depicts large, expensive director-class switches.

Here’s our mess. We have 92 Ethernet paths and 114 storage paths. How would NAS make this simpler? Well, Ethernet is the constant here so we simply add more inexpensive Ethernet infrastructure. We still need redundant switches and I/O paths, but Ethernet is cheap and simple and we are down to a single network topology instead of two. Just add some simple NICs and simple Ethernet switches and go. And oh, by the way, the two network-topologies-model (e.g., GbE_+ FCP) generally means two different “owners” since the SAN would generally be owned by the Storage Group and the Ethernet would be owned by the Networking Group. With NAS, all connectivity from the Ethernet switches forward can be owned by the Networking Group freeing the Storage Group to focus on storage—as opposed to storage networking.

And, yes, Oracle11g has features that make the connectivity requirement on the Ethernet side simpler but 10g environments can benefit from this architecture too.

Not a Sales Pitch
Thus far, this blog entry has been the what. This would make a pretty hollow blog entry if I didn’t at least mention the how. The odds are very slim that your datacenter would be able to do a 100% NAS storage deployment. So Network Appliance handles this by offering multiple protocol storage from their Filers. The devil shall not remain with the details.

Total NAS? Nope. Multi-Protocol Storage.
I’ll be brief. You are going to need both FCP and NAS, I know that. If you have SQL Server (ugh) you certainly aren’t going to connect those servers to NAS. There are other reasons FCP isn’t going to go away soon enough. I accept the fact that both protocols are required in real life. So let’s take a look a multi-protocol storage and how it fits into this thread.

Network Appliance Multi-Protocol Support
Network Appliance is an NFS device. If you want to use it for FCP or iSCSI SAN, large files in the Filer’s filesystem (WAFL) are served with either FCP or iSCSI protocol and connectivity. Fine. It works. I don’t like it that much, but it works. In this paradigm, you’d choose to run the simplest connectivity type you deem fit. You could run some FCP to a few huge Legacy SMPs, FCP to some servers running SQL Server (ugh), and most importantly Ethernet for NFS to whatever you choose—including Oracle on commodity servers. Multi-protocol storage in this fashion means total vendor lock-in, but it would allow you to choose between the protocols and it works.

SAN Gateway Multi-Protocol Support
Don’t get rid of your SAN until there is something reasonable to replace it with. How does that statement fit this thread? Well, as I point out in this paper, SAN-NAS gateway devices are worth consideration. Products in this space are the HP Enterprise File Services Clustered Gateway and EMC Celerra. With these devices you leverage your existing SAN by connecting the “NAS Heads” to the SAN using very low-end, simple Fibre Channel SAN connectivity (e.g., small switches, few cables). From there, you can provision NFS mounts to untold numbers of NFS clients—a few, dozens or hundreds. The mental picture here should be a very small amount of the complex, expensive connectivity (Fibre Channel) and a very large amount of the inexpensive, simple connectivity (Ethernet). What a pleasant mental picture that is. So what’s the multi-protocol angle? Well, since there is a down-wind SAN behind the NAS gateway, you can still directly cable your remaining Legacy Unix boxes with FCP. You get native FCP storage (unlike NetApp with the blocks-from-file approach) for the systems that need it and NAS for the ones that don’t.

I’m a Oracle DBA, What’s in It for Me?
Excellent question and the answer is simply simplicity! I’m not just talking simplicity, I’m talking simple, simple, simple. I’m not just talking about simplicity in the database tier either. As I’ve pointed out upteen times, NFS will support you from top to bottom—not just the database tier, but all your unstructured data such as software installations as well. Steve Chan chimes in on the simplicity of shared software installs in the E-Biz world too. After the NFS filesystem is mounted, you can do everything from ORACLE_HOME, APPL_TOP, clusterware files (e.g., the OCR and CSS disks), databases, RMAN, imp/exp, SQL*Loader/External Tables, ETL, compiled PL/SQL, UTL_FILE, BFILE, trace/logging, scripts, and on and on. Without NFS, what sort of mix-match of raw, filesystem, raw+ASM combination would be required? A complex one—and the really ironic part is you’d probably still end up with some NFS mounts in addition to all that raw disk and non-CFS filesystem space as well!

Whew. That was a long blog entry.

Real Application Clusters: The Shared Database Architecture for Loosely-Coupled Clusters

The typical Real Application Clusters (RAC) deployment is a true enigma. Sometimes I just scratch my head because I don’t get it. I’ve got this to say, if you think Shared Nothing Architecture is the way to go, then deploy it. But this is an Oracle blog, so let’s talk about RAC.

RAC is a shared disk architecture, just like DB2 on IBM mainframes. It is a great architecture, one that I agree with as is manifested by my working for shared data clustering companies all these years. Again, since this is an Oracle blog I think arguments about shared disk versus shared nothing are irrelevant.

Dissociative Identity Disorder
The reason I’m blogging this topic is because in my opinion the typical RAC deployment exhibits the characteristics of a person suffering from Dissociative Identity Disorder. Mind you, I’m discussing the architecture of the deployment, not the people that did the deployment. That is, we spend tremendous amounts of money for shared disk database architecture and then throw it into a completely shared nothing cluster. How much sense does that make? What areas of operations does that paradigm affect? Why does Oracle promote shared disk database deployments on shared-nothing clusters? What is the cause of this Dissociative Identity Disorder? The answer: the lack of a general purpose shared disk filesystem that is suited to Oracle database I/O that works on all Unix derivations and Linux. But wait, what about NFS?

Shared “Everything Else”
I can’t figure out any other way to label the principle I’m discussing so I’ll just call it “Shared Everything Else”. However, the term Shared Everything Else (SEE for short) insinuates that there is less importance in that particular content—an insinuation that could not be further from the truth. What do I mean? Well, consider the Oracle database software itself. How do you suppose an Oracle RAC (shared disk architecture) database can exist without having the product installed somewhere.

The product install directory for the database is called Oracle Home. Oracle has supported the concept of a shared Oracle Home since the initial release of RAC—even with Oracle9i. Yes, Metalink note 240963.1 describes the requirement for Oracle9i to have context dependent symbolic links (CDSL), but that was Oracle9i. Oracle10g requires no context dependent symbolic links. Oracle Universal Installer will install a functional shared Oracle Home without a any such requirements.

What if you don’t share a software install? It is very easy to have botched or mismatched product installs—which doesn’t sit well with a shared disk database. In a recent post on the oracle-l list, sent the following call for help:

We are trying to install a 2-node RAC with ASM (Oracle 10.2.0.2.0 on Solaris 10) and getting the error below when using dbca to create the database.The error occurs when dbca is done creating the DB (100%).Any suggestions?

We have tried starting atlprd2 instance manually and get the error below regarding an issue with spfile which is on ASM.

ORA-01565: error in identifying file ‘+SYS_DG/atlprd/spfileatlprd.ora’
ORA-17503: ksfdopn:2 Failed to open file +SYS_DG/atlprd/spfileatlprd.ora
ORA-03113: end-of-file on communication channel

OK, for those who are not Oracle-minded, this sort of deployment is what I call the Dissociative Identity Disorder since the database will be deployed on a bunch of LUNs provisioned, masked and accessed as RAW disk from the OS side—ASM is a collection of RAW disks. This is clearly not a SEE deployment.The original poster followed up with a status of the investigatory work he had to do to try and get around this problem:

[…] we have checked permissions and they are the same.We also checked and the same disk groups are mounted in both ASM instances

also.We have also tried shutting everything down (including reboot of both servers) and starting everything from scratch (nodeapps, asm, listeners, instances), but the second node won’t start.Keep getting the same error […]

What a joy. Deploying a shared disk database in a shared nothing cluster! There he was on each server checking file permissions (I just counted, there are 20,514 files in one of my Oracle10g Oracle Homes), investigating the RAW disk aspects of ASM, rebooting servers and so on. Good thing this is only a 2 node cluster. What if it was an 8 node cluster? What if he had 10 different clusters?

As usual, the oracle-l support channel comes through. Another list participant posted the following:

Seem to be a known issue (Metalink Note 390591.1). We encountered similar issue in Linux RAC cluster and has been resoled by following this note.

The cause was included in his post (emphasis added by me):

Cause

Installing the 10.2.0.2 patchset in a RAC installation on any Unix platform does not correctly update the libknlopt.a file on all nodes. The local node where the installer is run does update libknlopt.a but remote nodes do not get the updated file. This can lead to dumps or internal errors on the remote nodes if Oracle is subsequently relinked.

That was the good and bad, now the ugly—his post continues with the following excerpt from the Oracle Metalink note:

There are two solutions for this problem:

1) Manual copy of the “libknlopt.a” library to the offending nodes:

-ensure all instances are shut down
-manually copy $ORACLE_HOME/rdbms/lib/libknlopt.a from the local node to all remote nodes

-relink Oracle on all nodes :
make -f ins_rdbms.mk ioracle

2) Install the patchset on every node using the “-local” option:

What’s So Bad About Shared Nothing Clusters?
I’m not going to get into that, but one of the central knock-offs Oracle uses against shared-nothing database architecture is the fact that replication is required. Since the software used to access RAC needs to be kept in lock-step, replication is required there as well, and as we see from this oracle-l email thread, replication is not all that simple with a complex software deployment like the Oracle database product. But speaking of complex, the Oracle database software pales in comparison to the Oracle E-Business Suite. How in the world do people manage to deploy E-Biz on anything other than a huge central server? Shared Applications Tier.

Shared Applications Tier
Yes, just like Oracle Home, the huge, complex Oracle E-Business Suite can be installed in a shared fashion as well. It is called a Shared Applications Tier. One of the other blogs I read has been discussing this topic as well, but this is not just a blogosphere topic—it is mainline. Perhaps the best resource for Shared Applications Tier is Metalink note 243880.1, but Metalink notes 384248.1 and 233428.1 should not be overlooked. The long story short is that Oracle supports SEE, but they don’t promote it for who-knows-what-reason.

Is SEE Just About Product Installs?
Absolutely not. Consider intrinsic RAC functionality that doesn’t function at all without a shared filesystem:

  • External Tables with Parallel Query Option
  • UTIL_FILE
  • BFILE

I’m sure there are others (perhaps compiled PL/SQL), but who cares. The product is expensive and if you are using shared disk architecture you should be able to use all the features of shared disk architecture. However, without a shared filesystem, External Tables and the other features listed are not cluster-ready. That is, you can use External Tables, UTIL_FILE and BFILE—but only from one node. Isn’t RAC about multi-node scalability?

So Why the Rant?
The Oracle Universal Installer will install a fully functional Oracle10g shared Oracle Home to simplify things, the complex E-Business Suite software is architected for shared install and there are intrinsic database features that require shared data outside of the database so why deploy a shared database architecture product on a platform that only shares the database? You are going to have to explain it to me like I’m six years old; because I know I’m not going to understand. Oh, yes, and don’t forget that with a shared-nothing platform, all the day to day stuff like imp/exp, SQL*Loader, compressed archive redo, logging, trace, scripts, spool and so on mean you have to pick a server and go. How symmetric is that? Not as symmetric as the software for which you bought the cluster (RAC), that’s for certain.

Shared Oracle Home is a Single Point of Failure
And so is the SYSTEM tablespace in a RAC database, so what is the point?People who choose to deploy RAC on a platform that doesn’t support shared Oracle Home often say this. Yes a single shared Oracle Home is a single point of failure, but like I said, so is the SYSTEM tablespace in every RAC database out there. Shops that espouse shared software provisioning (e.g., shared Oracle Home) are not dolts, so the off-the-cuff single point of failure red herring is just that. When we say shared Oracle Home, do we mean a single shared Oracle Home? Well, not necessarily. If you have, say, a 4 or 8 node RAC cluster, why assume that SEE or not to SEE is a binary choice? It is perfectly reasonable to have 8 nodes share something like 2 Oracle Homes. That is a significant condensing factor and appeases the folks that concentrate on the possible single point of failure aspect of a shared Oracle Home (whilst often ignoring the SYSTEM tablespace single point of failure). A total availability solution requires Data Guard in my opinion, and Data Guard is really good, solid technology.

Choices
All told, NFS is the only filesystem that can be used across all Unix (and Linux) platforms for SEE. However, not all NFS offerings are suffiently scalable and resilient for SEE. This is why there is a significant technology trend towards clustered storage (e.g., NetApp OnTAP GX, PolyServe(HP) EFS Clustered Gateway, etc).

Finally, does anyone think I’m proposing some sort of mix-match NFS here with a little SAN there sort of ordeal? Well, no, I’m not. Pick a total solution and go with it…either NFS or SAN, the choice is yours, but pick a total platform solution that has shared data to complement the database architecture you’ve chosen. RAC and SEE!


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 3,010 other followers

Oracle ACE Program Status

Click It

website metrics

Fond Memories

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: