Oracle Shared Latches, CAS, Porting. No mention of ASM?

I participate in the Oracle-L mail list managed by fellow OakTable network member Steve Adams. There are a lot of good folks over there. I want to hijack a thread from that list and weave it into a post I’ve wanted to do about Oracle porting.

Oracle Port-level Optimizations
It is a little-known fact that port-level implementation decisions can have major performance ramifications. Oracle maintains a “bridge layer” of code between the Oracle Kernel and the lower level routines that interface with the Operating System. This layer is called the Virtual Operating System layer, or VOS. Oracle was brilliant for implementing this layer of the server. Without it, there would be routines at widely varying levels of the Oracle Kernel interfacing with the Operating System: chaos! Under the VOS is where the nitty-gritty happens.

So, what does this have to do with the thread on Oracle-L?

There was a post where a list member stated:

I saw quite a few cas latch waits today, Oracle 9.2.0.4 on HP-UX 11.11 PA-RiSC. Do these CPUs support CAS instructions?

This post sparked a series of follow-ups about Oracle’s usage of shared latches. For the non-Oracle-minded reader, the term shared latches means reader/writer locks. You see, for many years, all critical sections in Oracle were protected by complete mutual exclusion, a real waste for mostly-read objects. The post is referring to an optimization where shared latches are implemented using Compare and Swap primitives. Whether or not your port of Oracle has this optimization is a decision made at the port level: either the platform supports it or it doesn’t. If it doesn’t, there are tough choices to make at the porting level.

But the topic is bigger than that. When Oracle uses generic terms like “CAS” and “Scattered Read”, a lot is lost in translation. That is, when the VOS calls an underlying “Scattered Read”, is it a simulation? Is it really a single system call that takes a multiblock disk read and populates the SGA buffers with DMA? Or is it more like the age-old Berkeley readv(2) which actually just looped with singleton reads in library context? On the other hand, when Oracle executes CAS code, is it really getting a CAS instruction or a set of atomic instructions (with looping)? The latter is generally the case.
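
To make the reader/writer idea concrete, here is a minimal sketch of a shared latch built on compare and swap, written with C11 atomics. This is not Oracle's code. The names (`shared_latch`, `latch_try_shared`, and so on) and the layout of the latch word are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical latch word: the top bit marks an exclusive holder,
 * the remaining bits count shared (reader) holders. */
#define LATCH_EXCLUSIVE UINT32_C(0x80000000)

typedef struct {
    _Atomic uint32_t word;
} shared_latch;

/* Try to take the latch in shared mode. The loop only retries when the
 * CAS loses a race with another reader; it never waits for a writer. */
static bool latch_try_shared(shared_latch *l)
{
    uint32_t old = atomic_load(&l->word);
    while (!(old & LATCH_EXCLUSIVE)) {
        /* CAS: install old + 1 only if the word still equals old.
         * On failure, old is refreshed with the current contents. */
        if (atomic_compare_exchange_weak(&l->word, &old, old + 1))
            return true;
    }
    return false;   /* a writer holds the latch */
}

/* Try to take the latch exclusively: succeeds only when nobody holds it. */
static bool latch_try_exclusive(shared_latch *l)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, LATCH_EXCLUSIVE);
}

static void latch_release_shared(shared_latch *l)
{
    uint32_t prev = atomic_fetch_sub(&l->word, 1);
    assert((prev & ~LATCH_EXCLUSIVE) > 0);   /* caller must have held it */
    (void)prev;
}

static void latch_release_exclusive(shared_latch *l)
{
    atomic_store(&l->word, 0);
}
```

Any number of readers can hold the latch at once, while a writer gets in only when the word is zero. The sketch ignores reader-count overflow and writer starvation, both of which a real implementation would have to deal with.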

Another installment on that particular Oracle-L thread took it to the port level:

Correct, HP doesn’t do CAS. There are some shared read latch operations that Oracle therefore implements through the CAS latch…

Right, HP, or more precisely the PA-RISC instruction set does not have a Compare and Swap instruction—it seems HP took the “reduced” in reduced instruction set computing to the extreme! However, neither does PowerPC or Alpha for that matter. In fact, neither does x86 or IA64. Oh hold it; I’m playing a word game. Both x86 and IA64 do have CAS, but it is called cmpxchg (compare and exchange). But honestly, PowerPC and Alpha do not. So how do these platforms offer CAS?

The porting teams for these various platforms have to make decisions. In the case of offering CAS to the VOS layer, they either have to construct a CAS using an atomic block of instructions or punt and use the reference code which is a spinlock. In the latter case, the wait event CAS latch can pop up. You see, a CAS latch can wait, whereas a real CAS will only stall the CPU. That is, if the port implements a CAS latch where other ports go with a CAS atomic set or single instruction, the former can sleep on a miss and the latter cannot. The processor is going to do that CAS, and nothing else. A contended memory location being hammered by CAS will stall processors, because once the CPU enters that block of (or singleton) instruction(s), it stays there until the work is done. I’m not talking about pathology, just implementation subtleties. So, what does CAS really look like?
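
The difference between the two choices can be sketched in C11. This is a hypothetical illustration, not the Oracle reference code: `cas_latch`, `emulated_cas` and `native_cas` are invented names, and the guard here simply spins where a real port might sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallback: a test-and-set guard word protects the target. */
typedef struct {
    atomic_flag guard;
    uint32_t    value;   /* plain word, only touched while holding guard */
} cas_latch;

static void cas_latch_init(cas_latch *c, uint32_t initial)
{
    atomic_flag_clear(&c->guard);
    c->value = initial;
}

/* Compare and swap emulated under a spinlock. */
static bool emulated_cas(cas_latch *c, uint32_t expected, uint32_t desired)
{
    assert(c != NULL);
    /* Spin until we own the guard. A port could choose to sleep here
     * instead, which is where a wait event can show up. */
    while (atomic_flag_test_and_set_explicit(&c->guard, memory_order_acquire))
        ;
    bool swapped = (c->value == expected);
    if (swapped)
        c->value = desired;
    atomic_flag_clear_explicit(&c->guard, memory_order_release);
    return swapped;
}

/* The same operation when the platform (or compiler) provides a CAS. */
static bool native_cas(_Atomic uint32_t *word, uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(word, &expected, desired);
}
```

Both functions have the same contract, which is the point: the caller sees "a CAS" either way, but only the emulated one has a place where a process can wait.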

Sparc64, x86, IA64, S/370, and get this, Motorola 68020 all offer a CAS instruction. There are others for sure. On the other hand, PPC and Alpha require an atomic set of instructions built off of LL/SC (Load-Link/Store Conditional) which on Power is the famed “lorks and storks” (lwarx/stwcx.) and on Alpha is ldx_l/stx_c. Finally, what about PA-RISC? Well, you can’t do Oracle on any CPU that totally lacks atomic instructions. In the case of PA-RISC there is a Load and Clear Word (ldcw) instruction.
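
Portable C cannot spell out a load-link/store-conditional pair directly, but the C11 weak compare-exchange has the same retry shape: it is allowed to fail spuriously, so it always lives in a loop. Here is a small sketch of a constructed atomic operation in that style. The function name `cas_fetch_add` is made up for this example, and how a given compiler lowers it on a particular CPU is not something the sketch guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Add delta to *word without taking a lock; return the value replaced. */
static uint32_t cas_fetch_add(_Atomic uint32_t *word, uint32_t delta)
{
    assert(word != NULL);
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    /* The weak form may fail even when *word == old (think of a
     * store-conditional that lost its reservation), so retry until it
     * succeeds. On failure, old is reloaded with the current contents. */
    while (!atomic_compare_exchange_weak_explicit(
               word, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed))
        ;
    return old;
}
```

Under contention every failed attempt is another trip around the loop, which is the "atomic set of instructions (with looping)" behavior in source form.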

The point is that the VOS can be given a CAS of one sort or the other, but not all architectures handle the contention that CAS can cause. For whatever reason, it seems the HP porting group punted on 9i and went with latches where other ports use a real, or constructed, CAS. Be aware that just following the masses and implementing a CAS atomic set is not always the right answer. These pieces of code can do really weird things when the words being modified by the CAS straddle a cacheline and other such issues. Hmmm, trivial pursuit?
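
The cacheline-straddling concern comes down to simple address arithmetic. The sketch below assumes a 64-byte line, which is only an assumption since line size varies by CPU, and the names `straddles_cacheline` and `aligned_pair` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE 64u   /* assumption: real line size depends on the CPU */

/* True when the bytes [addr, addr + size) cross a cacheline boundary. */
static bool straddles_cacheline(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return (addr / ASSUMED_CACHELINE) !=
           ((addr + size - 1) / ASSUMED_CACHELINE);
}

/* Aligning the first member to the line size keeps a two-word target in a
 * single line, at the cost of padding the struct out to a full line. */
typedef struct {
    _Alignas(ASSUMED_CACHELINE) uint32_t wrap;
    uint32_t base;
} aligned_pair;
```

An 8-byte target at offset 60 of a line spills into the next one, while the same target at offset 56 does not.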

How Subtle are These Subtleties?

They can be really really NOT subtle!

I was on a team of folks that implemented an atomic set CAS for Oracle’s System Change Number in Oracle8 or 8i (can’t remember), which is actually a multi-word structure with the SCN word and another word representing the wrap value. The SCN value always increases, albeit not serially. It just gets “larger”. We were able to pull the latch that protected the incrementing of these values and replace it with a small block of atomic assembly that incremented it without any locks. I doubt we were the first to do that. We used a CAS atomic instruction set since the processors were 32-bit and the target was 8 bytes. The net result was a 25% performance increase in TPC-C. Why? The SCN used to be a real big problem. Back then propeller heads like me used to collect bus traces on workloads like TPC-C and map the physical addresses back to SGA addresses. It so happened that in older versions of Oracle, 27% of all addresses referenced on the bus (64 CPU system) were for the cacheline that held the SCN structure! Granted, that was the sum of all load, store and coherency ops (e.g., invalidate, cache to cache transfers). Hey, Sequent was a huge MESI system OK (actually in the NUMA days it was, um, a little more complicated than simple MESI).
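
For readers who want to see the shape of the idea, here is a rough sketch using C11 atomics as a stand-in for the hand-written assembly. It is not the code we shipped. The packed layout (wrap in the high 32 bits, base in the low 32 bits) and the names `scn_t`, `scn_next` and `scn_advance_to` are assumptions for illustration. On a 32-bit target a 64-bit atomic may need compiler or library support to be lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: wrap in the high 32 bits, base in the low 32 bits,
 * so plain 64-bit comparison orders (wrap, base) pairs correctly. */
typedef struct {
    _Atomic uint64_t packed;
} scn_t;

static uint64_t scn_pack(uint32_t wrap, uint32_t base)
{
    return ((uint64_t)wrap << 32) | base;
}

/* Take the next value with no latch. A carry out of base bumps wrap. */
static uint64_t scn_next(scn_t *s)
{
    assert(s != NULL);
    uint64_t old = atomic_load(&s->packed);
    while (!atomic_compare_exchange_weak(&s->packed, &old, old + 1))
        ;   /* old was refreshed by the failed attempt; try again */
    return old + 1;
}

/* Raise the value to at least target; never move it backwards.
 * Returns the value in place once the call is done. */
static uint64_t scn_advance_to(scn_t *s, uint64_t target)
{
    assert(s != NULL);
    uint64_t old = atomic_load(&s->packed);
    while (old < target) {
        if (atomic_compare_exchange_weak(&s->packed, &old, target))
            return target;
    }
    return old;   /* somebody else already got it this far or further */
}
```

The second function reflects the "always increases, albeit not serially" property: a jump forward succeeds, and an attempt to go backwards is a no-op.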

More Information

I’m sure this post looked pretty flatulent, so I better give a couple of references. I think the first and best reference for non-blocking critical sections is here. I remember reading that the day it came out. Of course it was a PostScript file back then. If there are any systems actually booted with PA-RISC Linux, you can see sysdeps/hppa/pt-machine.h which will give a little insight into what you have to do with the most reduced RISC there is to implement T,T&S. I also liked this document about PA-RISC to Itanium porting because it talks about porting—albeit for two completely dead platforms (oh, I’m going to take a beating for that one). Finally, since I had the dubious privilege of supporting Informix and Oracle on 68020 hardware way back when, memory lane can be walked here. Yes, I know there were no spinlocks or other mutual exclusion in the versions of Oracle and Informix that ran on Motorola 68020 hardware (er, hold it, we did port Informix Turbo to the Altos 3068?, hmmm, so long ago…). I did like the manuals.

5 Responses to “Oracle Shared Latches, CAS, Porting. No mention of ASM?”


  1. Noons, October 31, 2006 at 11:15 pm

    “Of course it was a PostScript file back then”? Have you dumped the contents of a pdf file? It’s a Postscript file all the way through!
    🙂

    Still: not “flatulent” at all! Please do post this sort of stuff, don’t care if Oakies can’t appreciate it: I do and I’m sure I’m not the only one.

    CAS – well, T&S back then… – used to be my bread and butter in the days back at Sperry Univac. They went through these multi-CPU synch problems back in the mid 70s when they made the Univac-1110. Tracing and tracking the problems involved and the solutions proposed was how I spent most of my free time at Sperry for 5 years. By the late 70s-early 80s they had it mostly worked out with T&S. This was just with 6 CPUs: no proof whatsoever it would work well with more although the theory appeared sound: folks like Dijkstra had a lot to do with it.

    Actually DMS-1100 – their Codasyl database at the time – still had a few left over timing problems in multi-CPU systems. They finally solved them all in the early 80s but by then nobody cared anymore: relational was about to take off.

    And the relational folks promptly ignored prior art and went their own way. Took until much later for folks like you to become aware of the multi-CPU problems again and look for solutions.

    I’m reminded of what happens today with the j2ee and other “modern” programming techniques, who have basically ignored all past db design science and spent an enormous effort and time re-inventing the basic round shape… 🙂

    Interesting the comment about PA-RISC. Didn’t know they didn’t have a hardware implementation. Must have been sheer heck to work around some of these limitations in the VOS!

  2. Kevin, November 1, 2006 at 12:40 am

    oh, Oakies can take it… for sure…they just love teasing me…

    As for PA-RISC, they don’t have hardware test and set, but it is implemented instead using a loop on ldcw in an atomic set.. same as x86 really…lock,btsl or t,t&s no diff really … but as for CAS, like I put in the post, they ain’t got it.

    glad you like the post Noons!

  3. Norman Dunbar, November 1, 2006 at 8:15 am

    Nice one Kevin – especially as it was me who asked the question about ‘how the flipping heck do Oracle implement CAS without an atomic instruction’. I’m far better educated now. Thanks.

    Cheers,
    Norman.

  4. Alex Gorbachev, November 7, 2006 at 6:14 am

    How cool is that! Nice level of detail. Though, following refs from this post makes my brain boil. %)

  5. Narty Atomic, November 16, 2007 at 2:53 pm

    Nice one Kevin – especially as it was me who asked the question about ‘how the flipping heck do Oracle implement CAS without an atomiic instruction. I’m far better educated now. Thanks.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
