Manly Men Only Deploy Oracle with 64 Bit Linux – Part I. What About a x86 Port on EM64T/AMD64 Hardware? | Kevin Closson's Blog: Platforms, Databases and Storage


In the comment thread of my latest installment in the “Manly Man” series, a reader posted a humorous comment that included a serious question:

[…] what are your thoughts about x86 vs. x86-64 Linux, in relation to Oracle and RAC? I’d appreciate a blog entry if you could.

Tim Hall followed suit:

I would have thought it was obvious. The number 64 is twice the size of 32, so it must be twice as good, making you twice as manly!

Yes, when it comes to bitness, some is good so more must be better, right? Tim then continued to point out the political incorrectness of the term Manly Man by suggesting the following:

PS. I think politically correct names for this series of blog entries are:

Personally Persons only deploy…
Humanly Humans only deploy…

Before I actually touch the 32 versus 64-bit question from the thread, I’ll submit the following Manly Man entertainment:

If you have enjoyed the Manly Man series at all, or simply need a little background, you must see this YouTube video about Manly Men and Irish Spring.

32 or 64 Bit Linux
When people talk about 32 versus 64 bit Oracle with Linux they are actually talking about 3 topics:

1. Running 32 bit Oracle on 32 bit Linux with native 32 bit hardware (e.g., Pentium IV (Willamette), Xeon MP (Foster MP)).
2. Running 32 bit Oracle on 32 bit Linux with x86_64 hardware.
3. Running 64 bit Oracle on 64 bit Linux with x86_64 hardware.

The oddball combination folks don’t talk about is 32 bit Oracle running on 64 bit Linux, because Oracle doesn’t support it. I do test that combination, however, by installing Oracle on a 32 bit server and then NFS mounting that ORACLE_HOME on a 64 bit Linux system. Since it is unsupported, though, discussing it here would be moot.

The most interesting comparison would be between 1 and 2 above provided both systems have precisely the same core clock speed, L2 cache and I/O subsystem. As such, the comparison would come down to how well the 32-bit optimized code is treated by the larger cache line size. There would, of course, be other factors since there are several million more transistors in an EM64T processor than a Pentium IV and other fundamental improvements. I have wished I could make that comparison though. The workload of choice would be one that “fits” in a 32 bit environment (e.g., 1GB SGA, 1GB total PGA) and therefore doesn’t necessarily benefit from 64 bitness.

If anyone were to ask me, I’d say go with 3 above. Oracle on x86_64 Linux is not new.

Bitness
In my recent blog entry about old software configurations in an Oracle over NFS situation, I took my well-deserved swipes at pre-2.6 Linux kernels. Perhaps the most frightening one in my experience was 32-bit RHEL 3. That whole 4/4 split kernel thing was a nightmare—unless you like systems that routinely lock up. But all told I was taking swipes at pre-2.6 kernels without regard for bitness. So long as the 2.6 kernel is on the table, the question of bitness is not necessarily so cut and dried.

In my recent blog entry about Tim Hall’s excellent step-by-step guide for RAC on NFS, a reader shared a very interesting situation he has gone through:

I have a Manly Man question for you. This Manly Man Wanna Be (MMWB) runs a 2-node 10g RAC on Dell 2850s with 2 dual-core Xeon CPUs (total of 4 CPUs). Each server has 16 GB of memory. While MMWB was installing this last year, he struggled mightily with 64-bit RAC on 64-bit Red Hat Linux 4.0. MMWB finally got it working after learning a lot of things about RPMs and such.

However, Boss Of Manly Man Wanna Be (BOMMWB) was nervous about 64-bit being “new,” and all of the difficulties that MMWB had with it, so we reinstalled with 32-bit RAC running on 32-bit Red Hat Linux 4.0.

My naturally petulant reaction would have been to focus on the comment about 64-bit being “new.” I’m glad I didn’t fire off. This topic deserves better treatment.

While I disagree with Boss of Manly Man’s assertion that 64-bit is “new”, I can’t take arms against the fact that this site measured different levels of pain when installing the same release of Oracle on the same release of RHEL4, varying only the bitness. It is unfortunate that this site has committed itself to a 32 bit database based solely upon its experiences during the installation. Yes, the x86_64 install of 10gR2 requires a bit more massaging of the platform vis a vis RPMs. In fact, I made a blog entry about 32 bit libraries required on 64 bit RHEL4. While there may occasionally be more headaches during an x86_64 install than x86, I would not deploy Oracle on a 32 bit operating system today unless there was a gun held to my head. All is not lost for this site, however. The database they created with 32 bit Oracle is perfectly usable in-place with 64 bit Oracle after a simple dictionary upgrade procedure documented in the Metalink note entitled Changing between 32-bit and 64-bit Word Sizes (ML62290.1).

Has Anyone Ever Tested This Stuff?
I have…a lot! But I really doubt we are talking about running 32 bit Oracle on 32 bit hardware. Nobody even makes a native 32 bit x86 server these days (that I know of). I think the question at hand is more about 32 bit Oracle on 32 bit Linux with x86_64 hardware.

There has always been the big question about what 64 bit software performance is like when the workload possesses no characteristics that would naturally benefit from the larger address space. For instance, what about a small number of users attached to an SGA of 1GB with a total PGA footprint of no more than 1GB? That’s a workload that doesn’t need 64 bit. Moreover, what if the comparison is between 32 bit and 64 bit software running on the same server (e.g., AMD Opteron)? In this case, the question gets more interesting. After all, the processor caches are the same, the memory->processor bandwidth is constant, the drivers can all DMA just fine. So does bitness still matter? The answer is an emphatic yes! But yes, what? Yes, there are occasions where 64 bit code will dramatically outperform 32 bit code on dual-personality 64 bit processors (e.g., AMD Opteron). It is all about porting. Let me explain.

The problem with running sophisticated 32 bit code on 64 bit processors is that the 32 bit code was most likely ported with a different processor cache line size in mind. This is important:

Native 32 bit x86 processors use a 32 byte cache line size whereas 64 bit processors (e.g., AMD64, EM64T) use a 64 byte cache line.
That means, in the case of a native 32 bit processor, load/store and coherency operations are performed on a swath of 32 bytes. Yes, there were exceptions like the Sequent NUMA-Q 2000, which had two different cache line sizes, but that was a prior life for me. Understanding cache line size and how it affects coherency operations is key to database throughput. And unlike Microsoft, who never had to do the hard work of porting (IA64 notwithstanding), Oracle pays very close attention to this topic. In the case of x86 Linux Oracle, the porting teams presumed the code was going to run on native 32 bit processors, a reasonable presumption.

What Aspects of Bitness Really Matter?
The area of the server that this topic impacts the most (by far) is latching. Sure, you use the database server to manage your data and the accesses to your data seem quite frequent to you (thousands of accesses per second), but that pales in comparison to the rate at which system memory is accessed for latches. These operations occur on the order of millions of times per second. Moreover, accesses to latches are write-intensive and most-painfully contended across multiple CPUs which results in a tremendous amount of bandwidth used for cache coherency. Spinlocks (latches) require attention to detail-period. Just sum up the misses and gets on all the latches during a processor-saturated workload sometime and you’ll see what I mean. What’s this have to do with 32 versus 64 bit?

It’s All About The Port
At porting time, Oracle pays close attention to ensure that latch structures fit within cache lines in a manner that eliminates false sharing. Remember, processors don’t really read or write single words. They read/write or invalidate the entire line that a given word resides in, at least when processor-to-memory operations occur. Imagine, therefore, a latch structure that is, say, 120 bytes long and that the actual latch word is the first element of the structure. Next, imagine that there are only 2 latches in our imaginary server and we are running on a 32 bit OS on a native 32 bit system such as Pentium IV or Xeon MP (Foster 32 bit) and therefore a 32 byte cache line size. We allocate and initialize our 2 structures at instance startup. These structures will lay out in 240 bytes within a single memory page. Since we were dutiful enough to align our two structures on a page boundary, the first structure resides in the first 120 bytes of the memory page, which touches the first 4 32 byte cache lines. But wait, there are 8 extra bytes in the 4th cache line. Doesn’t that mean the first 8 bytes of the second latch structure are going to share space in the 4th cache line? Not if you are good at porting. And in our example, we are.

That’s right, we were aware of our cache line size (32 bytes) so we padded the structure by allocating an array of unsigned integers (4 bytes) two deep as the last element of our structure. Now our latch structure is precisely 128 bytes or 4 lines. Finally, we have our imaginary 32 bit code optimized for a real 32 bit system (and therefore a 32 byte cache line size). That is, we have optimized our 32 bit software for our presumed platform which is 32 bit hardware. Now, if half of the CPUs in the box are hammering the first latch, there is no false sharing with the second. What’s this got to do with hardware bitness? The answer is in the fact that Oracle ports the x86 Linux release with a 32 bit system in mind.
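For anyone who wants to check the padding arithmetic, here is a minimal sketch using Python’s ctypes. The structure layout and field names are made up for illustration (this is not Oracle’s latch structure); the only things taken from the example are the 120 byte size, the latch word coming first, and the 32 byte line. Rounding 120 bytes up to a whole number of 32 byte lines gives 128 bytes, which means 8 bytes (two 4 byte words) of padding.

```python
import ctypes

LINE = 32  # assumed cache line size in bytes, as in the example above


class Latch(ctypes.Structure):
    # Hypothetical layout: the latch word first, then 29 more 4 byte words,
    # for a total of 30 words (120 bytes).
    _fields_ = [
        ("latch_word", ctypes.c_uint32),
        ("other", ctypes.c_uint32 * 29),
    ]


def pad_bytes(size, line):
    """Bytes needed to round size up to a whole number of cache lines."""
    return (-size) % line


pad = pad_bytes(ctypes.sizeof(Latch), LINE)


class PaddedLatch(ctypes.Structure):
    # Same hypothetical layout, plus a trailing pad array so the next
    # structure starts on a fresh cache line.
    _fields_ = [
        ("latch_word", ctypes.c_uint32),
        ("other", ctypes.c_uint32 * 29),
        ("pad", ctypes.c_uint32 * (pad // 4)),
    ]


print(ctypes.sizeof(Latch), pad, ctypes.sizeof(PaddedLatch))
```

With the padding in place, two of these laid end to end from a line-aligned address never share a 32 byte line.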

Running 32 bit Code on a 64 bit CPU
The devil is in the details. Thus far our imaginary 2-latch system is optimized for hardware that operates on a 32 byte line. Each structure fits within 4 32 byte lines, or 2 64 byte lines should we execute on an x86_64 system, so there would be no false sharing and we must also be safe for a system with a 64 byte line, no? Well, true, there will be no false sharing between the two structures since each now occupies 2 64 byte lines as opposed to 4 32 byte lines, but there is more to it.
Do you think it’s possible that the actual latch word in the structure might be adjacent (same 32 bytes) to anything interesting? Remember, heavily contended latches are constantly being tried for by processes on other CPUs. So if the holder of the latch writes on any other word in the cache line that holds the latch word, the processor coherency subsystem invalidates that line. To the other CPUs with processes spinning on the latch, this invalidation “looks” like the latch itself has been freed (changed from on to off) when in fact the latch is still held but an adjacent word in the same line was modified. This sort of madness absolutely thrashes a system. So, the dutiful port engineer rearranges the elements of the latch structure so that there is nothing else ever to be written in the same cache line that has the actual latch word. But remember, we ported to a 32 bit system with a 32 byte line. On the other hand, if you run this code on a 64 bit system–and therefore 64 byte lines–all of your effort to isolate the latch word from other write-mostly words was for naught. That is, if the cache line is now 64 bytes, any write by the latch holder in the first 64 bytes of the structure will cause invalidations (cache thrashing) for other processes trying to acquire the latch (spinning) on other CPUs. This isn’t a false sharing issue between 2 structures, but it has about the same effect.
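To make that concrete, here is a small sketch of the line arithmetic. The offsets are hypothetical: I am assuming the structure starts on a cache line boundary, the latch word sits at offset 0, and the porting engineer pushed some frequently written word out to offset 32 so it would land in the second 32 byte line.

```python
def cache_line(offset, line_size):
    """Index of the cache line that holds the byte at this offset."""
    return offset // line_size


def shares_line(off_a, off_b, line_size):
    """True if both offsets fall in the same cache line."""
    return cache_line(off_a, line_size) == cache_line(off_b, line_size)


LATCH_WORD = 0    # hypothetical offset of the latch word
HOT_WORD = 32     # hypothetical offset of a frequently written word

for line_size in (32, 64):
    print(line_size, shares_line(LATCH_WORD, HOT_WORD, line_size))
```

On a 32 byte line the two words are in different lines, so writes to the hot word do not disturb the spinners. On a 64 byte line they are in the same line, and every write by the holder invalidates the line the spinners are watching.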

Difficult Choices – 32 bit Software Optimized for 64 bit Hardware.
What if the porting engineer of our imaginary 2-latch system were to somehow know that the majority of 32 bit Linux servers would some day end up being 64 bit servers compatible with 32 bit software? Well, then he’d surely pad out the structure so that there are no frequently written words in the same 64 bytes in which the latch word resides. If the latch structure we have to begin with is 120 bytes, odds are quite slim that the percentage of read-mostly words will facilitate our need to pack the first 64 bytes with read-mostly objects alongside the latch word. It’s a latch folks, it is not a read-mostly object! So what to do? Vapor!

Let’s say our 120 byte latch structure is a simple set of 30 words each being 4 bytes (remember we are dealing with a 32 bit port here). Let’s say further that there are only 4 read-mostly words in the bunch. In our imaginary 2 latch example, we’d have to set up the structure so that the first word is the latch, and the next 16 bytes are the 4 read-mostly elements. Now we have 20 bytes that need protection. To optimize this 32 bit code for a 64 bit processor, we’ll have to pad out to 64 bytes-with junk. So we’ll put an array of unsigned integers 11 deep (44 bytes) immediately after the latch word and our 4 read-mostly words. That fits nicely in 64 bytes-at the cost of wasting 44 bytes of processor cache for every single latch that comes through our processor caches. Think cache buffers chains folks! We aren’t done though.

We started with 120 bytes (30 4 byte words) and have placed only 5 of those words into their own cache line. We have 25 words, or 100 bytes left to deal with. Remember, we are the poor porting engineer that is doing an imaginary 32 bit software port optimized for 64 bit servers since nobody makes 32 bit servers any more. So, we’ll let the first 64 bytes of the remaining 100 fall into their own line. That leaves 36 bytes that we’ll also have to pad out to 64 bytes-there goes another 28 bytes of vapor. All told, we started with 120 bytes and wound up allocating 192 bytes so that our code will perform optimally on a processor that uses a 64 byte cache line. That’s a 60% increase in the footprint we leave on our processor caches which aren’t very large to start with. That’s how Oracle would have to optimize 32 bit Oracle for a 64 bit processor (x86 code on x86_64 kit). But they don’t because that would have been crazy. After all, 32 bit Oracle was intended to run on 32 bit hardware.
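The bookkeeping for that imaginary port is easy to replay. This is just the arithmetic from the last two paragraphs, under the same assumptions (30 4 byte words, 1 latch word plus 4 read-mostly words grouped up front, 64 byte lines); the variable names are mine.

```python
LINE = 64          # cache line size in bytes on the x86_64 kit
WORD = 4           # word size in bytes for the 32 bit port
TOTAL_WORDS = 30   # the imaginary 120 byte latch structure
FRONT_WORDS = 1 + 4  # latch word plus the 4 read-mostly words


def round_up(n, multiple):
    """Round n up to the next multiple."""
    return -(-n // multiple) * multiple


original = TOTAL_WORDS * WORD                       # bytes before padding
front = FRONT_WORDS * WORD                          # bytes sharing the latch line
front_pad = round_up(front, LINE) - front           # vapor after the front group
rest = (TOTAL_WORDS - FRONT_WORDS) * WORD           # remaining bytes
rest_pad = round_up(rest, LINE) - rest              # vapor after the remainder
padded = front + front_pad + rest + rest_pad        # total footprint
growth = (padded - original) / original

print(front_pad, rest_pad, padded, f"{growth:.0%}")
```

That is 44 plus 28 bytes of vapor per latch, which is where the 192 byte footprint and the 60% figure come from.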

Porting, sound easy? Let me throw this one in there. It just so happens that the Oracle latch structure was in fact 120 bytes in Oracle8i on certain ports. Oh, and lest I forget, remember that Oracle keeps track of latch misses. What’s that got to do with this? Uh, that means processes that do not hold the latch increment counters in the latch structure when they miss. Imagine having one of those miss count words in the same line as the latch word itself!

This is tricky stuff.

Who Uses 32 bit Linux for Oracle These Days?
Finally, bugler, sound Taps.

A thread on oracle-l the other day got me thinking. The thread was about the difficulties being endured at a particular Linux RAC site that prompted the DBA there to audit what RPMs he has on the system. It appears as though everything installed was a revision “as high or higher” based on Oracle’s documented requirements. In his request for information from the list, I noticed uname(1) output that suggested he is using a 32 bit RHEL 4 system.

One place I always check for configuration information is Oracle’s Validated Configurations web page. This page covers Linux recipes for installation success. I just looked there to see if there was any help I could offer that DBA and found that there are no 32 bit validated configurations!

I know there is a lot of 32 bit x86 hardware out there, but I doubt it is even possible to buy one today. Except for training or testing purposes I just can’t muster a reason to even use 32 bit Linux servers for the database tier at this point and to be honest, running a 32 bit port of Oracle on an x86_64 processor makes very little sense to me as well.

32 Responses to “Manly Men Only Deploy Oracle with 64 Bit Linux – Part I. What About a x86 Port on EM64T/AMD64 Hardware?”


1 Tim Hall July 14, 2007 at 7:25 am

Nice post. Makes me think I should be getting some 64bit hardware to play on. 🙂

Cheers

Tim…

2 Jeremy Schneider July 20, 2007 at 6:08 pm

Just noticed this on the validated architecture FAQ:

http://www.oracle.com/technology/tech/linux/validated-configurations/validated-configurations-faq.html#x86

Why has Oracle chosen Linux x86-64 as the architecture for the initial configurations?

Oracle is seeing significant end-user demand for Linux x86-64 architectures and is fully committed to developing, advancing and promoting the 64-bit commodity Linux. All new chipsets and servers are now being shipped with x86-64 architecture, thereby offering a much wider hardware selection to end-users than some of the other architectures. Therefore, Oracle has chosen to initially make Oracle Validated Configurations available on Linux x86-64.

3 Alex Gorbachev July 25, 2007 at 10:54 pm

Heavy read Kevin. Very useful but very heavy for mere mortals. I wish you wrap it up as a presentation one day. 🙂 With pictures, cache lines sketches and etc. 🙂

Btw, here is one *rare* case when you might want to run 32 bit Oracle – if you install Grid Control on single node (DB + OMS). Grid Control is *still* lagging on 64 bit platform and 64 bit Linux GC releases are late. 1+ year ago, I’ve been told that Oracle doesn’t put GC on 64 bit in priority list because it doesn’t need more than 2-3 GB of memory but since then 64 bit releases of Grid Control have caught up quite a bit.

4 Jeff August 29, 2007 at 1:59 pm

If you’re running 64 bit Oracle on X86_64 and have 32 bit Oracle clients, does this change the equation?

5 kevinclosson August 29, 2007 at 3:15 pm

Jeff,

It doesn’t in my mind.

6 klynn October 31, 2007 at 6:18 pm

What if you’re running a 32-bit OS (libraries and all) but only the kernel is 64-bit. All programs are then running through the 32-bit “emulation”. Does Oracle support this config, and would there be considerable performance problems?

7 kevinclosson November 1, 2007 at 8:49 pm

klynn,

The simple answer is that running x86 Oracle against an x86_64 Linux kernel is not supported. BTW, that form of system setup you mention sounds really weird to me!

8 Lee Watkins April 24, 2013 at 3:30 pm

[6 years later] We’re porting a db from Oracle 8i on Tru64 Unix to Linux (because our Alphas are dying, finally, sadly… this is a legacy system that is going away soon but not soon enough, so have to keep it operational until then). Have it running on 32-bit Linux, running on 64-bit hardware, as you mention above, but need more SGA (and hugemem kernel doesn’t work for us) so would like run Oracle 8i on 64-bit Linux. Oracle support tells us that a 64-bit version of Oracle 8i exists for 64-bit Linux, but they don’t have the media. Any idea what version of Oracle 8i would be the one to run on 64-bit Linux? And where might we try to find it? Or any other suggestions?

- 9 kevinclosson April 24, 2013 at 5:25 pm
  
  Does the app absolutely mandate 8i? I’d love to see an apples test between Alpha 8i and Linux 8i(64b). That would be a cool comparison. Ping me through email as per my contact page.
  
  - 10 Lee Watkins April 25, 2013 at 6:29 am
    
    Yes, unfortunately the app is an old, highly customized version of SQL*LIMS, a lab information management system originally from Applied Biosystems Inc. (makers of the capillary sequencers that were used for the Human Genome project, which is what we used to use for genotyping) now owned by LabVantage. Because of the customizations (done by ABI’s “Rapid Integration Services” group, not by us), we can’t upgrade to the latest version of SQL*LIMS that does run on newer versions of Oracle and is written primarily in Java, doesn’t use Oracle Forms, etc.. So far, some of our tests against 8i on 32-bit Linux are faster than they were on Tru64, but others are slower. The (Intel) hardware is much newer and faster, of course, so that’s got to help some. We also have other apps running against 10 and 11g on a small RAC cluster with very good performance. I’ll email you later today…
    
- 11 Brian Pardy April 25, 2013 at 10:59 am
  
  In what way does the hugemem kernel not work for you? Will it not boot on your hardware, or is it that the DB won’t run or the performance is horrible? Getting that resolved might be the path of least resistance here.
  
  Or, if support can get you a 64-bit version of 9i or 9iR2 for 64-bit Linux, maybe your apps will run with the COMPATIBLE parameter set to 8.1.0?
  
  - 12 Lee Watkins April 25, 2013 at 12:15 pm
    
    Brian, the db won’t start up properly with hugemem kernel:
    
    SVRMGR> startup
    ORA-27123: unable to attach to shared memory segment
    Linux Error: 22: Invalid argument
    Additional information: 1
    Additional information: 65537
    
    We have an external Oracle consultant who’s done it lots of times before and he can’t figure out why it’s not working in this case. So we’re stuck with SGA at 1.8GB for now, and performance is OK so far even though we had to increase it to 2.6GB on Tru64 due to performance issues. There are other trade-offs with the smp vs hugemem kernels.
    
    I’ll ask about 9i in compatibility mode, that’s a good thought. The legacy app in question is tightly integrated with Oracle and highly customized, so I’m not sure if it would work but it’s at least worth looking into.
    
    - 13 Brian Pardy April 25, 2013 at 6:37 pm
      
      I love a good Linux/Oracle problem and all the more so if I might be able to help out some bio folks.
      
      Any chance you could post the contents of your /etc/sysctl.conf file, or at least the kernel.shmmax and kernel.shmall values? I’d also like to see the output of ‘cat /proc/sys/kernel/shmall’ and ‘cat /proc/sys/kernel/shmmax’ in case they’re different.
      
      Do you have a test system available with the same DB release, OS, kernel and hardware where you could try a few things out without disrupting operations?
      
      There’s an OTN post at https://forums.oracle.com/forums/thread.jspa?threadID=121441 showing how to edit ksms.s and relink the Oracle binary to get 8i up to a 2GB SGA, could be something like that might help. See also https://forums.oracle.com/forums/thread.jspa?threadID=86267.
      
      Oh how I’d enjoy ssh’ing to your dev box to try a few things.
      
    - 14 kevinclosson April 25, 2013 at 9:42 pm
      
      Hi Lee,
      
      1.8 is about all you’re going to get without relocating the SGA to a lower attach address. Just won’t fit in the address space with the standard attach address. Can you do strace -f -o /tmp/foo svrmgrl <<EOF startup and grep the shmat calls out?
      
15 James April 27, 2013 at 12:02 am

Hi Brian, the shmmax is 3g (3221225472), and the shmall (8388608) is also large enough to cover the 2.6g SGA that we are trying to do. We could increase the shmmax to 4gb or even 8gb, since we have 16gb RAM. We have tried to relink the software for the memory address issue. However, there is no ksms.o file with 8i.

- 16 kevinclosson April 27, 2013 at 9:43 am
  
  @james : you have to generate the ksms.s file with the genksms tool, edit it and compile it. The SGA has been relocatable for eons.
  
  - 17 James April 28, 2013 at 9:10 pm
    
    Hi Kevin, this is what we did and now we can startup the db to 1g or less SGA. Once we go anything above 1g, we got the same error.
    
    1. Shutdown any databases using the current “ORACLE_HOME”.
    
    2. Change your location to the “/lib” directory
    
    % cd $ORACLE_HOME/lib
    
    3. Make a backup copy of ‘libserver8.a’.
    
    % cp libserver8.a libserver8.a.orig
    
    4. Change your location to the “rdbms/lib” directory
    
    % cd $ORACLE_HOME/rdbms/lib
    
    5. Generate the “ksms.s” file
    
    % $ORACLE_HOME/bin/genksms -s 0x15000000 >ksms.s
    
    6. Regenerate the ‘ksms.o’ object:
    
    % make -f ins_rdbms.mk ksms.o
    
    7. Archive ‘ksms.o’ into ‘libserver8.a’
    
    % ar r $ORACLE_HOME/lib/libserver8.a ksms.o
    
    8. Relink
    
    % make -f ins_rdbms.mk ioracle
    
    - 18 Brian Pardy April 29, 2013 at 7:25 am
      
      Interesting. So before adjusting ksms.s you were able to startup with an SGA of about 1.8GB, but after doing so it won’t allow anything above 1GB. Hopefully restoring the libserver8.a backup and relinking got you back being able to use a 1.8GB SGA.
      
      Are you attempting this while still running under the hugemem kernel?
      
      - 19 James April 29, 2013 at 5:21 pm
        
        Hi Brian, To clarify, 1) we can start the SGA at 1.8g only if we boot the OS without using HugeMem. 2) All the above exercises were run under the Hugemem kernel. 3) Under Hugemem, before we made those changes, we couldn’t start the database at all, not even with db buffer 2048. After we made those changes, we can start the db with an SGA of 1g or less. It errors out if above 1g. 4) If we reboot the OS without Hugemem, then we can start the DB with SGA 1.8g again. Thanks.
        
    - 20 kevinclosson April 29, 2013 at 8:17 am
      
      @James : If you genksms with no args what attach address gets inserted into ksms.s ? Please post that. I have a hunch it’s about 800MB closer to the initialized data segment 🙂
      
      - 21 James April 29, 2013 at 6:25 pm
        
        Kevin, if I do this: genksms >ksms.s, sgabeg starts 0X50000000.
        
22 fkrot May 16, 2013 at 12:27 pm

Good day Brian and Kevin. This is Fil. I am working with Lee and James on the ora8i to Linux installation. Our most recent effort is to install 32bit 8i to a 64bit RH AS4, kernel 2.6.9-42.ELsmp, 16Gb RAM.
My understanding is that with large address space of the 64bit OS we should not need to recompile ksms.s and relocate SGA – is this correct?

However, the problem is, I am only able to start up the instance with very small SGA – about 340Mb :

Total System Global Area 341430432 bytes — 340Mb…
Fixed Size 73888 bytes
Variable Size 20385792 bytes
Database Buffers 320307200 bytes
Redo Buffers 663552 bytes

And the output of tstshm, if it is of any relevance, gives a very small sh.memory range:
[rh4ora8i-64bit-test ~]$ tstshm
Number of segments gotten by shmget() = 50
Number of segments attached by shmat() = 50
Segments attach at lower addresses
Default shared memory address = 0xf76e7000
Lowest shared memory address = 0xf12e7000
Highest shared memory address = 0xf76e7000
Total shared memory range = 106954752 <– 100Mb?
Total shared memory attached = 104857600
Largest single segment size = 2097152
Segment boundaries (SHMLBA) = 4096 (0x1000)

Any advice helps! Thank you.

- 23 kevinclosson May 16, 2013 at 4:23 pm
  
  Hi Fil: A 32b executable on 64bit Linux still is limited to 32b addressing. If only I had lab gear with which to investigate this curiosity…but I don’t. I’m surprised 8i 32b installed. Did you have to suffer relinking failures during the install and then go back in edit the Make file? You probably have to do a relink all? That is the first thing I’d do. Read the make script to see the directive for a relink all …or…way back in my archived brain I seem to recall a $ORACLE_HOME/bin tool that did relink all.
  
24 fkrot May 18, 2013 at 3:39 am

Kevin, thank you for getting back! The intent of doing 8i on 64b RHAS4 was to get out of the SGA limitation (we have to stick with 8i for the legacy application to work). From your response it sounds like, with 32bit 8i limited to 4G addressing, we are under the same SGA binds (ksms.s and then remake/relink oracle binaries)? This would make sense to me, and then I’d see where I am currently stuck: with make/relink (I’d save the gory details of glibc/gcc/compat-*/jdk/ changes made for 8i install and the errors in ‘relink all’ output after patching p8174 for a personal email or another comment).

BTW, if the above is correct, and if we wanted to get >4Gb SGA with this 32bit 8i installation, should we then aim for the VLM approach that uses a shared memory-based filesystem? (http://docs.oracle.com/cd/B28359_01/server.111/b32009/appi_vlm.htm)

In light of all this, the 9i-64b alternative with COMPATIBLE=8.1.7 (as Brian suggested) looks increasingly more enticing. Only, that would require a _thorough_ test our legacy app… It is an Oracle Forms + Java app, and who knows how it would take to 9i.

Did you really mean that you’d be interested in looking at this “curiosity”? I may be able to set up a temp tunnel for you after clearing it up with a couple of people:)

Re: ‘relink all’ errors – one of them(but not the only one) was at the end of make -f ins_rdbms.mk ioracle long output :
/db01/app/oracle/product/8.1.7/lib//libserver8.a: could not read symbols: File format not recognized
collect2: ld returned 1 exit status
Prior to that libserver8.a was regenerated with genclntsh.

25 Brian Pardy May 22, 2013 at 10:25 am

Hi Fil,

Does your Red Hat 4 system have the ‘linux32’ binary, used to run an application in an emulated 32-bit environment? It may be available on your install media or from an online repository somewhere if it is not already installed. It’s possible that the tstshm output (and SGA limit) you’re seeing are due to running 32-bit binaries in a 64-bit environment without making use of the emulation provided by linux32.

You can start your work by just running “linux32 bash” to get a shell running in 32-bit emulation mode; I would be interested in seeing if that might increase the number of shared memory segments available to you. If tstshm from that shell gives you different results, I would then try starting Oracle from that same linux32 shell and see if you can increase SGA beyond 340MB.

I definitely agree with you that the testing it would take to validate your apps against 9i, even with COMPATIBLE set, would be huge. It sounds like something better worth avoiding.

Are you still trying the hugemem kernel? Based on the kernel string you posted there I’m guessing you’re not. So many knobs to twist here. Does this new install also have the same increased values for shmmax/shmall as James noted in his April 27th comment?

(If your question about getting an eyes-on look at the system was directed at me, sure, I would still have an interest in doing so. You’d probably be better off with someone more expert than I am but I wouldn’t mind giving it a shot…)

26 fkrot May 23, 2013 at 10:12 am

Brian, thanks for chiming in. The quick answers to your question:
— I’ve been starting/stopping Oracle from ‘linux32 bash’ shell
— just checked: tstshm gives me the same output with and without linux32
— shmmax : 8589729792 , shmmall : 3145728 (64-bit OS) – even larger that what James was using.
— in 64bit apparently HugePages are configured not by a kernel change but by a kernel parameter, e.g. vm.nr_hugepages = 1496 .. (I’m not 100% sure about the correctness of this, as I am just now going through this setup on 64-bit, following Doc 361468.1 “HugePages on Oracle Linux 64-bit”)

Re: taking a peek into this system, that was in response to Kevin’s earlier comment about investigating the curiosity, but I’d gladly have you look as well! Please email through this blog (or if you can’t, let me know how to email you).

- 27 fkrot May 23, 2013 at 11:35 am
  
  I have increased the number of Hugepages in /etc/sysctl.conf to 1024 and rebooted the system, here is what I am seeing now:
  $ grep HugePages /proc/meminfo
  HugePages_Total: 1024
  HugePages_Free: 1024
  
  For some reason I am not seeing the other two expected lines of output, like ones listed in 361468.1 :
  …..
  HugePages_Rsvd: 446
  HugePages_Surp: 0
  
  Do you think there is something missing in my HugePages configuration?
  
  For reference: $ uname -a
  Linux rh4ora8i-64bit-test.cidr.jhmi.edu 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 i686 i686 i386 GNU/Linux
  
  Consequently, I’m still not able to increase SGA past about 390Mb – I continue getting crashes.
  It seems I need to finish setting up tmpfs or ramfs for VLM Oracle configuration.
  
  BTW, the SGA memory attach address has been remapped (I noted this in an email to Kevin) :
  # ps -ef |grep ora_pmon
  oracle 4453 1 0 14:21 ? 00:00:00 ora_pmon_tst1
  # grep deleted /proc/4453/maps
  12001000-2963c000 rw-s 00000000 00:06 1933313 /SYSVbd59f7d0 (deleted)
  — the first number is changed from the default 50000000
  
  If you guys see anything else amiss, please comment. Thank you! -Fil
  
28 fkrot May 23, 2013 at 5:18 pm

Also, stracing worked when I supplied a full connect string to sqlplus, with “sys/pwd@dbName as sysdba” . When I did it with just “/ as sysdba” (as normally done w/o strace), I got :
ERROR: ORA-12560: TNS:protocol adapter error.

The strace output during Oracle startup did not contain any shmat() or shmget() calls, but plenty of stat64(), fstat64(), and old_mmap() calls.
However, strace executed on 32-bit Linux does show shmat() and shmget() calls.

- 29 Brian Pardy May 28, 2013 at 6:54 am
  
  Hi Fil,
  
  I haven’t found a way to email you through the site here so please feel free to email me — firstname.lastname@gmail.com.
  
  You’re correct about hugepages configuration happening through the sysctl parameter. What I’m not sure of is whether or not 8i is capable of using hugepages — I don’t see any reason why it couldn’t, but maybe someone else can answer that.
  
  What (if anything) do you have set for the hard/soft ‘memlock’ limits for your Oracle userid? I know for success with hugepages memlock needs to be set high enough for the hugepages grabbed by Oracle to be locked. The best hugepages configuration guide I’ve found is this one: http://dbakerber.wordpress.com/2012/03/14/configuring-hugepages-for-oracle-on-linux/
  
  Do you see any different behavior if you drop vm.nr_hugepages back to 0 and try starting Oracle without using hugepages at all?
  
  So linux32 doesn’t seem to help or hurt. I would stick with using linux32 as you have been doing when starting up the database just to avoid confusion from possibly loading 64 bit libraries.
  
  The tmpfs/ramfs idea is good. Do you have one of those mounted as /dev/shm now, and if so, what size is it? I’m not too familiar with RHEL 4 but it looks like you should ‘mount -t ramfs ramfs /dev/shm’ to make that available, and should also have the ‘use_indirect_data_buffers’ parameter set to TRUE in Oracle if that’s not already set.
  
  I think the answer to this is probably ‘no’, but you don’t by any chance have a huge shared pool or PGA or anything set up that might be stealing memory that would otherwise be available for the buffer cache, do you?
  
  Maybe you could copy/paste a copy of your init parameter file up on pastebin.com so that we can review the entire thing.
  
30 fkrot June 4, 2013 at 6:25 pm

Brian, thank you for getting back, and sorry for such a late response! (I’ve had all the answers in my head right after reading your post 5 days ago..)

In your 2nd paragraph lies the answer: “What I’m not sure of is whether or not 8i is capable of using hugepages..” .
While working through an SR with Oracle Support last week on pretty much the same issue, I was directed to read several Metalink notes on 9i configurations on 32- and 64bit Linux. It became clear to me that 8i _only_ works with BIGPAGES, which was the TLB implementation in RHAS2.1. In RHEL3/4 Bigpages were replaced by HugePages, and they are only supported by 9iR2, PatchSet 9.2.0.4 or later releases.
More specifically, from Metalink [ID 261889.1] Bigpages vs. Hugetlb on RedHat Linux :

>>>> A typical big server deployment in RHAS2.1 would use bigpages as a bootup parameter to preallocate a large chunk of memory to be used solely for shared memory. These pages have a 2MB or 4MB TLB entry that reduces the number of TLB misses and hence increases performance by a few percent.
The other advantage of using bigpages in RHAS2.1 was that it allowed the kernel VM not to worry too much about bookkeeping for that part of virtual memory. …
Enterprise Linux 3 has replaced bigpages with a feature called hugetlb, a backport of what is also in Linux kernel 2.6. There are a few differences in how hugetlb works. Hugetlb behavior is similar to that of bigpages; the pages are backed by large TLB entries, are not pageable, and are preallocated, which means that once you allocate x megabytes of hugetlb pages, that amount of physical memory can be used only through hugetlbfs or shm allocated with SHM_HUGETLB.
RHEL3 no longer requires a bootup parameter; it is dynamically adjustable. After the system has booted … you can put the value you want in ‘/etc/sysctl.conf’. The value is in megabytes, and it allocates several 2MB pages.
…………
Oracle Database 10g will do this by default; for Oracle9i Database, a patch is required and it’s downloadable by metalink searching for Patch 3318884 – Abstract: MERGE LABEL REQUEST ON TOP OF 9204 FOR 3267537 AND 3311507 available for 9.2.0.4 and 9.2.0.3
<<<< Author: Wim Coekaerts, Director of Linux Engineering

At the same time, I saw no Oracle documents that would even mention 8i in the context of large memory on either 32-bit or 64bit systems, not even in their Large Memory general write-up, [ID 260152.1] "Linux Big SGA, Large Memory, VLM – White Paper".

Finally, I convinced myself that 8i does not handle BIGTABLE/HugePages, by looking at sqlplus strace output while starting up Oracle 8i on 64bit RHEL4 — I did not see shmat() or shmget() calls. By contrast, on a 32-bit system strace of Oracle startup shows calls to those functions, but without the SHM_HUGETLB flags.
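The strace check described above can be automated rather than eyeballed. Here is a rough Python sketch that counts shmget()/shmat() calls in saved strace output and flags any shmget() that asks for SHM_HUGETLB. It assumes strace decodes the flag symbolically (older builds may print it as a number instead). The function name `summarize_shm_calls` and the sample lines are made up for illustration; they are not taken from the actual traces.

```python
import re

# Matches shmget(key, size, flags) in strace output; group 3 holds the flags.
SHMGET_RE = re.compile(r"shmget\(([^,]+),\s*([^,]+),\s*([^)]+)\)")

def summarize_shm_calls(strace_text):
    """Count shmget/shmat calls and how many shmget calls request SHM_HUGETLB."""
    summary = {"shmget": 0, "shmat": 0, "hugetlb": 0}
    for line in strace_text.splitlines():
        if "shmat(" in line:
            summary["shmat"] += 1
        m = SHMGET_RE.search(line)
        if m:
            summary["shmget"] += 1
            if "SHM_HUGETLB" in m.group(3):
                summary["hugetlb"] += 1
    return summary

# Hypothetical strace lines, for illustration only
sample = """\
shmget(0x1234, 4194304, IPC_CREAT|IPC_EXCL|0640) = 32769
shmat(32769, 0x50000000, 0) = 0x50000000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f00000
"""

if __name__ == "__main__":
    print(summarize_shm_calls(sample))
```

A trace with no shmget()/shmat() lines at all, as seen on the 64-bit box, would come back with all three counters at zero.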

At this point the course of our actions will probably be:
– Try out 9i, see if we can get our legacy DB imported there and the app working against it.
– If 9i does not work, then maybe look for RHAS2.1+8i, though I doubt we will be able to even reach the theoretical 2Gb SGA limit. We also have a Solaris installation coming our way slowly – this might resolve the whole 8i-on-64bit issue, since this was the most supported platform at the time.

I am still a bit at a loss as to why I can only get a small SGA (~0.7-0.8Gb) on my 32-bit system, even after lowering the SGA map attach address (0x12001000). This might be something to look into together. I'll email you and will pastebin init parameters.

Some "tails" from this discussion to clean up.
– Your questions about memlock: oracle's ulimit was set to 6291306 , which is like 12Gb in 2Mb pages, so – plenty. This was both hard and soft limits in the limits.conf
– Your question about other things "stealing" memory from buffer cache: shared pool was tried between 40 and 150Mb (fairly small), and I'm not sure how PGA gets reserved, but there is no activity in this testing DB, except just starting up. Other pool sizes:

shared_pool_size =140000000
large_pool_size = 34500000
java_pool_size = 140000000

– To answer my own earlier question about why I was not seeing the HugePages_Rsvd value in the grep of meminfo: that line only became available in RHEL5 or 6 – not available in my RHEL4.
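For the memlock item above, the check Brian was asking about boils down to comparing the limit against the size of the hugepage pool. Below is a small Python sketch of that comparison, assuming the memlock value is in kB (as the limits.conf documentation describes) and 2 MB hugepages. The function name `memlock_covers_hugepages` is hypothetical; the numbers are the memlock setting and the 1024-page pool mentioned in this thread.

```python
def memlock_covers_hugepages(memlock_kb, nr_hugepages, hugepage_kb=2048):
    """Return True if the memlock limit (kB) is at least the hugepage pool size.

    hugepage_kb defaults to 2048, an assumption for 2 MB hugepages.
    """
    pool_kb = nr_hugepages * hugepage_kb
    return memlock_kb >= pool_kb

if __name__ == "__main__":
    # memlock from limits.conf vs. a pool of 1024 hugepages
    print(memlock_covers_hugepages(6291306, 1024))
```

The same call with 1496 pages (the earlier vm.nr_hugepages example) also comes out True under these assumptions.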

BTW, I managed to get genclntsh and "relink all" to work on 64-bit by running it as "genclntsh -m32", which I picked up from an obscure site in Chinese;) (http://www.5icheese.com/forum.php?mod=viewthread&tid=141) .

31 Brian Pardy June 10, 2013 at 11:49 am

Hi Fil,

Great catch re bigpages vs hugepages and what can be used where. Since you’re probably really unlikely to find RHAS2.1, maybe you can try CentOS 2.1 — http://mirrors.cmich.edu/centos/2.1/. I haven’t used it but it’s supposed to be binary compatible with Red Hat and it is still available for download. Since it wouldn’t be a supported configuration either way, this might work for you. Then again if Solaris is en route, that may work better.

I don’t see anything else in those init parameters you listed that sticks out to me as possibly limiting your SGA.

There are probably enough moving parts here that no matter what you settle on, you’re in for a ton of re-validation work… maybe some small chance that CentOS + 8i + bigpages would be the least intrusive change but you may want to just ballpark a time estimate for validating on that setup vs just hitting 9i with compatible set.

You’re giving some new meaning to “legacy” here!


1 Direct I/O e asynchronous I/O « Oracle and other Trackback on July 20, 2007 at 1:56 pm
