In the comment thread of my recent blog entry entitled Of Gag-Orders, Excitement, and New Products, a fellow blogger, Jeff Hunter, wrote:
I’d be happy if the major innovation was being able to run a 10.2.0.4 16G SGA on x86_64.
He offered a link to a thread on his blog where he has been chronicling his unsuccessful attempts to boot a 16GB SGA on the same iron that seemed to have no problem doing so with 10.2.0.3.
What’s New?
Oracle Database 10g release 10.2.0.4 has additional rudimentary support for NUMA in the Linux port, true, but Jeff has tried with NUMA both enabled and disabled (via boot options), and neither has fixed his problems. In his latest installment I noticed that the thread has been renamed “The Great NUMA debate”, and the post ends with Jeff reporting that he is still having trouble with his 16GB SGA, and also that he can’t boot even a 4GB SGA. Jeff wrote:
I still couldn’t start a 16GB SGA. Interestingly enough, I couldn’t start a 4G SGA either! I had to go back to booting without numa=off. The saga continues…
Unfortunately, I can’t jump in and debug what is wrong with his configuration, and I don’t know what the debate is. However, I can take a moment to post evidence that Oracle Database 10g 10.2.0.4 can in fact boot a 16GB SGA in both AMD Opteron SUMA mode and NUMA mode. No, I don’t have any large memory AMD systems around to test this myself, but I certainly used to. So, I decided to call in a favor from my old friend Mary Meredith (yes, old Sequent folks stick together), who has taken over the role I vacated at HP/PolyServe when I left to join Oracle. I asked Mary if she’d mind booting a 16GB SGA on one of those large memory AMD systems I used to have available to me…and she did:
$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 6 09:15:35 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup pfile=create1.ora
ORACLE instance started.
Total System Global Area 1.7700E+10 bytes
Fixed Size 2115104 bytes
Variable Size 503319008 bytes
Database Buffers 1.7180E+10 bytes
Redo Buffers 14659584 bytes
Database mounted.
Database opened.
$ numactl --hardware
available: 1 nodes (0-0)
node 0 size: 32146 MB
node 0 free: 13821 MB
node distances:
node 0
0: 10
So, here we see 10.2.0.4 on a SUMA-configured Proliant DL585 with a 16GB buffer pool. I asked Mary if she’d be willing to boot in NUMA mode (Linux boot option) and give it a try, and she did:
$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 7 10:03:35 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup pfile=create1.ora
ORACLE instance started.
Total System Global Area 1.7700E+10 bytes
Fixed Size 2115104 bytes
Variable Size 503319008 bytes
Database Buffers 1.7180E+10 bytes
Redo Buffers 14659584 bytes
Database mounted.
Database opened.
SQL> quit
But she reported that she didn’t get any hugepages:
$ cat /proc/meminfo|grep Huge
HugePages_Total: 8182
HugePages_Free: 8182
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
I pointed out that 8192 2MB hugepages are not enough for an SGA of this size. I recommended she up that to 8500 and then start the database under strace so we could capture the shmget() call and confirm that it was flagging in SHM_HUGETLB, and it was:
$ cat /proc/meminfo|grep Huge
HugePages_Total: 8500
HugePages_Free: 7132
HugePages_Rsvd: 7073
Hugepagesize: 2048 kB
And from the strace:
6510 shmget(0x1420f290, 17702060032, IPC_CREAT|IPC_EXCL|SHM_HUGETLB|0600) = 393219
And…
$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 644 72 2
0x00000000 32769 root 644 16384 2
0x00000000 65538 root 644 280 2
0x1420f290 393219 oracle 600 17702060032 12
Also, in the NUMA configuration we see a good, even distribution of pages allocated from each of the “nodes”, with the exception of node zero which until Linux gets fully NUMA-aware will always be over-consumed:
$ numactl --hardware
available: 4 nodes (0-3)
node 0 size: 7906 MB
node 0 free: 2025 MB
node 1 size: 8080 MB
node 1 free: 3920 MB
node 2 size: 8080 MB
node 2 free: 3969 MB
node 3 size: 8080 MB
node 3 free: 3926 MB
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10
We also see that the shmget() call did flag in SHM_HUGETLB and correspondingly we see the shmkey in the ipcs output. We also see hugepages being used, although mostly just reserved.
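The numbers in the output above can be reconciled with a little arithmetic. The following is only a sketch in Python: the function names are mine, and it assumes the usual kernel meaning of the /proc/meminfo counters, where HugePages_Rsvd counts pages that are reserved for a mapping but not yet faulted in and are therefore still included in HugePages_Free.

```python
# Figures copied from the strace and /proc/meminfo output above.
HUGEPAGE_BYTES = 2048 * 1024        # Hugepagesize: 2048 kB
SEGMENT_BYTES = 17702060032         # size argument of the shmget() call


def hugepages_needed(segment_bytes, page_bytes=HUGEPAGE_BYTES):
    """Round the segment size up to a whole number of hugepages."""
    return (segment_bytes + page_bytes - 1) // page_bytes


def hugepages_committed(total, free, rsvd):
    """Pages already faulted in (total - free) plus pages reserved but untouched."""
    return (total - free) + rsvd


needed = hugepages_needed(SEGMENT_BYTES)
committed = hugepages_committed(total=8500, free=7132, rsvd=7073)

print("pages needed for the segment:", needed)
print("shortfall with 8182 pages:", needed - 8182)
print("pages committed in the 8500 pool:", committed)
print("pages left over in the 8500 pool:", 8500 - committed)
```

Under those assumptions the segment needs 8441 hugepages, which is 259 more than the 8182 that were available on the first attempt, and the 8500-page pool shows exactly 8441 pages committed (1368 touched plus 7073 reserved) with 59 to spare.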
So, I haven’t been able to see Jeff’s strace output or other such diagnostic information so I can’t help there. However, this blog post is meant to be a confidence booster to any wayward googler who might happen to be having difficulty booting a VLM SGA on AMD Opteron running Linux with Oracle Database 10g release 10.2.0.4.
Extra Credit
So, if Mary had booted in NUMA mode without hugepages, does anyone think it would have resulted in such a nice even consumption of pages from the nodes, or would it have looked like Cyclops? We all recall Cyclops, don’t we? In case you don’t, here is a link:
Oracle on Opteron with Linux–The NUMA Angle Part VI. Introducing Cyclops.
Pardon the apparently cynical comment, Kevin: it isn’t meant that way and you know it isn’t. Others might not, hence the disclaimer.
But isn’t it about time Oracle got itself configured automatically for all these things, maybe with a single and easy to recall parameter?
I mean: you and I (well, I can spell the things involved, you actually *know* them) and many others of our crop can grok what’s going on and adjust accordingly.
I doubt if a handful of DBAs of the last 8 year crop will be able to even knock on the door of what’s ticking and why, other than: “I can’t get it going”.
Which is not the case with Jeff, BTW: I think his is just a weird h/w+s/w combo causing the probs. Hey, it happens in the best families: don’t anyone get me started on oracle+aix… 🙂
But you get the picture. This is the sort of thing I’d expect the Linux port to be able to figger by itself, preferably without one having to cough up a licence of the “linux/numa” performance pack for grid/oem at a cost that pales the national debt of a medium-size country.
Or am I aiming too high?
I don’t think you’re entirely off base, Noons. It really shouldn’t be that way. Even though it is reactive to want so, I do wish I could get a full diagnostic from Jeff because I can’t for the life of me figure out how his setup could be that broken. I’ve asked for strace but I really could use init.ora, strace and the alert log. Since he is currently not able to boot even a 4GB SGA something is really fishy. I think Jeff will get some click-throughs and perhaps he’ll join this thread. Jeff is sharp, so this is likely not a surface problem. Maybe we could learn something that could come in handy for 10.2.0.5, and, as usual, help the wayward googler someday.
I emailed you the strace output to the email indicated in the “Appearances/Contact” section of this blog.
After following through some of the examples, I may have more clues about my issue. Although my Hugepagesize is 2048 kB, the output from /proc/meminfo indicates my HugePages_Total and HugePages_Free are both 0.
Kevin,
Do you know what version of Linux Mary was running when she started Oracle 10.2.0.4 on the ProLiant with 16G of SGA? Was it Red Hat 4 or Red Hat 5, and which update?
Mary’s Oracle appears to use only 1 shared memory segment. I noticed recently that my Linux x86_64 box uses multiple shared memory segments but my 10.2.0.4 x86 one uses a single shared memory segment. Just curious to see if we’re all on the same version of Red Hat or Linux.
Michael
Maybe I can offer a twist as well. I have similar issues. My issue, though, has to do with pre_page_sga=true. I can in fact boot any size SGA so long as the huge page pool is proportionately larger than the SGA. The ratio is a mystery to me at this moment, but the SGA hovers around 65% of the huge page pool. That is, if the huge page pool is 35% higher than the size of the SGA.
Take a look at my test results below
Test Results
With pre_page_sga=true
SGA=12g, 13g, 14g
Huge Pages=16GB
I get the ORA-00443 background process “PMON” did not start
With pre_page_sga=true
SGA=11g and below
Huge Pages=16GB
Works okay
With pre_page_sga=true
SGA=10g, 9g, 8g
Huge Pages=11GB
I get the ORA-00443 background process “PMON” did not start
Problem resolved when SGA dropped to 7g
Final series of tests
With pre_page_sga=true
SGA=21g
Huge Pages=24GB
I get the ORA-00443 background process “PMON” did not start
With pre_page_sga=false
SGA=21g
Huge Pages=24GB
No issues
With pre_page_sga=true
SGA=21g, 23g, 24g
Huge Pages=30GB
No issues
In all cases the NUMA optimization is set off. With NUMA optimization set to true, any size SGA can be booted and use huge pages regardless of the size of the huge page pool, provided the pool is bigger than the SGA, even by a few MB.
I am battling this one with oracle now as we speak.
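To see why the ratio looks like a mystery, the test results reported above can be tabulated as huge page pool size divided by SGA size. This is only a sketch in Python: the tuple layout and names are my own, the figures are transcribed from the comment, and “11g and below” is represented by the 11g case alone.

```python
# (pre_page_sga, sga_gb, hugepages_gb, started) transcribed from the comment above.
# "SGA=11g and below" is represented by the 11g case only.
TESTS = [
    (True, 12, 16, False), (True, 13, 16, False), (True, 14, 16, False),
    (True, 11, 16, True),
    (True, 10, 11, False), (True, 9, 11, False), (True, 8, 11, False),
    (True, 7, 11, True),
    (True, 21, 24, False),
    (False, 21, 24, True),
    (True, 21, 30, True), (True, 23, 30, True), (True, 24, 30, True),
]


def ratio(sga_gb, hugepages_gb):
    """Huge page pool size relative to the SGA size."""
    return hugepages_gb / sga_gb


# Only the pre_page_sga=true runs are compared against each other.
prepaged = [t for t in TESTS if t[0]]
failed = [ratio(sga, pool) for _, sga, pool, ok in prepaged if not ok]
started = [ratio(sga, pool) for _, sga, pool, ok in prepaged if ok]

highest_failing = max(failed)
lowest_working = min(started)

print(f"highest failing ratio: {highest_failing:.3f}")
print(f"lowest working ratio: {lowest_working:.3f}")
print("single threshold fits:", highest_failing < lowest_working)
```

With these figures the highest failing ratio (11GB pool, 8g SGA) is 1.375, while the lowest working ratio (30GB pool, 24g SGA) is 1.250, so no single pool-to-SGA ratio separates the failures from the successes.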
Anthony, maybe you can share which Oracle 10g and Linux versions you ran the above tests on. I’m planning to test similar parameters with a 30GB SGA.
I’m running 2 Oracle 11.2 databases on SUSE Linux Enterprise Server (SLES) with a total of 16 GB of physical memory.
1) sga_target=6g, sga_max_size=8g
2) sga_target=sga_max_size=1536m
I configured huge pages and ran hugepages_settings.sh as per Document 401749.1, and it is giving:
Recommended setting: vm.nr_hugepages = 4868
which looks wrong to me.
I am also getting the following message via OEM:
Significant virtual memory paging was detected on the host operating system.
oracle@oracle1:/tmp> cat /proc/meminfo | grep Huge
HugePages_Total: 4868
HugePages_Free: 28
HugePages_Rsvd: 26
HugePages_Surp: 0
Hugepagesize: 2048 kB
I am getting this error
PROCESS J000 And M000 Die
Process J000 died, kkjcre1p: unable to spawn jobq slave process
ORA-00445: Background Process “J000” Did Not Start After 120 Seconds
You need to aggregate up to a number that fits both. 4868 might be enough for the 8g instance but not enough for both instances. You need somewhere north of 5000 hugepages. If you are just working this out, I recommend you set hugepages to 6000, boot both instances, and examine /proc/meminfo to see how over-configured you are. Then adjust down and keep those values.
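As a rough illustration of the “examine /proc/meminfo” step, the sketch below (Python, with function names of my own choosing) parses the HugePages counters and reports how many pages are neither in use nor reserved. It assumes the usual kernel meaning of the counters, where reserved pages are still counted as free, and it uses the /proc/meminfo output posted above as sample input.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if sep and name.startswith("HugePages_"):
            counters[name] = int(rest.split()[0])
    return counters


def spare_pages(counters):
    """Pages that are neither faulted in nor reserved by a mapping."""
    committed = (counters["HugePages_Total"] - counters["HugePages_Free"]
                 + counters["HugePages_Rsvd"])
    return counters["HugePages_Total"] - committed


# Sample input: the output posted earlier in this thread.
SAMPLE = """\
HugePages_Total: 4868
HugePages_Free: 28
HugePages_Rsvd: 26
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

counters = parse_hugepages(SAMPLE)
spare = spare_pages(counters)
print("spare hugepages:", spare)
```

For the posted output this reports 2 spare pages out of 4868, meaning the pool is essentially exhausted. After booting both instances with a deliberately generous setting, the same calculation shows how far the setting can be adjusted down.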
There is some patch to 11.2 that implements an init.ora parameter (can’t remember specifics at the moment) that forces the instance to use hugepages or fail the instance startup. That’s the best way to go.