In part I of my recent blog series on Linux hugepages and modern Oracle releases I closed the post by saying that future installments would materialize if I found any pitfalls. I don’t like to blog about bugs, but in cases where little material on the matter is available elsewhere I think it adds value. First, however, I’d like to offer links to parts I and II in the series:
- Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – I.
- Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – II.
The pitfall I’d like to bring to readers’ attention can arise when the Oracle Database 11g Release 2 (11.2.0.2) parameter use_large_pages is set to “only”, which forces the instance either to allocate all of its shared memory from the hugepages pool or to fail to boot. As I pointed out in parts I and II, this is a great feature. However, after an instance is booted, other processes (e.g., other Oracle instances) may use hugepages, drawing down the number of free hugepages. Indeed, other uses of hugepages could totally deplete the pool.
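For readers who haven’t used the parameter, here is a minimal sketch of setting it. The parameter is static, so scope=spfile and a bounce are required; the session shown is illustrative, not output from my test system:

```
$ sqlplus / as sysdba <<EOF
-- Force the instance to take all SGA memory from the hugepages pool,
-- or refuse to start if the pool cannot satisfy the request.
alter system set use_large_pages=only scope=spfile;
shutdown immediate
startup
EOF
```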
So what happens to a running instance that successfully allocated its shared memory from the hugepages pool and hugepages are later externally drawn down? The answer is nothing. An instance can plod along just fine after instance startup even if hugepages continue to get drawn down to the point of total depletion. But is that the end of the story?
What Goes Up, Must (be able to) Come Down
OK, so for anyone who finds themselves in a situation where an instance is up and happy but HugePages_Free is zero, the following is what to expect:
```
$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.2.0 Production on Wed Sep 29 17:32:32 2010

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL>
SQL> HOST grep -i huge /proc/meminfo
HugePages_Total:    4663
HugePages_Free:        0
HugePages_Rsvd:       10
Hugepagesize:       2048 kB

SQL> shutdown immediate
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 6422533
SQL>
```
Pay particular attention to the fact that sqlplus is telling us that it is attached to an idle instance! I assure you, this is erroneous. The instance is indeed up.
Yes, this is bug 10159556 (I filed it, for what it’s worth). The solution is to have ample hugepages as opposed to precisely enough. Note that in another shell a privileged user can dynamically allocate more hugepages (even a single hugepage) and the instance can then be shut down cleanly. As an aside, an instance in this situation can be shut down with abort. I don’t mean to insinuate that this is some sort of zombie instance that will not go away.
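For the record, the workaround looks like the following, run as root in another shell. I’m reusing the 4,663-page total from the session above; whether the kernel can actually grow the pool at runtime depends on available contiguous memory:

```
# grep HugePages_Total /proc/meminfo
HugePages_Total:    4663
# echo 4664 > /proc/sys/vm/nr_hugepages    # grow the pool by a single hugepage
```

With even one free hugepage back in the pool, the shutdown proceeds normally.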
You mean another process can steal memory which I have allocated? Do LOCK_SGA and/or PRE_PAGE_SGA work on Linux and if so, don’t they help to avoid this pitfall?
I have no experience with Linux hugepages (only with AIX large pages and Oracle 10g).
Thanks,
Flado
Flado,
No. If a process succeeds at shmget() with SHM_HUGETLB, it owns the pages for that allocation. They will be on the reserved list until touched (faulted in)… I’m talking about drawing down what free hugepages remain after the instance boots. The gist of the matter is that if you think you need precisely 1,000 hugepages for your instance, you had better configure 1,001. In other words, bake the headroom into the boot-time setting, as sketched below.
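A minimal sketch, assuming the hypothetical 1,000-page requirement above:

```
# /etc/sysctl.conf -- configure one more hugepage than the instance strictly needs
vm.nr_hugepages = 1001
# apply without a reboot:  sysctl -p
```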
Ah, got it. Thanks.
Does this still happen when trying to shutdown through an already-connected local session or a new/existing TNS session?
Cheers!
Flado
Hi Flado,
Good question. I’d have to test. I expect that if the booting foreground stays around (connected) long enough to be the one shutting down there will be no issue… I’ll see if I can test that (if it is less than 5 minutes testing I can probably do it).
Hi Kevin, I wonder if there is any way to tell, for each particular shared memory segment, whether hugepages were allocated to it or not. For example: two 10g databases running on a box, one SGA is 16GB, the other is ~1GB, and 9,000 hugepages are configured. After bringing the databases up I can see:
```
$ grep -i huge /proc/meminfo
HugePages_Total:    9000
HugePages_Free:     7479
HugePages_Rsvd:     7153
Hugepagesize:       2048 kB

$ ipcs -m

------ Shared Memory Segments --------
key        shmid   owner   perms   bytes         nattch   status
0xc6a860ec 0       oracle  600     1008730112    26
0x76953ab0 65537   oracle  660     4096          0
0x00000000 262146  oracle  600     17179869184   29
0x83cb2f70 294915  oracle  600     2097152       29
```
How can I know for sure whether the small segment is using hugepages?
Hi Nick,
There is no way to tell after the segment is allocated. Let me make a post about that.
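There is, though, an indirect before/after check one can do at startup time, consistent with the counters shown earlier in this thread (a sketch only; the temp file path is arbitrary):

```
$ grep -i huge /proc/meminfo > /tmp/huge.before
$ # ...start the instance in question...
$ diff /tmp/huge.before <(grep -i huge /proc/meminfo)
```

If HugePages_Free and HugePages_Rsvd move by roughly the segment size, that instance’s shared memory came from the pool.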
At some point the permissions were the giveaway as to whether hugepages are used or not.
Permissions 600 -> yes, permissions 660 -> no.
However, I have no evidence that this is indeed true 100% of the time. Just observations.
Let us know if you solidify that hypothesis, Christo. Thanks.
Hi,
I found something related, but I don’t have access to read it. Perhaps you, Kevin, can give us more details:
BUG:6620371 – HUGEPAGES CAUSES SHARED MEMORY SEGMENTS TO START WITH PERMISSIONS OF 600.
Kevin,
Ran across an article about transparent hugepages in recent Linux kernels. One feature seems to add a new kernel process that can convert regular pages into huge pages and can swap out huge pages. I don’t know whether it works with shared memory segments or not.
http://www.phoronix.com/scan.php?page=article&item=linux_transparent_hugepages&num=1
Scott,
I could only find references saying transparent hugepages work only for anonymous memory mappings right now. So we have to wait for a shared-memory implementation.
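For anyone curious whether their kernel ships THP and which mode it is in, the feature exposes itself under sysfs (the output line is illustrative; the bracketed value marks the active mode):

```
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
```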
Do we need to include PGA memory when calculating hugepages when using 11.2.0.3?
No. PGA is heap (process-private memory), not shared memory, so it never comes from the hugepages pool. Size the pool for the SGA alone.
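A back-of-the-envelope sizing sketch along those lines (the 16GB SGA is a hypothetical figure; PGA is deliberately excluded):

```
$ SGA_BYTES=$((16 * 1024 * 1024 * 1024))                   # hypothetical 16GB SGA
$ HPG_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)  # typically 2048 kB
$ echo $(( SGA_BYTES / (HPG_KB * 1024) ))                  # pages needed for the SGA alone
8192
```

And per the discussion above, configure at least a page or two more than that figure.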
Is there a test in SLOB that will show how much improvement you get when moving to Huge Pages?
Hi David,
Sure… if you baseline with a SLOB test that suffers the hugepages penalty. Remember what hugepages address: significant memory overhead for page tables. So if you have a large SGA and a lot of sessions (like real life) and baseline without hugepages, you will see a good benefit from enabling hugepages. Start by examining /proc/meminfo on the baseline to see how much page table cost there is to recoup. After enabling hugepages you should see improved DB CPU/IOPS, unless you hit bottlenecks that stop you short, of course.
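Concretely, the baseline examination I mean is just the following, compared before and after enabling hugepages (figures vary by system; a large non-hugepages SGA with many sessions can push PageTables into the gigabytes):

```
$ grep PageTables /proc/meminfo    # on the non-hugepages baseline: cost to recoup
$ grep -i huge /proc/meminfo       # after enabling: confirm the pool is in use
```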