BLOG UPDATE (05.14.09): The bug number for this PRE_PAGE_SGA with Automatic Memory Management issue is 8505803
It has been quite a while since I’ve blogged about Automatic Memory Management (AMM). I had to dig out the following three posts before making this blog entry just to see what I’ve said about AMM in the past:
- Oracle Database 11g Automatic Memory Management – Part I. Linux Hugepages Support
- Oracle Database 11g Automatic Memory Management – Part II. Automatically Stupid?
- Oracle Database 11g Automatic Memory Management – Part III. Automatically Automatic?
Recently my friend Steve Shaw of Intel reported to me that he has had some problems combining AMM with the PRE_PAGE_SGA init.ora parameter. I’ve looked into this a bit and thought I’d throw out a quick heads-up post. I won’t blog yet about the specific PRE_PAGE_SGA related problem Steve saw, but the problems with combining PRE_PAGE_SGA and AMM are generic enough to warrant this blog entry on their own.
I could make this a really short blog entry by simply warning not to combine PRE_PAGE_SGA with AMM, but that would be boring. Nonetheless, don’t combine PRE_PAGE_SGA with AMM. There is a bug in 11.1.0.7 with AMM where PRE_PAGE_SGA causes every process to touch every page of the entire AMM space—not just the SGA! This has significant impact on page table consumption and session connect time. To make some sense out of this, consider the following…
I’ll set the following init.ora parameters:
MEMORY_TARGET=8G
SGA_TARGET=100M
PARALLEL_MAX_SERVERS=0
Next, I booted the instance and took a peek at ps(1) output. As you can see, every background process has a resident set of roughly 8G. Ignore the SZ column since it is totally useless on Linux (see the man page). Actually, that topic also warrants a post in the Little Things Doth Crabby Make series! Sorry, I digress. Anyway, here is the ps(1) output:
$ ps -elF | grep -v grep | grep -v ASM | egrep 'RSS|test'
F S UID PID PPID C PRI NI ADDR SZ WCHAN RSS PSR STIME TTY TIME CMD
4 S root 27940 1 0 85 0 - 3276 pipe_w 1000 7 09:35 ? 00:00:00 ora_dism_test1
0 S oracle 27943 1 8 75 0 - 2162332 - 8404660 2 09:35 ? 00:00:06 ora_pmon_test1
0 S oracle 28022 1 3 58 - - 2161773 - 8403480 0 09:36 ? 00:00:02 ora_vktm_test1
0 S oracle 28077 1 3 75 0 - 2163861 159558 8411180 2 09:36 ? 00:00:02 ora_diag_test1
0 S oracle 28141 1 3 75 0 - 2162467 - 8405576 2 09:36 ? 00:00:02 ora_dbrm_test1
0 S oracle 28157 1 3 76 0 - 2161774 150797 8403900 2 09:36 ? 00:00:02 ora_ping_test1
0 S oracle 28171 1 4 75 0 - 2162570 - 8405808 6 09:36 ? 00:00:02 ora_psp0_test1
0 S oracle 28187 1 4 78 0 - 2161774 - 8403480 2 09:36 ? 00:00:02 ora_acms_test1
0 S oracle 28201 1 4 75 0 - 2164626 126590 8414448 7 09:36 ? 00:00:02 ora_dia0_test1
0 S oracle 28256 1 4 75 0 - 2164131 159729 8413160 6 09:36 ? 00:00:02 ora_lmon_test1
0 S oracle 28306 1 4 75 0 - 2165750 276166 8419012 2 09:36 ? 00:00:02 ora_lmd0_test1
0 S oracle 28364 1 5 58 - - 2165485 276166 8418852 2 09:36 ? 00:00:02 ora_lms0_test1
0 S oracle 28382 1 5 58 - - 2165485 277884 8418848 3 09:36 ? 00:00:02 ora_lms1_test1
0 S oracle 28398 1 5 75 0 - 2161773 - 8403504 2 09:36 ? 00:00:02 ora_rms0_test1
0 S oracle 28412 1 6 78 0 - 2161774 - 8403752 7 09:36 ? 00:00:02 ora_mman_test1
0 S oracle 28426 1 6 75 0 - 2162572 - 8406832 3 09:36 ? 00:00:02 ora_dbw0_test1
0 S oracle 28491 1 6 75 0 - 2161773 - 8403720 2 09:36 ? 00:00:02 ora_lgwr_test1
0 S oracle 28550 1 7 75 0 - 2162467 - 8406164 3 09:36 ? 00:00:02 ora_ckpt_test1
0 S oracle 28608 1 7 78 0 - 2161774 - 8403428 2 09:36 ? 00:00:02 ora_smon_test1
0 S oracle 28624 1 8 78 0 - 2161773 - 8403500 2 09:36 ? 00:00:02 ora_reco_test1
0 S oracle 28638 1 9 75 0 - 2162560 - 8406436 2 09:36 ? 00:00:02 ora_rbal_test1
0 S oracle 28652 1 9 78 0 - 2162487 pipe_w 8407412 2 09:36 ? 00:00:02 ora_asmb_test1
0 S oracle 28666 1 10 75 0 - 2161773 - 8404092 2 09:36 ? 00:00:02 ora_mmon_test1
0 S oracle 28729 1 12 75 0 - 2161773 - 8403528 2 09:36 ? 00:00:02 ora_mmnl_test1
0 S oracle 28776 1 14 75 0 - 2162597 277884 8406572 3 09:36 ? 00:00:02 ora_lck0_test1
0 S oracle 28860 1 19 75 0 - 2162597 276166 8406220 2 09:36 ? 00:00:02 ora_rsmn_test1
0 S oracle 28893 23881 37 78 0 - 2162210 - 8409436 2 09:37 ? 00:00:02 oracletest1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
0 S oracle 28914 1 54 81 0 - 2162070 - 8405276 2 09:37 ? 00:00:02 ora_o000_test1
0 R oracle 28997 1 99 82 0 - 2161643 - 7213532 0 09:37 ? 00:00:01 ora_dskm_test1
$ grep -i pagetable /proc/meminfo
PageTables: 590404 kB
As you can see, I followed up the ps command with a grep for how much memory is being spent on page tables. With all these 8GB resident sets it comes to roughly 575MB. That got me to thinking: what would other init.ora combinations result in? Those 575MB of page tables were begat of an 8G MEMORY_TARGET and no PQO slaves. I wrote a couple of quick and dirty scripts to probe around for some other values.
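If you want a rough sanity check on that page table figure, the arithmetic is simple. The following is only a back-of-the-envelope sketch, assuming 4KB base pages, 8 bytes per page table entry, and roughly 30 processes each mapping the full 8G MEMORY_TARGET (the process count is approximate):

$ # 8GB / 4KB = 2,097,152 pages; x 8 bytes of PTE = 16MB per process; x ~30 processes, reported in KB
$ echo "30 * ((8 * 1024 * 1024) / 4) * 8 / 1024" | bc
491520

That is roughly 480MB of page table entries just for the mapped AMM space, which lands in the same ballpark as the 590404 kB reported in /proc/meminfo once the processes’ other mappings are counted.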
I created 6 init.ora files where, not surprisingly, the only setting that varied was the number of PQ slaves. MEMORY_TARGET and SGA_TARGET remained constant. The following script is the driver. It boots the instance with 16, 32, … or 96 PQ slaves, sleeps for 5 seconds and then executes the rss.sh script, also listed in the following box:
$ cat doit.sh
for i in 16 32 48 64 80 96
do
sqlplus '/ as sysdba' <<EOF
startup force pfile=./$i.ora
host sleep 5
host sh ./rss.sh "MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE $i SLAVES" >> rss.out
exit;
EOF
done

$ cat rss.sh
DESC="$1"
RSS=`ps -elF | grep test | grep -v ASM | grep -v grep | awk '{ t=t+$12 } END { printf("%7.2lf\n", (t * 1024) / 2^30 ) }'`
PT=`grep -i paget /proc/meminfo | awk '{ print $2 }'`
echo "$RSS $PT $DESC"
The rss.sh script sums up the resident set sizes of all the interesting processes and reports the total in gigabytes. The script also reports the page table size in KB. The driver redirects that output to a file called rss.out. The following box shows the output generated by the scripts. The first line of output is with 16 PQ slaves, the next is 32 PQ slaves, and so forth through the 6th line, which used 96 PQ slaves.
$ cat rss.out
 391.84  838644 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 16 SLAVES
 529.00 1124688 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 32 SLAVES
 657.03 1391100 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 48 SLAVES
 785.32 1658000 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 64 SLAVES
 918.41 1935368 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 80 SLAVES
1041.08 2190548 MEMORY_TARGET=8G SGA_TARGET=100M PRE_PAGE_SGA=TRUE 96 SLAVES
Pretty cut and dried. The aggregate RSS grows by roughly 8GB per process (about 8GB x 16 with each increment of 16 PQ slaves), and the page tables grow to roughly 2GB by the time the slave count reaches 96.
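The 96-slave page table figure lines up with the same rough math. Assuming about 16MB of page table entries per process (8GB / 4KB pages x 8 bytes of PTE) and roughly 30 background/foreground processes plus the 96 slaves, a sketch:

$ # (~30 base processes + 96 PQ slaves) x ~16384 KB of page table entries each
$ echo "(30 + 96) * 16384" | bc
2064384

That is about 2GB expressed in KB, which is right next door to the 2190548 kB the script reported.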
Bug Number: 42
I don’t have the bug number for this one yet. But it is a bug. Just don’t use PRE_PAGE_SGA with AMM. That setting was very significant many years ago for reasons that had mostly to do with ensuring Oracle on BSD-derived Unix implementations didn’t suffer from swappable SGA pages. The PRE_PAGE_SGA functionality ensured that each page was multiply referenced and therefore could not leave physical memory. But that was a long time ago. Time for old dogs to learn new tricks. And, no, my friend Steve Shaw does not suffer from old-dog-clamoring-for-new-trickitis. As I said above, I fully intend to blog about what Steve ran into with his recent PRE_PAGE_SGA related issue…soon.
By the way, did I forget to mention that you really shouldn’t combine PRE_PAGE_SGA with AMM? Like they say, the memory is the first thing to go…
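If you aren’t sure whether an instance is running with this combination, checking is quick. A minimal SQL*Plus sketch (these are standard parameters; the ALTER SYSTEM is only needed if PRE_PAGE_SGA turns out to be set, and since it is a static parameter the change takes effect at the next startup):

SQL> show parameter memory_target
SQL> show parameter pre_page_sga
SQL> alter system set pre_page_sga=FALSE scope=spfile;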
And, before I forget, this is 11.1.0.7 on 64-bit Linux. I have no idea how PRE_PAGE_SGA works on other platforms. Maybe Glenn or Tanel will chime in on a Solaris x64 result?
Oh, I am forgetful today. I nearly forgot to mention that with AMM, PRE_PAGE_SGA and an 8G MEMORY_TARGET, a simple connect as scott/tiger followed by an immediate exit takes 2.3 seconds on Xeon 5400 processors. With PRE_PAGE_SGA commented out, the same test completes in 0.19 seconds. Hey, I should start rambling on about recovering 12x performance! 🙂
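For the record, that connect-time measurement was nothing fancy. Something along these lines reproduces it (a sketch, assuming the usual scott/tiger demo account is available):

$ # time a bare connect/exit round trip
$ time sqlplus -S scott/tiger <<EOF
exit
EOF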
Kevin,
Off topic for this blog post I know, but given your interest in all things NUMA, I wonder if you’d possibly comment on a recent article that popped up on Metalink – 759565.1
It seems to be a very strong steer to NOT enable Oracle’s NUMA optimizations if you want a stable system, although it’s not clear whether it is excluding 11.1.0.7 from this.
Thanks
Duncan
I have always advised against PRE_PAGE_SGA, mainly because I didn’t think it was necessary, at least for Solaris. Solaris uses ISM, which is locked shared memory usually backed by large pages (256M on SPARC). But with Intel, only 2M pages are supported. It might be time to do some experiments with x64/Solaris since the supported page sizes are so small. Maybe in the future we can see some larger page sizes? Memory sizes just continue to grow and, with more threading on the way, it will be necessary.
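For anyone curious what a given Solaris box actually offers, the stock tools will tell you. A quick sketch (pagesize(1) ships with Solaris; the output shown is what a typical x64 box reports):

$ pagesize -a      # list all supported page sizes, in bytes
4096
2097152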
Glenn,
ISM is only used with manual SGA management or when sga_max_size = sga_target.
In automatic mode, where sga_target < sga_max_size, it uses DISM, which is swappable and requires the ORADISM process to lock it in memory.
Being swappable also means you need the size of your SGA available in swap space.
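A quick way to see whether there is enough swap configured to back a DISM-based SGA, as a sketch using standard Solaris commands:

$ prtconf | grep -i memory   # physical memory installed
$ swap -l                    # configured swap devices/files
$ swap -s                    # allocated/reserved/used/available summary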
Hello Kevin,
What I cannot understand (at least from the Oracle documentation and Metalink notes) is what the actual benefit is of PRE_PAGE_SGA=TRUE and huge pages together (without AMM, i.e. with it turned off). We tried to run an 80GB SGA on 11gR1 (RHEL4 64-bit) with huge pages, AMM off and pre_page_sga=TRUE, and the instance just cannot start. I have SR 7712866.994 opened on that (I think it is also quoted on Oracle’s BDA forum) and so far, at least, I am not clear what the reason is. On 10.2.0.4/RHEL5 (64-bit) with AMM off and around a 28GB SGA that works just fine. Where I am getting stuck is that the Oracle documentation does not seem to provide an answer (or at least I cannot find it): is there any benefit to using pre_page_sga=TRUE and huge pages? Thanks
Regards,
Alex
What I’m getting from the Oracle doc is: if you want to use pre_page_sga, you’d better use huge pages, since the larger huge page size decreases the number of pages.
“The advantage that PRE_PAGE_SGA can afford depends on page size. ” – http://docs.oracle.com/cd/E11882_01/server.112/e25513/initparams197.htm#CHDHDAJC
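That tracks with simple arithmetic on the page counts. A sketch, assuming 2MB huge pages on Linux (the hugepage fields in /proc/meminfo confirm the configured size on a given box):

$ grep -i huge /proc/meminfo           # Hugepagesize plus HugePages_Total/Free counts
$ echo "(8 * 1024 * 1024) / 4" | bc    # an 8GB SGA expressed in 4KB pages
2097152
$ echo "(8 * 1024) / 2" | bc           # the same SGA expressed in 2MB huge pages
4096

Fewer pages to touch means far less for PRE_PAGE_SGA to walk and much smaller page tables, which is presumably the advantage the documentation is alluding to.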