Archive for the 'AMD 6100 (Magny-Cours)' Category

You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part III.

By The Way, How Many NUMA Nodes Is Your AMD Opteron 6100-Based Server?

In my ongoing series about Oracle Database 11g configuration for NUMA systems I’ve spoken of the enabling parameter and how it changed from _enable_NUMA_optimization (11.1) to _enable_NUMA_support (11.2). For convenience’s sake I’ll point to the other two posts in the series for folks who care to catch up.

What does AMD Opteron 6100 (Magny-Cours) have to do with my ongoing series on enabling/disabling NUMA features in Oracle Database? That’s a good question. For starters, wouldn’t it be premature to just presume each of these 12-core processors is a single NUMA node?

The AMD Opteron 6100 is a Multi-Chip Module (MCM). The “package” is two hex-core processors essentially “glued” together and placed into a socket. Each die has its own memory controller (hint, hint). I wonder what the Operating System sees in the case of a 4-socket server? Let’s take a peek.

The following is output from the numactl(8) command on a 4s48c Opteron 6100 (G34)-based server:

# numactl --hardware
available: 8 nodes (0-7)
node 0 size: 8060 MB
node 0 free: 7152 MB
node 1 size: 16160 MB
node 1 free: 16007 MB
node 2 size: 8080 MB
node 2 free: 8052 MB
node 3 size: 16160 MB
node 3 free: 15512 MB
node 4 size: 8080 MB
node 4 free: 8063 MB
node 5 size: 16160 MB
node 5 free: 15974 MB
node 6 size: 8080 MB
node 6 free: 8051 MB
node 7 size: 16160 MB
node 7 free: 15519 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10
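
For anyone who would rather not eyeball that matrix, here is a minimal Python sketch that summarizes it. It only parses text: the SAMPLE string is the output above pasted in verbatim, and the function names (parse_numactl, distance_levels, is_symmetric) are my own illustrative choices, not part of any numactl tooling.

```python
import re

# Pasted from the `numactl --hardware` output above (4s48c Opteron 6100 server).
SAMPLE = """\
available: 8 nodes (0-7)
node 0 size: 8060 MB
node 0 free: 7152 MB
node 1 size: 16160 MB
node 1 free: 16007 MB
node 2 size: 8080 MB
node 2 free: 8052 MB
node 3 size: 16160 MB
node 3 free: 15512 MB
node 4 size: 8080 MB
node 4 free: 8063 MB
node 5 size: 16160 MB
node 5 free: 15974 MB
node 6 size: 8080 MB
node 6 free: 8051 MB
node 7 size: 16160 MB
node 7 free: 15519 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10
"""


def parse_numactl(text):
    """Return (sizes_mb, distances) parsed from `numactl --hardware` text."""
    sizes = {}       # node id -> memory size in MB
    distances = {}   # node id -> list of distances to every node
    in_matrix = False
    for line in text.splitlines():
        size = re.match(r"node (\d+) size: (\d+) MB", line)
        if size:
            sizes[int(size.group(1))] = int(size.group(2))
        elif line.startswith("node distances:"):
            in_matrix = True
        elif in_matrix:
            row = re.match(r"\s*(\d+):\s+(.+)", line)
            if row:  # the "node   0   1 ..." header row does not match
                distances[int(row.group(1))] = [int(v) for v in row.group(2).split()]
    return sizes, distances


def distance_levels(distances):
    """Sorted list of the distinct distance values in the matrix."""
    return sorted({d for row in distances.values() for d in row})


def is_symmetric(distances):
    """True if distance(a, b) == distance(b, a) for every pair of nodes."""
    return all(distances[a][b] == distances[b][a]
               for a in distances for b in distances)


if __name__ == "__main__":
    sizes, distances = parse_numactl(SAMPLE)
    print("nodes:", len(sizes))
    print("total MB:", sum(sizes.values()))
    print("distance levels:", distance_levels(distances))
    print("symmetric:", is_symmetric(distances))
```

On the sample it reports 8 nodes, 96,940 MB in total, a symmetric matrix and three distance levels (10, 16, 22): local memory plus two grades of remote memory.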

Heft
It wasn’t that long ago that an 8-node NUMA system was so large that a forklift was necessary to move it about (think Sequent, SGI, DG, DEC, etc.). Even much more recent 8-socket (thus 8 NUMA nodes) servers were a 2-man lift and quite large (e.g., 7U HP ProLiant DL785). These days, however, an 8-node NUMA system based on the AMD Opteron 6100 (G34) comes in a 2U package!

Is it time yet to stop thinking that NUMA is niche technology?

I’ll blog soon about booting Oracle to test NUMA optimizations on these 8-node servers.

Intel Xeon 5600 (Westmere EP) vs AMD Opteron 6100 (Magny-Cours)

AnandTech has a significant amount of fresh coverage of the AMD 6100 (Magny-Cours) versus the Westmere EP. It is an interesting read:

Intel Xeon 5600 (Westmere EP) versus AMD Opteron 6100 (Magny-Cours)

It’s getting quite wordy now that it takes some three syllables to express how many cores there are in some processors. We’ve gone beyond hex-core and oct-core to dodeca-core. Hmmm…

You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part I.

In May 2009 I made a blog entry entitled You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part II. There had not yet been a Part I but as I pointed out in that post I would loop back and make Part I. Here it is. Better late than never.

Background
I originally planned to use Part I to stroll down memory lane (back to 1995) with a story about the then VP of Oracle RDBMS Development’s initial impression of the Sequent DYNIX/ptx NUMA API. It came during a session where we presented the API and argued that it would be beneficial to code to NUMA APIs sooner rather than later. To be honest, we were mixing vision with the specific needs of our port.

We were the first to have a production NUMA API to which Oracle could port and we were quite a bit sooner to the whole NUMA trend than anyone else. Ours was the first production NUMA system.

Now, this VP is no longer at Oracle, but the (redacted) response was, “Why would we want to use any of this ^#$%.” We (me and the three others presenting the API) were caught off guard. However, we all knew that it was a really good question. There were still good companies making really tight, high-end SMPs with uniform memory. Just because we (Sequent) had to move to a NUMA architecture didn’t mean we were blind to the reality around us. However, one thing we knew for sure: all systems in the future would have NUMA attributes of varying levels. All our competition was either in varying stages of denial or doing what I like to refer to as “Pooh-pooh it while you do it.” All the major players eventually came out with NUMA systems. Some sooner, some later and the others died trying.

That takes us to Commodity NUMA and the new purpose of this “Part I” post.

Before I say a word about this Part I, I’d like to point out that the concepts in Part II are of a “must-know” variety unless you relinquish your computing power to some sort of hosted facility where you don’t have the luxury of caring about the architecture upon which you run Oracle Database.

Part II was about the different types of NUMA (historical and present) and such knowledge will help you if you find yourself in a troubling performance situation that relates to NUMA. NUMA is commodity, as I point out, and we have to come to grips with that.

What Is He Blogging About?
The current state of commodity NUMA is very peculiar. These Commodity NUMA Implementation (CNI) systems are so tightly coupled that most folks don’t even realize they are running on a NUMA system. In fact, let me go out on a ledge. I assert that nobody is configuring Oracle Database 11g Release 2 with NUMA optimizations in spite of the fact that they are on a NUMA box (e.g., Nehalem EP, AMD Opteron). I believe this because the init.ora parameter to invoke Oracle NUMA awareness changed names from 11gR1 to 11gR2 as per My Oracle Support note 864633.1. The parameter changed from _enable_NUMA_optimization to _enable_NUMA_support. I know nobody is setting this because if they had, I can almost guarantee they would have googled for problems. Allow me to explain.

If Nobody is Googling It, Nobody is Doing It
Anyone who tests _enable_NUMA_support as per My Oracle Support note 864633.1 will likely experience the sorts of problems that I detail later in this post. But first, let’s see what they would get from Google when they search for _enable_NUMA_support:

Yes, just as I thought…Google found nothing. But what is my point? My point is two-fold. First, I happen to know that Nehalem EP with QPI and Opteron with AMD HyperTransport are such good technologies that you really don’t have to care that much about NUMA software optimizations, at least at this point in the game. Reading My Oracle Support note 1053332.1 (regarding disabling Linux NUMA support for Oracle Database Machine hosts) sort of drives that point home. Second, saying you don’t need to care about NUMA doesn’t mean you shouldn’t experiment. How can anyone say that setting _enable_NUMA_support is a total placebo in all cases? One can’t prove a negative.

If you dare, trust me when I say that an understanding of NUMA will be as essential in the next 10 years as understanding SMP (parallelism and concurrency) was in the last 20 years. OK, off my soapbox.

Some Lessons in Enabling Oracle NUMA Optimizations with Oracle Database 11g Release 2
This section of the blog aims to point out that even when you think you have tested Oracle NUMA optimizations there is a chance you didn’t. You have to know how to verify that NUMA optimizations are actually in play. Why? Well, if the configuration is not right for enabling NUMA features, Oracle Database will simply ignore you. Consider the following session, where I demonstrate three things:

  1. Evidence that I am on a NUMA system (numactl(8))
  2. I started up an instance with a pfile (p4.ora) that has _enable_NUMA_support set to TRUE
  3. The instance started but _enable_NUMA_support was forced back to FALSE

Note that in spite of item 3, the alert log will not report anything to you about what went wrong.

SQL>
SQL> !numactl --hardware
available: 2 nodes (0-1)
node 0 size: 36317 MB
node 0 free: 31761 MB
node 1 size: 36360 MB
node 1 free: 35425 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

SQL> startup pfile=./p4.ora
ORACLE instance started.

Total System Global Area 5746786304 bytes
Fixed Size                  2213216 bytes
Variable Size            1207962272 bytes
Database Buffers         4294967296 bytes
Redo Buffers              241643520 bytes
Database mounted.
Database opened.
SQL> show parameter _enable_NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     FALSE

SQL>
SQL> !grep _enable_NUMA_support ./p4.ora
_enable_NUMA_support=TRUE
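
Since the instance won’t tell you it ignored you, a check like the one I just did by hand is easy to script. The following is only a rough Python sketch that compares pfile text against pasted `show parameter` output. The helper names (pfile_value, shown_value, was_silently_reverted) are hypothetical, and the pfile parsing is deliberately simplistic: first match wins, `#` starts a comment, and an optional `*.` or `SID.` prefix is tolerated.

```python
# Sample inputs pasted from the session above.
PFILE_TEXT = "_enable_NUMA_support=TRUE\n"

SHOW_OUTPUT = """\
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     FALSE
"""


def pfile_value(pfile_text, name):
    """Return the value assigned to `name` in pfile text, or None."""
    for raw in pfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments (simplistic)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().split(".")[-1]         # tolerate "*." or "SID." prefix
        if key.lower() == name.lower():
            return value.strip().strip("'\"")
    return None


def shown_value(show_output, name):
    """Return the VALUE column for `name` from `show parameter` text, or None."""
    for line in show_output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == name.lower():
            return parts[-1]
    return None


def was_silently_reverted(pfile_text, show_output, name):
    """True if the pfile asks for one value and the instance reports another."""
    wanted = pfile_value(pfile_text, name)
    actual = shown_value(show_output, name)
    if wanted is None or actual is None:
        return False                              # nothing to compare
    return wanted.upper() != actual.upper()


if __name__ == "__main__":
    name = "_enable_NUMA_support"
    if was_silently_reverted(PFILE_TEXT, SHOW_OUTPUT, name):
        print(f"{name}: pfile says {pfile_value(PFILE_TEXT, name)}, "
              f"instance says {shown_value(SHOW_OUTPUT, name)}")
```

Fed the two snippets from this session, it flags the mismatch between TRUE in p4.ora and FALSE in the running instance.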

OK, so the instance is up and the parameter was reverted, what does the IPC shared memory segment look like?

SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x7393f7f4 1179653    oracle    660        5773459456 35
0x00000000 393223     oracle    644        790528     5          dest
0x00000000 425992     oracle    644        790528     5          dest
0x00000000 458761     oracle    644        790528     5          dest

Right, so I have no NUMA placement of the buffer pool. On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent) since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho Hum.

What Went Wrong
Oracle could not find the libnuma.so it wanted to load with dlopen():

$ grep libnuma /tmp/strace.out | grep ENOENT | head
14626 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
14627 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
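
You don’t need strace to spot this ahead of time. Here is a small Python sketch that asks the dynamic loader the same question through ctypes, which on Linux goes through dlopen(). The can_dlopen helper is my own name for illustration. It resolves bare names via the normal loader search path, so pass a full path such as /usr/lib64/libnuma.so if you want to test exactly the file shown in the strace output.

```python
import ctypes
import ctypes.util


def can_dlopen(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library cannot be found or loaded.
        return False
    return True


if __name__ == "__main__":
    # The unversioned name is the one that failed with ENOENT above;
    # the versioned name is what the package on this box actually shipped.
    for name in ("libnuma.so", "libnuma.so.1"):
        print(name, "loadable" if can_dlopen(name) else "NOT loadable")
    # find_library() returns a library name such as "libnuma.so.1", or None.
    print("find_library('numa') ->", ctypes.util.find_library("numa"))
```

If the versioned name loads and the unversioned one does not, you are most likely looking at the same missing symbolic link I hit here.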

So I create the necessary symbolic link and subsequently boot the instance and inspect the shared memory segments. Here I see that I have a ~1GB segment (1,006,632,960 bytes) for the variable SGA components and my buffer pool has been split into two segments of roughly 2.3 GB each (2,483,027,968 and 2,281,701,376 bytes).

# ls -l /usr/*64*/*numa*
lrwxrwxrwx 1 root root    23 Mar 17 09:25 /usr/lib64/libnuma.so -> /usr/lib64/libnuma.so.1
-rwxr-xr-x 1 root root 21752 Jul  7  2009 /usr/lib64/libnuma.so.1

SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 4G
SQL> show parameter NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     TRUE
SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x00000000 2719749    oracle    660        1006632960 35
0x00000000 2752518    oracle    660        2483027968 35
0x00000000 393223     oracle    644        790528     6          dest
0x00000000 425992     oracle    644        790528     6          dest
0x00000000 458761     oracle    644        790528     6          dest
0x00000000 2785290    oracle    660        2281701376 35
0x7393f7f4 2818059    oracle    660        2097152    35

So there I have an SGA successfully created with _enable_NUMA_support set to TRUE. But, what strings appear in the alert log? Well, I’ll blog that soon because it leads me to other content.
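
As a sanity check on the listing above, here is a short Python sketch that picks out the attached oracle-owned segments and adds them up. The filter (owner oracle, perms 660, nattch greater than zero) is just my assumption about which rows belong to the SGA on this box, and oracle_sga_segments is a made-up helper name. Note that ipcs says nothing about which NUMA node backs each segment, so this only confirms the split, not the placement.

```python
# Pasted from the second `ipcs -m` listing above (_enable_NUMA_support = TRUE).
IPCS_NUMA = """\
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x00000000 2719749    oracle    660        1006632960 35
0x00000000 2752518    oracle    660        2483027968 35
0x00000000 393223     oracle    644        790528     6          dest
0x00000000 425992     oracle    644        790528     6          dest
0x00000000 458761     oracle    644        790528     6          dest
0x00000000 2785290    oracle    660        2281701376 35
0x7393f7f4 2818059    oracle    660        2097152    35
"""

# Size of the single SGA segment in the first listing (parameter reverted to FALSE).
SINGLE_SEGMENT_BYTES = 5773459456

MIB = 1024 * 1024


def oracle_sga_segments(ipcs_text, owner="oracle", perms="660"):
    """Return [(shmid, bytes)] for attached segments matching owner and perms."""
    segments = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue                      # header or blank line
        _key, shmid, seg_owner, seg_perms, nbytes, nattch = fields[:6]
        if seg_owner == owner and seg_perms == perms and int(nattch) > 0:
            segments.append((int(shmid), int(nbytes)))
    return segments


if __name__ == "__main__":
    segments = oracle_sga_segments(IPCS_NUMA)
    for shmid, nbytes in segments:
        print(f"shmid {shmid}: {nbytes // MIB} MiB")
    total = sum(nbytes for _, nbytes in segments)
    print("total bytes:", total)
    print("matches single-segment SGA:", total == SINGLE_SEGMENT_BYTES)
```

The four segments come to 960, 2368, 2176 and 2 MiB, and their total is 5,773,459,456 bytes, exactly the size of the single segment from the earlier, non-NUMA startup. Same SGA, just carved up differently.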

Done Blogging or Dumb Blogging?

Some of one and none of the other actually…

I’ve received a couple of emails wondering what’s happened to my blogging. No worries, just really busy.  I’ve been putting some (very) interesting hardware through the performance wringer lately.

What, doesn’t everyone’s performance sandbox look like the following?

# mpstat -P ALL 5
05:14:24 PM  all   91.99    0.00    8.01    0.00    0.00    0.00    0.00    0.00   1117.00
05:14:24 PM    0   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00   1000.33
05:14:24 PM    1   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM    2   92.67    0.00    7.33    0.00    0.00    0.00    0.00    0.00     58.00
05:14:24 PM    3   92.98    0.00    7.02    0.00    0.00    0.00    0.00    0.00      0.67
05:14:24 PM    4   94.00    0.00    6.00    0.00    0.00    0.00    0.00    0.00      5.00
05:14:24 PM    5   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM    6   91.36    0.00    8.31    0.00    0.00    0.33    0.00    0.00     26.67
05:14:24 PM    7   91.36    0.00    8.64    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM    8   91.97    0.00    8.03    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM    9   91.69    0.00    8.31    0.00    0.00    0.00    0.00    0.00     17.67
05:14:24 PM   10   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      5.33
05:14:24 PM   11   92.03    0.00    7.97    0.00    0.00    0.00    0.00    0.00      3.33
05:14:24 PM   12   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   13   91.03    0.00    8.97    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   14   92.00    0.00    8.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   15   91.30    0.00    8.70    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   16   92.00    0.00    8.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   17   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   18   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   19   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   20   91.33    0.00    8.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   21   92.67    0.00    7.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   22   91.97    0.00    8.03    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   23   92.03    0.00    7.97    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   24   92.98    0.00    7.02    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   25   92.03    0.00    7.97    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   26   92.33    0.00    7.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   27   92.00    0.00    8.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   28   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   29   93.00    0.00    7.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   30   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   31   91.33    0.00    8.67    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   32   91.97    0.00    8.03    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   33   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   34   92.69    0.00    7.31    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   35   91.00    0.00    9.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   36   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   37   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   38   91.36    0.00    8.64    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   39   92.00    0.00    8.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   40   93.00    0.00    7.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   41   91.36    0.00    8.64    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   42   91.03    0.00    8.97    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   43   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   44   91.00    0.00    9.00    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   45   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   46   91.67    0.00    8.33    0.00    0.00    0.00    0.00    0.00      0.00
05:14:24 PM   47   91.33    0.00    8.67    0.00    0.00    0.00    0.00    0.00      0.00

DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
