Fun with Intel Xeon 5500 Nehalem and Linux cpuspeed(8). Part III.

I recently received email from a reader who wondered why Parts I and II of my series on cpuspeed(8) with Intel Xeon 5500 “Nehalem” processors were based on NUMA-disabled mode (SUMA/SUMO system) testing. The series the reader referred to can be found at the following links:

Fun With Intel Xeon 5500 Nehalem and Linux cpuspeed(8) Part I.

Fun With Intel Xeon 5500 Nehalem and Linux cpuspeed(8). Part II.

The reader is correct. Thus far in the series I’ve been sharing some findings (trivia?) from a test system with NUMA disabled at the BIOS level. For reference, you can see more about the concept of disabling NUMA on commodity NUMA systems in this post. As an aside, running a Commodity NUMA Implementation (CNI) system (e.g., Xeon 5500 Nehalem) with NUMA disabled in the BIOS is also referred to as a SUMA or SUMO configuration.

A Look at cpuspeed(8) and NUMA
In this blog entry I’ll show some findings based on using the busy.sh script to stress various processor threads and the howfast.sh script to analyze how cpuspeed(8) reacts. But first, recall what I said in Part II of this series:

Hammering all the primary threads heats up only OS cpus 0,2,4,6,8,10,12 and 14 but hammering on all the secondary threads causes all processor threads to clock up.

That was indeed an odd thing to observe, and I have not yet started to investigate why it happens since I’m still in somewhat of a discovery phase. Let’s see how the processors respond under the same conditions with NUMA enabled in the BIOS. But first, I’ll do a quick check to make sure it is a NUMA system, not a SUMA/SUMO system. I’ll use numactl(8) to make sure I have two NUMA nodes in this HP ProLiant server with Intel Xeon 5500 “Nehalem” processors:


# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 8052 MB
node 0 free: 3683 MB
node 1 size: 8080 MB
node 1 free: 3664 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
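If you would rather have a script make this check than eyeball the output, a minimal sketch follows. It relies only on the `available: N nodes` line shown above; the function names are hypothetical and not part of the original series. A SUMA/SUMO boot should report a single node and therefore fail the check.

```shell
# Hypothetical helpers (not from the original series), meant to be sourced.
# numa_node_count: read `numactl --hardware` output on stdin and print the
# node count from the "available: N nodes (...)" line.
numa_node_count() {
    awk '/^available:/ { print $2; exit }'
}

# require_numa_nodes N: succeed only if the output on stdin reports N nodes.
# Usage: numactl --hardware | require_numa_nodes 2
require_numa_nodes() {
    n=$(numa_node_count)
    if [ "$n" = "$1" ]; then
        echo "NUMA check passed: $n nodes"
    else
        echo "NUMA check failed: expected $1 nodes, found ${n:-none}" >&2
        return 1
    fi
}
```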

Good, it is a NUMA system. In the following box I’ll show how the processors respond to two different experiments. Before I show any test results, though, I need to point out that I’ve changed the howfast.sh script so that it takes an argument and compares the current processor speeds against the value supplied in the argument. If no argument is provided, the script just prints a single line of output with all the processors’ current clock rates. This change was necessary to avoid having to peruse the output of the script to validate the speeds prior to an experiment.

The following box shows the new script behavior. I first run the script with an argument of 1600; as long as all the CPUs are currently clocked at 1600 MHz, the script returns success and the shell moves on to execute busy.sh. As expected, after busy.sh runs, howfast.sh stumbles on a CPU that is not clocked at 1600 MHz and fails.


# ./howfast.sh 1600 && ./busy.sh 1;./howfast.sh 1600
Check Failed: CPU 0 is 2934.00
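The howfast.sh source isn't shown in this post, so what follows is only a sketch of the two modes described above, not the actual script. It assumes each CPU's current clock is read from the `cpu MHz` lines of /proc/cpuinfo (the real script may use a different source), and the function names and the `CPUINFO` override are hypothetical.

```shell
# Hypothetical sketch of howfast.sh's two modes (not the original script),
# meant to be sourced. Assumes per-CPU clocks come from the "cpu MHz" lines
# of /proc/cpuinfo; set CPUINFO to another file to test against sample data.
CPUINFO=${CPUINFO:-/proc/cpuinfo}

# No argument: print "cpu MHz cpu MHz ..." on one line.
list_speeds() {
    awk -F: '
        /^processor/ { cpu = $2 + 0 }
        /^cpu MHz/   { printf "%s%d %.3f", sep, cpu, $2 + 0; sep = " " }
        END          { print "" }' "$CPUINFO"
}

# One argument: succeed only if every CPU is clocked at exactly $1 MHz;
# otherwise report the first CPU that is not and return non-zero.
check_speeds() {
    awk -F: -v want="$1" '
        /^processor/ { cpu = $2 + 0 }
        /^cpu MHz/ && $2 + 0 != want + 0 {
            printf "Check Failed: CPU %d is %.2f\n", cpu, $2 + 0
            bad = 1
            exit
        }
        END { exit (bad ? 1 : 0) }' "$CPUINFO"
}

howfast() {
    if [ $# -eq 0 ]; then list_speeds; else check_speeds "$1"; fi
}
```

Sourced into a shell, `howfast 1600 && ./busy.sh 1; howfast` would mirror the command lines in this post. Exact equality against the argument works here because cpuspeed(8) moves the processors between discrete frequency steps.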

NUMA Experiments
First, I’ll stress the primary thread of core 0. Next, I’ll stress the primary thread of core 1. Both cores are in socket 0:

# ./howfast.sh 1600 && ./busy.sh 0; ./howfast.sh
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
#
# ./howfast.sh 1600 && ./busy.sh 1; ./howfast.sh
0 1600.000 1 2934.000 2 1600.000 3 2934.000 4 1600.000 5 2934.000 6 1600.000 7 2934.000 8 1600.000 9 2934.000 10 1600.000 11 2934.000 12 1600.000 13 2934.000 14 1600.000 15 2934.000
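busy.sh isn't listed either. A minimal sketch of the idea, assuming it simply pins one CPU-bound loop to each OS CPU named in its quoted argument list, could use taskset(1) from util-linux and timeout(1) from coreutils. The function name and the 10-second default duration are assumptions; the original's spinner (dumb.c in Part II) and run time aren't shown.

```shell
# Hypothetical sketch of busy.sh (not the original script), meant to be
# sourced. Assumes taskset(1) (util-linux) and timeout(1) (coreutils).
# Usage: busy "0 2 4 6" [seconds]
busy() {
    secs=${2:-10}   # assumed default; the original run time is not stated
    for cpu in $1; do   # $1 left unquoted on purpose to split "0 2 4 6"
        # One busy loop per listed OS CPU, pinned there, stopped after $secs.
        timeout "$secs" taskset -c "$cpu" sh -c 'while :; do :; done' &
    done
    wait   # return once every pinned loop has been stopped
}
```

For example, `busy '8 9 10 11 12 13 14 15'` would load the same processor threads as the second command in the following experiment. Pinning with taskset keeps the scheduler from migrating the load, which matters here because the whole point is to see which other threads clock up.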

That output should look familiar to the six or so folks following this series because it is exactly how the processors behave when the system is booted as a SUMA/SUMO system. In Part II of this series I made the following observation:

Running dumb.c on core 0 speeds up OS CPU 0 and every even-numbered processor thread in the box. Conversely, stressing core 1 causes the clock rate on all odd-numbered processor threads to increase.

Let’s see what happens when I hammer multiple processor threads as I did in Part II.


# ./howfast.sh 1600 && ./busy.sh '0 1 2 3 4 5 6 7';./howfast.sh
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000

# ./howfast.sh 1600 && ./busy.sh '8 9 10 11 12 13 14 15';./howfast.sh
0 2934.000 1 2934.000 2 2934.000 3 2934.000 4 2934.000 5 2934.000 6 2934.000 7 2934.000 8 2934.000 9 2934.000 10 2934.000 11 2934.000 12 2934.000 13 2934.000 14 2934.000 15 2934.000
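To make the even/odd pattern in these lines easier to spot, a small filter can reduce one line of howfast.sh output to the list of OS CPUs running above the 1600 MHz floor. This is a hypothetical helper, assuming the alternating `cpu MHz` layout shown above.

```shell
# Hypothetical helper (not from the original series): read one line of
# howfast.sh output ("cpu MHz cpu MHz ...") on stdin and print the OS CPUs
# whose clock is above the floor given as $1 (default 1600 MHz).
fast_cpus() {
    awk -v floor="${1:-1600}" '{
        out = ""
        for (i = 1; i < NF; i += 2) {
            if ($(i + 1) + 0 > floor + 0) {
                sep = (out == "") ? "" : " "
                out = out sep $i
            }
        }
        print out
    }'
}
```

Fed the first output line above, it should print `0 2 4 6 8 10 12 14`; fed the second, it lists all sixteen CPUs.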

Déjà Vu

Here, as in the SUMA case, stressing the primary processor threads in both sockets causes only certain processor threads to clock up. On the other hand, as was also the case with SUMA, stressing the secondary processor threads of both sockets speeds up all processor threads. So at least this much is consistent between the NUMA and SUMA tests. But what about a series of these tests with a cool-down period in the loop?

In the following box I’ll show the effect of looping the busy.sh script in the same fashion as I did in Part II (SUMA). In each iteration, I’ll stress the secondary processor threads of both sockets. As you’ll see, the results are similar to the SUMA behavior except for the frequency of tests that resulted in all processors speeding up. In the SUMA case it was 50%, but in the NUMA case it is only 40% (4 of 10 iterations):

#
# for t in 1 2 3 4 5 6 7 8 9 10; do ./howfast.sh 1600 && ./busy.sh '8 9 10 11 12 13 14 15' ;./howfast.sh;sleep 30; done
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
0 2934.000 1 2934.000 2 2934.000 3 2934.000 4 2934.000 5 2934.000 6 2934.000 7 2934.000 8 2934.000 9 2934.000 10 2934.000 11 2934.000 12 2934.000 13 2934.000 14 2934.000 15 2934.000
0 2934.000 1 2934.000 2 2934.000 3 2934.000 4 2934.000 5 2934.000 6 2934.000 7 2934.000 8 2934.000 9 2934.000 10 2934.000 11 2934.000 12 2934.000 13 2934.000 14 2934.000 15 2934.000
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
0 2934.000 1 2934.000 2 2934.000 3 2934.000 4 2934.000 5 2934.000 6 2934.000 7 2934.000 8 2934.000 9 2934.000 10 2934.000 11 2934.000 12 2934.000 13 2934.000 14 2934.000 15 2934.000
0 2934.000 1 2934.000 2 2934.000 3 2934.000 4 2934.000 5 2934.000 6 2934.000 7 2934.000 8 2934.000 9 2934.000 10 2934.000 11 2934.000 12 2934.000 13 2934.000 14 2934.000 15 2934.000
0 2934.000 1 1600.000 2 2934.000 3 1600.000 4 2934.000 5 1600.000 6 2934.000 7 1600.000 8 2934.000 9 1600.000 10 2934.000 11 1600.000 12 2934.000 13 1600.000 14 2934.000 15 1600.000
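The 40% figure is simply the share of iterations in which every CPU reached 2934 MHz (the third, fourth, eighth and ninth lines). If the loop's output were redirected to a file, a hypothetical helper like the following could do the tally, assuming one howfast.sh line per iteration.

```shell
# Hypothetical helper (not from the original series): read saved howfast.sh
# lines on stdin, one per loop iteration, and report how many iterations
# show every CPU above the floor given as $1 (default 1600 MHz).
all_up_rate() {
    awk -v floor="${1:-1600}" '
        NF == 0 { next }   # skip blank lines
        {
            up = 1
            for (i = 2; i <= NF; i += 2)   # even fields hold the MHz values
                if ($i + 0 <= floor + 0) { up = 0; break }
            total++
            hits += up
        }
        END {
            if (total)
                printf "%d of %d iterations (%d%%)\n", hits, total, 100 * hits / total
        }'
}
```

Run over the ten lines above, it should report `4 of 10 iterations (40%)`.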

So here we are at Part III, and thus far the sum total of all this information is:

  • cpuspeed(8) acts unpredictably on Xeon 5500 “Nehalem” processors
  • cpuspeed(8) acts differently on Xeon 5500 “Nehalem” processors in NUMA mode compared to SUMA mode.
  • processors cool down quickly after being clocked up

Someone, someday, will likely be scratching their head and googling to see if anyone else is seeing odd processor frequency issues with the Xeon 5500 “Nehalem” processors. If nothing else, this series of blog posts will at least let said googler know that they are not alone in what they are seeing.

2 Responses to “Fun with Intel Xeon 5500 Nehalem and Linux cpuspeed(8). Part III.”


  1. boblee, June 5, 2009 at 9:22 pm

    Great Blogging!!
    Keep Your Good Work Going!!


  2. Brett Schroeder, June 6, 2009 at 3:44 am

    Some of your results can be explained by the fact that Nehalem has Hyper-Threading (and your BIOS has it enabled). Hyper-Threading effectively turns one core into 2 logical CPUs by having two sets of registers to store the architectural state of two threads. However, not all components of the core are duplicated for each thread. The threads are thus forced to share the “central” part of each core, i.e., the execution engine, system bus interface and clock circuitry.

    Since the clock is common to both threads on any given core, when you load up the primary and its frequency increases, so does the frequency of the secondary. Vice versa when you load up the secondary. These reported speeds are in reality the speed of the same clock circuit in each core.

    In general, for your two-socket quad-core system the speed of OS CPU #N will *always* be equal to that of OS CPU #(N+8) irrespective of how you load it, e.g., CPU #3 (s0_c3_t0) = CPU #11 (s0_c3_t1).

    See sections 2.2.7 and 2.2.8 of the Intel Software Developer's Manual, Vol. 1 (there is a nice schematic of Nehalem/Core i7 at the end of 2.2.8). Here’s the link: http://download.intel.com/design/processor/manuals/253665.pdf


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
