Intel Xeon 5500 Nehalem: Is It 17 Percent Or 2.75-Fold Faster Than Xeon 5400 Harpertown? Well, Yes Of Course It Is!

I received two related emails while I was out recently for a couple of days of fishing and hiking. I thought they’d make for an interesting blog entry. The first email read:

…our tests show very little performance improvement on nehalem cpus compared to older Xeon…

And, the other email was the polar opposite:

…in most of our tests the Xeon 5500 was over 2 times as fast as the harpertown Xeon…

And the email continued:

…so we think you should stop saying that Xeon 5500 is double the perf of older xeon

Well, I can’t make everyone happy. I tend to say that Intel Xeon 5500 (Nehalem) processors are twice as fast as Harpertown Xeon (5400) as a conservative, well-rounded way to set expectations.

Introducing Fat and Skinny
OK, bear with me now, this is a wee bit tongue-in-cheek. The reader who emailed me with the report of near parity between Nehalem and Harpertown is not lying, he’s just skinny. And the reader who admonished me for my usual low-ball citation of 2x performance for Nehalem versus Harpertown? No, he’s not lying either…he’s fat. Allow me to explain.

It’s really quite simple. If you run code that spends a significant portion of processor cycles operating on memory lines already in the processor cache, you are running code that has a very low CPI (cycles per instruction) cost. In my terminology such code is “skinny.” On the other hand, code that jumps around in memory, causing processor stalls for memory loads, has a high CPI and is, in my terminology, “fat.”

Skinny code more or less relegates the comparison between Harpertown and Nehalem to one of clock frequency, whereas fat code is really where the rubber hits the road. The more load and store hungry (fat) the code is, the bigger the Nehalem pay-off will be.
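
To put rough numbers on that claim, here is a minimal sketch of the usual run-time model: time = instructions x CPI / clock rate. It is not from the original post; the function names run_seconds and speedup are hypothetical, and the CPI values you feed it are assumptions you would have to measure yourself.

```c
#include <assert.h>

/* Estimated run time in seconds: instructions * CPI / clock rate (Hz). */
double run_seconds(double instructions, double cpi, double hz)
{
    assert(hz > 0.0);
    return instructions * cpi / hz;
}

/*
 * Speedup of machine B over machine A for the same instruction stream.
 * The instruction count cancels, leaving (cpi_a / hz_a) / (cpi_b / hz_b).
 */
double speedup(double cpi_a, double hz_a, double cpi_b, double hz_b)
{
    assert(hz_a > 0.0 && hz_b > 0.0 && cpi_b > 0.0);
    return (cpi_a / hz_a) / (cpi_b / hz_b);
}
```

If both processors achieved the same CPI, speedup(1.0, 2.66e9, 1.0, 2.93e9) would come out at roughly 1.10, which is just the clock ratio. Any gain beyond that has to come from a lower CPI on the newer part, and that is where fat code has the most room to improve.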

Let’s take a look at two different, simple programs to help make the point. Using fat.c and skinny.c I’ll take timings on Harpertown-based and Nehalem-based boxes. In short, skinny.c simply hammers away on the same variable and does not leave L2 cache. On the other hand, fat.c treats its memory allocation as an array of 8-byte longs and skips to every 8th one in a loop in order to force memory loads, since the cache line size on this box is 64 bytes. NOTE: do not compile these with -O unless you change the longs in the array to volatile long. A simple gcc without args will suffice.

So, skinny.c has a very low CPI and fat.c has a very high CPI.
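
For readers who want to try something similar, here is a minimal sketch of the two access patterns described above. This is my reconstruction from the description, not the original fat.c and skinny.c; the names skinny_loop, fat_loop, and LONGS_PER_LINE are hypothetical, and it assumes 8-byte longs and 64-byte cache lines as stated in the text.

```c
#include <assert.h>
#include <stddef.h>

#define LONGS_PER_LINE 8   /* assumed: 64-byte cache line / 8-byte long */

/* "Skinny": hammer away on one variable, so the working set stays in cache. */
long skinny_loop(unsigned long iterations)
{
    volatile long counter = 0;   /* volatile keeps the loop from being optimized away */
    unsigned long i;

    for (i = 0; i < iterations; i++)
        counter++;
    return counter;
}

/* "Fat": touch one long in every cache line of a large buffer. */
long fat_loop(volatile long *buf, size_t n_longs, unsigned passes)
{
    long touched = 0;
    unsigned p;
    size_t i;

    assert(buf != NULL);
    for (p = 0; p < passes; p++) {
        for (i = 0; i < n_longs; i += LONGS_PER_LINE) {
            buf[i]++;            /* one load and one store per 64-byte line */
            touched++;
        }
    }
    return touched;
}
```

To mimic the experiment, a small main would call skinny_loop with a large iteration count, or fat_loop over a buffer far larger than the last-level cache, and be run under time. Bear in mind that a fixed sequential stride like this is friendly to hardware prefetchers, so the stall behavior you see will depend on the platform.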

In the following examples, the model name field from /proc/cpuinfo output tells us what each system is. The E5430 is Harpertown Xeon and the X5570 is of course Nehalem. In terms of clock frequency, the Nehalem processors are about 10% faster than the Harpertown Xeons (2.93GHz versus 2.66GHz).

In the following box you’ll see screen-scrapes I took from two different systems, one based on Harpertown and the other on Nehalem. Notice how skinny finishes in only about 17% less time (5m1.9s versus 6m3.7s) with the same executable on Nehalem compared to Harpertown.


# cat /proc/cpuinfo | grep 'model name'
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
# md5sum skinny
df86d9a278ea33b7da853d7a17afdd46  skinny

# time ./skinny

real    6m3.658s
user    6m3.567s
sys     0m0.001s
#

# cat /proc/cpuinfo | grep 'model name'
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
# md5sum skinny
df86d9a278ea33b7da853d7a17afdd46  skinny
# time ./skinny

real    5m1.941s
user    5m2.043s
sys     0m0.001s

In the next box you’ll see screen-scrapes from the same two systems where I ran the “fat” executable. Notice how the Harpertown Xeon took about 2.75 times as long (1m57.7s versus 42.8s) to process the fat.


# cat /proc/cpuinfo | grep 'model name' | head -1
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
# md5sum fat
b717640846839413c87aedd708e8ac0d  fat
# time ./fat

real    1m57.731s
user    1m57.659s
sys     0m0.045s

# cat /proc/cpuinfo | grep 'model name' | head -1
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
# md5sum fat
b717640846839413c87aedd708e8ac0d  fat
# time ./fat

real    0m42.834s
user    0m42.803s
sys     0m0.023s

So, as it turns out, we can believe both of the folks that sent me email on the matter.
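
The arithmetic behind the two headline numbers can be checked with a few lines of C. This is only a sketch; the helper names to_seconds, time_ratio, and percent_time_saved are hypothetical, and the inputs are the "real" times from the screen-scrapes above.

```c
#include <assert.h>

/* Convert a time(1) "real" reading of minutes and seconds to seconds. */
double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* How many times as long the slow run took compared to the fast run. */
double time_ratio(double slow_s, double fast_s)
{
    assert(fast_s > 0.0);
    return slow_s / fast_s;
}

/* Percentage of wall-clock time saved by the fast run. */
double percent_time_saved(double slow_s, double fast_s)
{
    assert(slow_s > 0.0);
    return 100.0 * (slow_s - fast_s) / slow_s;
}
```

For skinny, percent_time_saved(to_seconds(6, 3.658), to_seconds(5, 1.941)) works out to roughly 17%, which is the same thing as Harpertown taking about 1.2 times as long. For fat, time_ratio(to_seconds(1, 57.731), to_seconds(0, 42.834)) works out to roughly 2.75.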

5 Responses to “Intel Xeon 5500 Nehalem: Is It 17 Percent Or 2.75-Fold Faster Than Xeon 5400 Harpertown? Well, Yes Of Course It Is!”


  1. 1 Andrew Gregovich September 5, 2009 at 4:47 pm

    Looks like Nehalem-EX will also make a big splash: http://news.cnet.com/8301-13512_3-10321740-23.html

  2. 2 Shawn September 22, 2009 at 5:06 pm

    Hello Kevin,

    Is there a way in the BIOS to turn off two of the four cores (Xeon 5500)? Oracle does licensing by cores, so that is something we are considering.

    I am having difficulty finding this information.

    Thanks!

  3. 4 Jeff D September 25, 2009 at 3:06 pm

    Kevin – thanks for the blog.

    Do you know of any TPC-H benchmarks available on these processors? I’m getting some new servers to put together a 5 node 11g (hopefully 11gR2) RAC environment and was looking to compare a config with the Nehalem processors on RHEL5 vs. a Sun 5240 Ultra Sparc config running Solaris.

    My main concern is w.r.t. parallel processing within our warehouse. We’re starting to hit some CPU bottlenecks in our existing environment (single server Sun 890 8 CPU config) even though we do a pretty good job controlling DOP through Resource management. Any opinions on which processors would handle parallel processing better?

    • 5 kevinclosson September 25, 2009 at 4:15 pm

      Well now, it would be really odd for me to take a position against SPARC at this juncture.

      It’s not about the processors anyway…it’s about the bandwidth between memory and the processors… really fast CPUs on a junk bus/interconnect stall a lot…they remain “busy” but not effectively so. Until the latest Harpertown Xeon-based systems, I’d have to say that Intel routinely mated CPUs to an under-performing bus. Those were the days when Opteron with HT ruled in the commodity space. Well, actually, the Woodcrest 5100 family and their chipset brought Intel and AMD closer together. All that aside, we are talking QuickPath and Nehalem these days… and the whole thing is moot. I don’t know enough about SPARC-based systems to say one way or the other, and it would be utterly technocratically/politically incorrect for me to say anything about it now anyway given the Oracle/Sun merger.






DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
