Archive for the 'AMD Barcelona' Category

Oracle on Virtual Machines. Going Fishing? Intel “Nehalem” Xeon Quad-Core with CSI Floats!

CRN.com has coverage of the Xeon “Penryn” processor and some info about the micro-architecture change that will follow in 2008 with the “Nehalem” processor. I think the following is an astounding comment:

Meanwhile, Intel is also preparing its next-generation Nehalem platform, which represents the company’s most significant shift in system architecture since the Pentium Pro debuted in 1996, Gelsinger said.

If you remember the P6 Orion chipset with the Pentium Pro, you’ll recall that it was Intel’s first MCM with 4 Pentium processors. It offered 48 bit memory support (kernel address space), 3 cycle shared L2 cache, and was quite the leap over the Pentium. The article states that the off-chip memory controller will be gone (good) and the interconnect (CSI) will be more like AMD HyperTransport. I think that means a bit of a NUMA feel, but I’m not sure yet. The architecture of Nehalem will support up to 8 cores as well.

What Does This Have To Do With Oracle
These are quad core processors that are going to pack a very significant punch, much more so than the AMD Barcelona processor expected later this year. That means single socket, quad core servers with more power than most 4 socket systems today. So if you have, say, a Proliant DL585 (great box) with idle cycles, you will likely have a lot of idle cycles when you refresh with these servers. That means virtualization, so get used to it. The article hints towards 32nm processors in the 2010 timeframe. My oh my.

Where and What is a Nehalem, Really?
It is a North American Indian tribe. There is also a river about 40 miles from where I live and it is, in fact, precisely what Intel named this processor after. Intel has named other processors after rivers in the Pacific Northwest region of the states in the past (e.g., Willamette). I’ve been fishing the Nehalem for many, many years. I’m told blogs are better with photos, so here goes.

I’m sure the concept of fishing will wound the tender sentiment of at least a few readers. I’m sorry. You can’t make everyone happy, but I’ll throw a bone. The main species we fish for in the Nehalem is Steelhead, which is an anadromous salmonid related to trout. Basically, it is a trout that lives in salt water but spawns in fresh water. Unlike true salmon, it can repeat that cycle. For that reason, game management in my home state enforces a great deal of “catch and release” and artificial bait regulations. That is in fact what I was doing when I caught the “Nehalem Bright”, as they are called, in the following photo. Caught, photographed and placed gently back into the water.

Nehalem_Brite

AMD Quad-Core “Barcelona” Processor For Oracle. How Badly Do You Need Enterprise Edition Oracle?

 

This blog entry is the sixth in a series about Oracle on AMD’s upcoming quad-core processor code named “Barcelona.” The following is a link to the other installments on this thread:

Oracle on Opteron, K8L, NUMA, etc

Got Quad-Core? Need Enterprise Edition Oracle?
There is quite a buzz today about Oracle’s changes to software licensing for the database products. According to this ZDNet article, the changes are specific to the Standard Edition family of database products. The article refers to Oracle’s multi-core pricing guide which was updated on February 16, 2007. Get out your slide rule and gulp a heaping helping of patience.

Quad-Core x86_64
The ZDNet Article states:

Servers with four quad-core chips are relatively rare right now, but Intel and AMD plan to release processors for that segment later this year.

Um, the Xeon “Clovertown” processors are quad-core and shipping already. AMD “Barcelona” is coming out this year. So what does this change really mean? If you use one of the Standard Edition products, you are no longer limited based on cores, but on sockets instead.

Misinformation: Lots of It
It’s Christmas for the bean counters. According to this News.com article, you can just simply switch out Enterprise Edition with Standard Edition:

Customers no longer must buy licenses for each of the 16 cores to run the top-end Enterprise Edition, but instead may buy licenses for the four sockets and run Standard Edition. That cuts list licensing prices from between $320,000 and $480,000–depending on Oracle adjustments that factor in multi-core processor performance–to $60,000.

I am still scratching my head about that one. Customers don’t swap out EE for SE at the drop of a hat—or do you? Since the choice would have never been there before to run SE on that many cores, could it be that SE will start to be the preferred multi-core edition? Can you live without the differences between EE and SE?

Barcelona
Folks that have EE on a 4-Socket F (2200/8200) Opteron system today might be wise to think very hard about whether they can drop to SE because if they plug in Barcelona processors (they are socket-compatible), EE is going to be very, very expensive. That is, if you stay with EE and plug in Barcelona processors you will double your license cost.

I find this to be a very interesting policy change.

AMD Quad-Core “Barcelona” Processor For Oracle (Part V). 40% Expected Over Clovertown.

A reader posted an interesting comment on the latest installment on my thread about Oracle licensing on the upcoming AMD Barcelona processor. The comment, as posted on my blog article entitled AMD Quad-Core “Barcelona” Processor For Oracle (Part IV) and the Web 2.0 Trolls, states:

The problem with your numbers is that they are based on old AMD marketing materials. AMD has had a chance to run their engineering samples at their second stepping (they are now gearing up full production for late Q2 delivery – 12 weeks from wafer starts) and they are currently claiming a 40% advantage on Clovertown versus the 70% over the Opteron 2200 from their pre-A0 stepping marketing material.

The AMD claim was covered in this ZDNet article which quotes AMD Vice President Randy Allen as follows:

“We expect across a wide variety of workloads for Barcelona to outperform Clovertown by 40 percent,” Allen said. The quad-core chip also will outperform AMD’s current dual-core Opterons on “floating point” mathematical calculations by a factor of 3.6 at the same clock rate, he said.

That is a significantly different set of projections than I covered in my article entitled AMD Quad-core “Barcelona” Processor For Oracle (Part II). That article covers AMD’s initial projection of a 70% OLTP improvement on a per-processor (socket) basis over Opteron 2200. These new projections are astounding, and I would love to see them hold for the sake of competition. Let’s take a closer look.

Hypertransport Bandwidth
I’m glad AMD has set expectations by stating the 40% uplift over Clovertown would be realized for “a wide variety of workloads.” However, since this is an Oracle blog I would have much preferred to see OLTP mentioned specifically. The numbers are hard to imagine, and it is all about feeding the processor, not the processor itself. The Barcelona processor is socket-compatible with Socket F. Any improvement over Opteron 2200/8200 would require existing headroom on the Hypertransport for workloads like OLTP. A lot of headroom. Let’s look at the numbers.

The Socket F baseline that the original AMD projections were based on was 139,693 TpmC. If OLTP is included in the “wide variety of workloads”, then the projected OLTP throughput would be Clovertown 222,117 TpmC x 1.4, or 310,963 TpmC, all things being equal. This represents 2.2 times the throughput from the same Socket F/Hypertransport setup. Time for a show of hands: how many folks out there think that the Opteron 2200 OLTP result of 139,693 TpmC was achieved with more than 50% headroom to spare on the Hypertransport links? I would love to see Barcelona come in with this sort of OLTP throughput, but folks, systems are not built with more than 200% of the bus bandwidth the processors need. I’m not very hopeful.
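The headroom arithmetic above can be checked in a few lines of Python. This is only a sketch of the reasoning in this post: the constant and function names are mine, and it leans on the same simplifying assumption the post makes, namely that OLTP throughput on a Socket F board is bounded by the bandwidth available to feed the processor.

```python
# Figures quoted in the post (TpmC). Names are illustrative, not from any tool.
OPTERON_2200_TPMC = 139_693    # audited Socket F baseline
CLOVERTOWN_TPMC = 222_117      # audited Clovertown result
CLAIMED_UPLIFT_PERCENT = 140   # AMD's "40 percent over Clovertown" claim


def projected_barcelona_tpmc() -> int:
    """Projected TpmC if the 40% claim applied to OLTP (integer math, truncated)."""
    return CLOVERTOWN_TPMC * CLAIMED_UPLIFT_PERCENT // 100


def throughput_ratio() -> float:
    """How many times the Opteron 2200 result the projection represents."""
    return projected_barcelona_tpmc() / OPTERON_2200_TPMC


def idle_fraction_needed() -> float:
    """Share of bandwidth that would have had to sit idle during the baseline run
    (assumption: throughput scales linearly with the bandwidth consumed)."""
    return 1 - 1 / throughput_ratio()


print(projected_barcelona_tpmc())        # projected TpmC
print(round(throughput_ratio(), 1))      # multiple of the baseline
print(round(idle_fraction_needed(), 2))  # fraction of bandwidth that must be spare
```

Under that linear assumption, the baseline run would have had to leave more than half of the available bandwidth unused, which is the "more than 50% headroom" question posed above.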

 

Bear in mind that today’s Tulsa processor as packaged in the IBM System x 3950 is capable of 331,087 TpmC with 8 cores. So, let’s factor our Oracle licensing in and see what the numbers look like if AMD’s projections apply to OLTP:

Opteron 2200 4 core: 139,693 TpmC, 2 licenses = 69,846 per license

Clovertown 8 core: 222,117 TpmC, 4 licenses = 55,529 per license

AMD Old Projection 8 core: 237,478 TpmC, 4 licenses = 59,369 per license

AMD New Projection 8 core: 310,963 TpmC, 4 licenses = 77,740 per license

Tulsa 8 core: 331,087 TpmC, 4 licenses = 82,771 per license
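The per-license figures in the list above are simple divisions, and a short Python sketch makes them easy to re-run when a new result is published. The data comes straight from the list; the function name and the choice to truncate rather than round are my assumptions, picked to match the figures as written.

```python
# (system, TpmC, Oracle licenses) exactly as listed in the post.
RESULTS = [
    ("Opteron 2200 4 core", 139_693, 2),
    ("Clovertown 8 core", 222_117, 4),
    ("AMD Old Projection 8 core", 237_478, 4),
    ("AMD New Projection 8 core", 310_963, 4),
    ("Tulsa 8 core", 331_087, 4),
]


def tpmc_per_license(tpmc: int, licenses: int) -> int:
    """TpmC per Oracle license, truncated to a whole number."""
    return tpmc // licenses


# Rank systems from best to worst throughput per license.
ranked = sorted(RESULTS, key=lambda r: tpmc_per_license(r[1], r[2]), reverse=True)

for name, tpmc, licenses in ranked:
    print(f"{name}: {tpmc_per_license(tpmc, licenses):,} TpmC per license")
```

Sorting by the per-license figure puts Tulsa first and Clovertown last, with AMD's new projection landing between Tulsa and today's Opteron 2200.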

Barcelona Floating Point
FPU performance doesn’t matter to Oracle as I point out in this blog entry.

Clock Speed
The news about the expected 40% jump over Clovertown was accompanied by the news that Barcelona will clock in at a lower speed than Opteron 2200/8200 processors. I haven’t mentioned that aspect because with Oracle it really doesn’t matter much. The amount of work Oracle gets done in cache is essentially nil. I’ll blog about clock speed with Opterons very soon.

AMD Quad-Core “Barcelona” Processor For Oracle (Part IV) and the Web 2.0 Trolls.

This blog entry is the fourth in a series:

Oracle on Opteron with Linux–The NUMA Angle (Part I)

Oracle on Opteron with Linux-The NUMA Angle (Part II)

Oracle on Opteron with Linux-The NUMA Angle (Part III)

It Really is All About The Core, Not the Processor (Socket)
In my post entitled AMD Quad-core “Barcelona” Processor For Oracle (Part III). NUMA Too!, I had to set a reader straight over his lack of understanding where the terms processor, core and socket are concerned. He followed up with:

kevin – you are correct. your math is fine. though, i may still disagree about core being a better term than “physical processor”, but that is neither here, nor there.

He continues:

my gut told me based upon working with servers and knowing both architectures your calculations were incorrect, instead i errored in my math as you pointed out. *but*, i did uncover an error in your logic that makes your case worthless.

So, I am replying here and now. His gut may just be telling him that he ate something bad, or it could be his conscience getting to him for mouthing off over at the investor village AMD board where he called me a moron. His self-proclaimed server expertise is not relevant here, nor is it likely at the level he insinuates.

This is a blog about Oracle; I wish he’d get that through his head. Oracle licenses their flagship software (Real Application Clusters) at a list price of USD $60,000 per CPU. As I’ve pointed out, x86 cores are factored at .5 so a quad-core Barcelona will be 2 licenses—or $120,000 per socket. Today’s Tulsa processor licenses at $60,000 per socket and outperforms AMD’s projected Barcelona performance. AMD’s own promotional material suggests it will achieve a 70% OLTP (TPC-C) gain over today’s Opteron 2200. Sadly that is just not good enough where Oracle is concerned. I am a huge AMD fan, so this causes me grief.

Also, since he is such a server expert, he must certainly be aware that plugging a Barcelona processor into a Socket F board will need 70% headroom on the Hypertransport in order to attain that projected 70% OLTP increase. We aren’t talking about some CPU-only workload here, we are talking OLTP—as was AMD in that promotional video. OLTP hammers Hypertransport with tons of I/O, tons of contentious shared memory protected with spinlocks (a MESI snooping nightmare) and very large program text. I have seen no data anywhere suggesting this Socket F (Opteron 2200) TPC-C result of 139,693 TpmC was somehow achieved with 70% headroom to spare on the Hypertransport.

Specialized Hardware
Regarding the comparisons being made between the projected Barcelona numbers and today’s Xeon Tulsa, he states:

you are comparing a commodity chip with a specialized chip. those xeon processors in the ibm TPC have 16MB of L3 cache and cost about 6k a piece. amd most likely gave us the performance increase of the commodity version of barcelona, not a specialized version of barcelona. they specifically used it as a comparison, or upgrade of current socket TDP (65W,89W) parts.

What can I say about that? Specialized version of Barcelona? I’ve seen no indication of huge stepping plans, but that doesn’t matter. People run Oracle on specialized hardware. Period. If AMD had a “specialized” Barcelona in the plans, they wouldn’t have predicted a 70% increase over Opteron 2200, particularly not in a slide about OLTP using published TPC-C numbers from Opteron 2200 as the baseline. By the way, the only thing 16MB cache helps with in an Oracle workload is Oracle’s code footprint. Everything else is load/store operations and cache invalidations. The AMD caches are generally too small for that footprint, but because the on-die memory controller is coupled with awesome memory latencies (due to Hypertransport), small cache size hasn’t mattered that much with Opteron 800 and Socket F, though only in comparison to older Xeon offerings. This whole blog thread has been about today’s Xeons and future Barcelona though.

Large L2/L3 Cache Systems with OLTP

Regarding Tulsa Xeon processors used in the IBM System x TPC-C result of 331,087 TpmC, he writes:

the benchmark likely runs in cache on the special case hardware.

Cache-bound TPC-C? Yes, now I am convinced that his gut wasn’t telling him anything useful. I’ve been talking about TPC-C. He, being a server expert, must surely know that TPC-C cannot execute in cache. That Tulsa Xeon number at 331,087 TpmC was attached to 1,008 36.4GB hard drives in a TotalStorage SAN. Does that sound like cache to anyone?

Tomorrow’s Technology Compared to Today’s Technology
He did call for a new comparison that is worth consideration:

we all know the p4 architecture is on the way out and intel has even put an end of line date on the architecture. compare the barcelon to woodcrest

So I’ll reciprocate, gladly. Today’s Clovertown (essentially 2 Woodcrest processors glued together) has a TPC-C performance of 222,117 TpmC as seen in this audited Woodcrest TPC-C result. Being a quad-core processor, the Oracle licensing is 2 licenses per socket. That means today’s Clovertown performance is 55,529 TpmC per Oracle license compared to the projected Barcelona performance of 59,369 TpmC per Oracle license. That means if you wait for Barcelona you could get 7% more bang for your Oracle buck than you can with today’s shipping Xeon quad-core technology. And, like I said, since Barcelona is going to get plugged into a Socket F board, I’m not very hopeful that the processor will get the required complement of bandwidth to achieve that projected 70% increase over Opteron 2200.

Now, isn’t this blogging stuff just a blast? And yes, unless AMD over-achieves on their current marketing projections for Barcelona performance, I’m going to be really bummed out.

Multi-core Oracle Licensing. Proc/Sock/Core…What a Bore!

In this AMD webpage regarding software licensing, AMD is appealing to software vendors to license products by the socket as opposed to core. I wish Oracle would go this way because the .25 (Sun T1), .50 (Intel/AMD) and .75 (Power) core factoring is tedious. The webpage specifically states:

AMD is providing industry-thought leadership by recommending software developers license their software by socket […]

It is hard to tell if this recommendation from AMD has Barcelona in mind or not. As I blogged about in this post about Oracle per-core licensing with regard to Barcelona, I think the performance per Oracle license on Barcelona will be in trouble.

How can we expect normal humans to make good decisions about server purchases for Oracle when the topic of per-core performance—as it applies to Oracle per-core licensing—is so hard to grasp? As I have found in a comment from a reader on my blog, some people don’t even understand the difference between the terms “processor”, “core” and “socket”. The reader of this post comments:

check you math on the xeon system. tpc is 331,087 and the box has 4 dual core processors for a total of 8 physical processors. 331,087/8 = 41386.

now compare that to the 2 way dual core opteron system. tpc is 139,693 (multiply by 1.7 to estimate barcalona ) = 237478 for 4 physical cpus or 59367.

the barcelona@59367 > xeon@41386 by a factor of 1.44

your welcome… and i’m glad you aren’t my IT buyer.

The comment has been quoted verbatim. As far as the bit about being their IT buyer, I’m sure all of you who know me well are certain I wouldn’t buy this person so much as a bottle of water—even if his hair was on fire—after commenting like this on my blog. I did follow up with even more clarification though because it is a difficult topic:

The Xeon system at 331,087 is 4 socket, 8 core not “8 physical processors” as you state. The terminology is very important and the term “physical processors” has generally been replaced with the term “socket.”

The Opteron number is 139,693 for 2 sockets, 4 cores. AMD expects an increase of 70% per socket, not core. So you are right, the projected Barcelona number is 1.7x or 237,478, but that would be for a 2 socket system–albeit 8 cores.

This is an Oracle blog and I’m blogging about performance per core. So I’ll reiterate:

Opteron 2200 34,923 TpmC per core (139,693/4)
Barcelona ~29,684 TpmC per core (237,478/8)
Tulsa 41,385 TpmC per core (331,087/8)
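Because the socket/core confusion is the whole point here, a small Python sketch that keeps the two counts separate may help. The `Server` class and its field names are hypothetical, invented for this illustration; the socket counts, core counts and TpmC figures are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical record that keeps sockets and cores as separate counts."""
    name: str
    sockets: int
    cores_per_socket: int
    tpmc: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def tpmc_per_core(self) -> int:
        return self.tpmc // self.cores      # truncated, as in the figures above

    def tpmc_per_socket(self) -> int:
        return self.tpmc // self.sockets


SERVERS = [
    Server("Opteron 2200", sockets=2, cores_per_socket=2, tpmc=139_693),
    Server("Barcelona (projected)", sockets=2, cores_per_socket=4, tpmc=237_478),
    Server("Tulsa", sockets=4, cores_per_socket=2, tpmc=331_087),
]

for s in SERVERS:
    print(f"{s.name}: {s.cores} cores, "
          f"{s.tpmc_per_core():,} TpmC/core, {s.tpmc_per_socket():,} TpmC/socket")
```

Dividing the projected Barcelona figure by 4 "physical cpus", as the reader did, mixes the two units: the 2-socket Barcelona box has 8 cores, and it is the per-core column that tracks Oracle licensing.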

Oracle licenses by the core. That is all that matters on this blog.

Performance per Oracle license really is all that matters here.

Oracle on Opteron with Linux-The NUMA Angle (Part II)

A little more groundwork. Trust me, the Linux NUMA API discussion that is about to begin and the microbenchmark and Oracle benchmark tests will make a lot more sense with all this old boring stuff behind you.

Another Terminology Reminder
When discussing NUMA, the term node is not the same as in clusters. Remember that all the memory from all the nodes (or Quads, QBBs, RADs, etc) appears to all the processors as cache-coherent main memory.

More About NUMA Aware Software
As I mentioned in Oracle on Opteron with Linux–The NUMA Angle (Part I), NUMA awareness is a software term that refers to kernel and user mode software that makes intelligent decisions about how to best utilize resources in a NUMA system. I use the generic term resources because as I’ve pointed out, there is more to NUMA than just the non-uniform memory aspect. Yes, the acronym is Non Uniform Memory Access, but the architecture actually supports the notion of having building blocks with only processors and cache, only memory, or only I/O adaptors. It may sound really weird, but it is conceivable that a very specialized storage subsystem could be built and incorporated into a NUMA system by presenting itself as memory. Or, on the other hand, one could envision a very specialized memory component—no processors, just memory—that could be built into a NUMA system. For instance, think of a really large NVRAM device that presents itself as main memory in a NUMA system. That’s much different than an NVRAM card stuffed into something like a PCI bus and accessed with a device driver. Wouldn’t that be a great place to put an in-memory database for instance? Even a system crash would leave the contents in memory. Dealing with such topology requires the kernel to be aware of the differing memory topology that lies beneath it, and a robust user mode API so applications can allocate memory properly (you can’t just blindly malloc(3) yourself into that sort of thing). But alas, I digress since there is no such system commercially available. My intent was merely to expound on the architecture a bit in order to make the discussion of NUMA awareness more interesting.

In retrospect, these advanced NUMA topics are the reason I think Digital’s moniker for the building blocks used in the AlphaServer GS product line was the most appropriate. They used the acronym RAD (Resource Affinity Domain), which opens up the possible list of ingredients greatly. An API call would return the characteristics of a RAD, such as how many processors and how much memory (if any) it consisted of. Great stuff. I wonder how that compares to the Linux NUMA API? Hmm, I guess I better get to blogging…

When it comes to the current state of “commodity NUMA” (e.g., Opteron and Itanium) there are no such exotic concepts. Basically, these systems have processors and memory “nodes” with varying latency due to locality—but I/O is equally costly for all processors. I’ll speak mostly of Opteron NUMA with Linux since that is what I deal with the most and that is where I have Oracle running.
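To make the idea of processor-and-memory “nodes” concrete, here is a minimal Python sketch that lists which CPUs belong to which node on a Linux box. It is not the Linux NUMA API that the next installment covers. It simply reads the per-node `cpulist` files the kernel exposes under `/sys/devices/system/node`, and it assumes sysfs is mounted in the usual place. The function names are mine.

```python
import glob
import os


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means a node with no processors
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map NUMA node id -> CPU ids; returns {} where the directory is absent."""
    nodes: dict[int, list[int]] = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: cpus {cpus}")
```

On a 2-socket Opteron system with node interleaving disabled, I would expect this to report two nodes with the cores of one socket in each. A node that reports an empty CPU list is the “only memory” style of building block described earlier.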

For the really bored, here is a link to an AlphaServer GS320 diagram.

The following is a diagram of the Sequent NUMA-Q components that interfaced with the SHV Xeon chipset to make systems with up to 64 processors:

lynx1.jpg

OK, I promise, the next NUMA blog entry will get into the Linux NUMA API and what it means to Oracle.

AMD Quad-Core “Barcelona” Processor For Oracle (Part III). NUMA Too!

To continue my thread about AMD’s future Quad-core processors code named “Barcelona” (a.k.a. K8L), I need to elaborate a bit on my last installment on this thread where I pointed out that AMD’s marketing material suggests we should expect 70% better OLTP performance from Barcelona than Socket F (Opteron 2220). To be precise, the marketing materials are predicting a 70% increase on a per-processor basis. That is a huge factor that I need to blog about, so here it is.

“Friendemies”
While doing the technical review for the Julian Dyke/Steve Shaw RAC on Linux Book I got to know Steve Shaw a bit. Since then we have become more familiar with each other especially after manning the HP booth in the exhibitor hall at UKOUG 2006. Here is a photo of Steve in front of the HP Enterprise File Services Clustered Gateway demo. The EFS is an OEMed version of the PolyServe scalable file serving utility (scalable clustered storage that works).

shaw_4.JPG

People who know me know I’m a huge AMD fan, but they also know I am not a techno-religious zealot. I pick the best, but there is no room for loyalty in high technology (well, on second thought, I was loyal to Sequent to the bitter end…oh well). So over the last couple of years, Steve and I have occasionally agreed to disagree about the state of affairs between Intel and AMD processor fitness for Oracle. Steve and I are starting to see eye to eye a lot more these days because I’m starting to smell the coffee as they say.

It’s All About The Core
When it comes to Oracle performance on industry standard servers, the only thing I can say is, “It’s the core, stupid”—in that familiar Clintonian style of course. Oracle licenses the database at the rate of .5 per core, rounded up. So a quad-core processor is licensed as 2 CPUs. Let’s look at some numbers.
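The licensing rule just described (core count times a factor, rounded up) is easy to express as a small Python sketch. The .5 x86 factor is the one stated here, and the .25 and .75 factors are the Sun T1 and Power factors mentioned elsewhere in this series. The dictionary keys and function name are hypothetical, and applying the round-up to the total core count passed in is my assumption about how the rule is used.

```python
import math
from fractions import Fraction

# Core factors as quoted in these posts; the keys are illustrative labels only.
CORE_FACTORS = {
    "sun_t1": Fraction(1, 4),
    "x86": Fraction(1, 2),
    "power": Fraction(3, 4),
}


def licenses_required(cores: int, family: str = "x86") -> int:
    """Oracle CPU licenses for a given core count: cores x factor, rounded up."""
    return math.ceil(cores * CORE_FACTORS[family])


print(licenses_required(4))            # one quad-core x86 socket
print(licenses_required(2))            # one dual-core x86 socket
print(licenses_required(8, "sun_t1"))  # eight T1 cores
```

Using exact fractions instead of floats keeps the round-up honest for any factor. A quad-core x86 processor comes out at 2 licenses and a dual-core at 1, which is the gap the rest of this post is about.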

Since AMD’s Quad-core promo video is based on TPC results, I think it is fair to go with them. TPC-C is not representative of what real applications do to a processor, but the workload does one thing really well—it exploits latency issues. For OLTP, memory latency is the most important performance characteristic. Since AMD’s material sets our expectations for some 70% improvement in OLTP over the Opteron 2200, we’ll look at TPC-C.

This published TPC-C result shows that the Opteron 2200 can perform 69,846 TpmC per processor. If the AMD quad-core promotional video proves right, the Barcelona processor will come in at approximately 118,739 TpmC per processor (a 70% improvement).

TpmC/Oracle-license
Since a quad-core AMD is licensed by Oracle as 2 CPUs, it looks like Barcelona will be capable of 59,370 TpmC per Oracle license. Therein lies the rub, as they say. There are a couple of audited TPC-C results with the Intel “Tulsa” processor (a.k.a. Xeon 7140, 7150), such as this IBM System x result, that show this current high-end Xeon processor is capable of some 82,771 TpmC per processor. Since the Xeon 71[45]0 is a dual-core processor, the Oracle-license price factor is 82,771 TpmC per Oracle license. If these numbers hold any water, some 9 months from now when Barcelona ships, we’ll see a processor that is 28% less price-performant from a strict Oracle licensing standpoint. My fear is that it will be worse than that because Barcelona is socket-compatible with Socket F systems—such as the Opteron 2200. I’ve been at this stuff for a while and I cannot imagine the same chipset having enough headroom to feed a processor capable of 70% more throughput. Also, Intel will not stand still. I am comparing current Xeon to future Barcelona.
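The 28% figure above falls out of the two per-license numbers. Here is a minimal Python sketch of that calculation; the function name is mine, and both inputs are the figures quoted in this post.

```python
BARCELONA_TPMC_PER_LICENSE = 59_370   # projected, per the AMD promo material
TULSA_TPMC_PER_LICENSE = 82_771       # audited IBM System x result


def shortfall_percent(candidate: int, reference: int) -> float:
    """How far below the reference the candidate falls, as a percentage."""
    return (reference - candidate) / reference * 100


gap = shortfall_percent(BARCELONA_TPMC_PER_LICENSE, TULSA_TPMC_PER_LICENSE)
print(f"Barcelona is about {gap:.0f}% less price-performant per Oracle license")
```

Note that the percentage depends on which processor is taken as the reference: measured the other way around, Tulsa delivers roughly 39% more TpmC per license than the projected Barcelona.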

A Word About TPC-C Analysis
I admit it! I routinely compare TPC-C results on the same processor using results achieved by different databases. For instance, in this post, I use a DB2/SLES result on IBM System x to make a point about the Xeon 7150 (“Tulsa”) processor. E-gad, how can I do that with a clear conscience? Well, think about it this way. If DB2 on IBM System x running SuSE can achieve 82,771 TpmC per Xeon 7150 and this HP result shows us that SQL Server 2005 on Proliant ML570G4 (Xeon 7140) can do 79,601 TpmC per CPU, you have to at least believe Oracle would do as well. There are no numbers anywhere that suggest Oracle is head and shoulders above either of these two software configurations on identical hardware. We can only guess because Oracle seems to be doing TPC-C with Itanium exclusively these days. I think that is a bummer, but Steve Shaw likes it (he works for Intel)!

What Does NUMA Have To Do With It?
Uh, Opteron/HyperTransport systems are NUMA systems. I haven’t blogged much about that yet, but I will. I know a bit about Oracle on NUMA—a huge bit.

I hope you’ll stay tuned because we’ll be looking at real numbers.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
