AMD Quad-Core “Barcelona” Processor For Oracle (Part IV) and the Web 2.0 Trolls.

This blog entry is the fourth in a series:

Oracle on Opteron with Linux–The NUMA Angle (Part I)

Oracle on Opteron with Linux-The NUMA Angle (Part II)

Oracle on Opteron with Linux-The NUMA Angle (Part III)

It Really is All About The Core, Not the Processor (Socket)
In my post entitled AMD Quad-core “Barcelona” Processor For Oracle (Part III). NUMA Too!, I had to set a reader straight over his lack of understanding where the terms processor, core and socket are concerned. He followed up with:

kevin – you are correct. your math is fine. though, i may still disagree about core being a better term than “physical processor”, but that is neither here, nor there.

He continues:

my gut told me based upon working with servers and knowing both architectures your calculations were incorrect, instead i errored in my math as you pointed out. *but*, i did uncover an error in your logic that makes your case worthless.

So, I am replying here and now. His gut may just be telling him that he ate something bad, or it could be his conscience getting to him for mouthing off over at the investor village AMD board where he called me a moron. His self-proclaimed server expertise is not relevant here, nor is it likely at the level he insinuates.

This is a blog about Oracle; I wish he’d get that through his head. Oracle licenses their flagship software (Real Application Clusters) at a list price of USD $60,000 per CPU. As I’ve pointed out, x86 cores are factored at .5 so a quad-core Barcelona will be 2 licenses—or $120,000 per socket. Today’s Tulsa processor licenses at $60,000 per socket and outperforms AMD’s projected Barcelona performance. AMD’s own promotional material suggests it will achieve a 70% OLTP (TPC-C) gain over today’s Opteron 2200. Sadly that is just not good enough where Oracle is concerned. I am a huge AMD fan, so this causes me grief.
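To make the per-socket licensing arithmetic concrete, here is a minimal Python sketch. The $60,000 list price and the .5 core factor come from the paragraph above; the function and constant names are made up for illustration, and the core counts (two for Tulsa, four for Barcelona) are simply what the per-socket prices imply.

```python
# Figures from the post: USD 60,000 list price per RAC license,
# and an x86 core factor of 0.5 (each core counts as half a license).
RAC_LIST_PRICE_USD = 60_000
X86_CORE_FACTOR = 0.5


def licenses_per_socket(cores_per_socket, core_factor=X86_CORE_FACTOR):
    """Number of Oracle licenses that one socket requires."""
    return cores_per_socket * core_factor


def license_cost_per_socket(cores_per_socket, list_price=RAC_LIST_PRICE_USD):
    """List-price licensing cost of one socket, in USD."""
    return licenses_per_socket(cores_per_socket) * list_price


# Core counts are assumptions implied by the per-socket prices in the post.
for name, cores in (("Tulsa (dual-core)", 2), ("Barcelona (quad-core)", 4)):
    print(f"{name}: {licenses_per_socket(cores):g} license(s), "
          f"${license_cost_per_socket(cores):,.0f} per socket")
```

The point of the sketch is that the license bill scales with core count, so a quad-core part has to deliver roughly double the throughput per socket just to break even on Oracle cost.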

Also, since he is such a server expert, he must certainly be aware that plugging a Barcelona processor into a Socket F board will need 70% headroom on the Hypertransport in order to attain that projected 70% OLTP increase. We aren’t talking about some CPU-only workload here, we are talking OLTP—as was AMD in that promotional video. OLTP hammers Hypertransport with tons of I/O, tons of contentious shared memory protected with spinlocks (a MESI snooping nightmare) and very large program text. I have seen no data anywhere suggesting this Socket F (Opteron 2200) TPC-C result of 139,693 TpmC was somehow achieved with 70% headroom to spare on the Hypertransport.

Specialized Hardware
Regarding the comparisons being made between the projected Barcelona numbers and today’s Xeon Tulsa, he states:

you are comparing a commodity chip with a specialized chip. those xeon processors in the ibm TPC have 16MB of L3 cache and cost about 6k a piece. amd most likely gave us the performance increase of the commodity version of barcelona, not a specialized version of barcelona. they specifically used it as a comparison, or upgrade of current socket TDP (65W,89W) parts.

What can I say about that? Specialized version of Barcelona? I’ve seen no indication of huge stepping plans, but that doesn’t matter. People run Oracle on specialized hardware. Period. If AMD had a “specialized” Barcelona in the plans, they wouldn’t have predicted a 70% increase over Opteron 2200, particularly not in a slide about OLTP using published TPC-C numbers from Opteron 2200 as the baseline. By the way, the only thing 16MB cache helps with in an Oracle workload is Oracle’s code footprint. Everything else is load/store operations and cache invalidations. The AMD caches are generally too small for that footprint. However, because the on-die memory controller is coupled with awesome memory latencies (due to Hypertransport), small cache size hasn’t mattered that much with Opteron 800 and Socket F, but only in comparison to older Xeon offerings. This whole blog thread has been about today’s Xeons and future Barcelona though.

Large L2/L3 Cache Systems with OLTP

Regarding Tulsa Xeon processors used in the IBM System x TPC-C result of 331,087 TpmC, he writes:

the benchmark likely runs in cache on the special case hardware.

Cache-bound TPC-C? Yes, now I am convinced that his gut wasn’t telling him anything useful. I’ve been talking about TPC-C. He, being a server expert, must surely know that TPC-C cannot execute in cache. That Tulsa Xeon number at 331,087 TpmC was attached to 1,008 36.4GB hard drives in a TotalStorage SAN. Does that sound like cache to anyone?

Tomorrow’s Technology Compared to Today’s Technology
He did call for a new comparison that is worth consideration:

we all know the p4 architecture is on the way out and intel has even put an end of line date on the architecture. compare the barcelona to woodcrest

So I’ll reciprocate, gladly. Today’s Clovertown (2 Woodcrest processors essentially glued together) has a TPC-C performance of 222,117 TpmC as seen in this audited Clovertown TPC-C result. Being a quad-core processor, the Oracle licensing is 2 licenses per socket, which makes 4 licenses for a two-socket server. That means today’s Clovertown performance is 55,529 TpmC per Oracle license (222,117 / 4) compared to the projected Barcelona performance of 59,369 TpmC per Oracle license (139,693 x 1.7 / 4). That means if you wait for Barcelona you could get 7% more bang for your Oracle buck than you can with today’s shipping Xeon quad-core technology. And, like I said, since Barcelona is going to get plugged into a Socket F board, I’m not very hopeful that the processor will get the required complement of bandwidth to achieve that projected 70% increase over Opteron 2200.
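The per-license comparison can be reproduced with a few lines of Python. This is only a sketch: the TpmC figures and the 70% projection are the ones quoted in this post, while the two-socket, quad-core configuration for both systems is an assumption inferred from the per-license numbers, and the helper name is invented for illustration.

```python
ORACLE_X86_CORE_FACTOR = 0.5  # from the post: each x86 core counts as half a license


def tpmc_per_license(tpmc, sockets, cores_per_socket,
                     core_factor=ORACLE_X86_CORE_FACTOR):
    """Throughput divided by the number of Oracle licenses the box needs."""
    licenses = sockets * cores_per_socket * core_factor
    return tpmc / licenses


# Published Clovertown result quoted in the post (assumed two sockets, quad-core).
clovertown = tpmc_per_license(222_117, sockets=2, cores_per_socket=4)

# AMD's projection: 70% over the Opteron 2200 result of 139,693 TpmC,
# assumed here to land on a two-socket, quad-core Barcelona system.
projected_barcelona_tpmc = 139_693 * 1.70
barcelona = tpmc_per_license(projected_barcelona_tpmc, sockets=2, cores_per_socket=4)

advantage = barcelona / clovertown - 1

print(f"Clovertown: {clovertown:,.0f} TpmC per license")
print(f"Barcelona (projected): {barcelona:,.0f} TpmC per license")
print(f"Projected advantage: {advantage:.1%}")
```

Swapping in a different projection, such as the later 40%-over-Clovertown claim, only means changing the projected TpmC figure.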

Now, isn’t this blogging stuff just a blast? And yes, unless AMD over-achieves on their current marketing projections for Barcelona performance, I’m going to be really bummed out.

14 Responses to “AMD Quad-Core “Barcelona” Processor For Oracle (Part IV) and the Web 2.0 Trolls.”


  1. 1 Richard January 26, 2007 at 6:34 pm

    The problem with your numbers is that they are based on old AMD marketing materials. AMD has had a chance to run their engineering samples at their second stepping (they are now gearing up full production for late Q2 delivery – 12 weeks from wafer starts) and they are currently claiming a 40% advantage on Clovertown versus the 70% over the Opteron 2200 from their pre-A0 stepping marketing material.

    And do remember that they have long term plans for dual-core versions of the same core and that Oracle can always change the license scheme.

  2. 2 kevinclosson January 26, 2007 at 6:50 pm

    Richard,

    You are 100% correct about the information I’m using. I am still looking for updated statements about leap-frogging Clovertown.

    The most important thing for my blog was to point out the $$ per Oracle license aspect because nobody else in the blogosphere or elsewhere was doing it. Redundantly I’ll state that I am a huge AMD fan.

    As for Oracle changing their Intel core licensing, I don’t see that happening but I could be wrong. Oracle cannot afford a revenue hit and I don’t think there is volume to amortize over.

    I have made a follow-up entry regarding the new AMD info:

    https://kevinclosson.wordpress.com/2007/01/26/amd-qaud-core-barcelona-processor-for-oracle-part-v-40-expected-over-clovertown/

    Thanks for visiting my blog.

  3. 3 information_is_king January 26, 2007 at 11:30 pm

    can you at least admit that i pointed out a flaw in your methodology, which led YOU to a false conclusion?

    this is like comparing the slalom timings of a ferrari with an escort, and then assuming that dropping an improved engine might give the escort a chance. slalom is a good analogy, because tire friction and suspension are very important components. just like network bandwidth, disk performance, interprocess bus, and cache components would be in an OLTP benchmark.

    when you do a more realistic comparison, you found the first run silicon of amd quadcore is faster than the current intel counterpart. amazing, “it’s all about the cores”. imagine that, if we throw out your $2M specialized hardware case and use an apples to apples comparison, suddenly we can learn something useful. amd is also on record as saying the barcelona and core2duo architecture should have a similar IPC, so a 7% gap is probably a reasonable indicator you are on to something.

    and… i don’t take offense to what you blog about. your blog was used as evidence on a message board. had you not been cross posted, i would have never had the opportunity to debate your posting. btw, looking forward to reading your numa / oracle post.

  4. 4 Richard January 27, 2007 at 2:49 am

    I thought it would be helpful information. Also keep in mind that there are custom FPGA’s (using Torrenza) from companies that allow you to accelerate specific tasks. One of the companies producing these has already made a custom algorithm to accelerate the transaction power of the Postgres database, so Oracle could clearly be done as well, if it hasn’t already. This is something that AMD has as an advantage.

    The cost of these accelerators varies greatly with some as low as $3000 and others as high as $15,000, but they generally come down in price once they start selling.

    Also keep in mind that just because a dual socket dual-core Opteron system has both sockets populated doesn’t mean you have to replace both with a pair of Barcelona’s. You could go with just one, and still get significant improvement. I think AMD is on the right course, and I think Intel isn’t in as favorable a position as some think, but this competition will continue to drive computing power to new heights and we certainly can’t help but enjoy the results.

  5. 5 Kevinclosson January 27, 2007 at 4:29 am

    information_is_king writes:

    “can you at least admit that i pointed out a flaw in your methodology, which led YOU to a false conclusion? ”

    i.i. King,

    You have not added any value to this thread. Your input is not considered “debate” by any stretch of the imagination. You flatulated your brain matter all over the place over at investorvillage.com on this topic and called me a moron to boot. We cannot kiss and make up, but you are a welcome reader on this blog nonetheless.

    Yes, I ran some more theoretical numbers using AMD’s Opty2200+70% compared to today’s Clovertown Xeon and there is a 7% difference. So what? AMD used TPC-C to set the OLTP expectations for Barcelona and given Oracle’s licensing structure, those projections were simply not going to cut it. Who else brought that topic out in this wealth of jailhouse lawyer cross-chatter we refer to as Web 2.0? Nobody. So I did.

    Look, i.i.king, I “get it”. I am a huge AMD fan and I know there is more to OLTP than TPC-C, but AMD used those numbers and I had to raise the topic. Not only am I a huge AMD fan, but I’ve got a bunch of them in the lab, one of my whitepapers is published on AMD’s website and I was a guest speaker in their demo theater at Oracle World 2006.

    I’ve blogged the heck out of this topic so I’m a bit tired of it and that has nothing to do with this supposed battle of wits with you. I have blogged about the revised expectations (Clovertown+40% now) and so I’m moving on.

    In the future I’d encourage you to use a modicum of decorum.

  6. 6 kevinclosson January 27, 2007 at 4:39 am

    Richard,

    Thanks for the input. Until I get my hands on a Barcelona system, I shall remain reluctantly skeptical that there would be enough headroom in the Socket F and current Hypertransport to feed the processor. I’m long on HT 3.0 of course.

  7. 7 information_is_king January 27, 2007 at 2:52 pm

    i’ll agree with you that it is time to move on. to do so, i posted a retraction on the IV boards, but remain critical of your original comparison which involved the $1.5M ibm xeon system.

    http://www1.investorvillage.com/smbd.asp?mb=476&mn=31935&pt=msg&mid=1308210

    best wishes on your blog and endeavors.

  8. 8 Robert I. Eachus January 28, 2007 at 1:10 am

    I have seen no data anywhere suggesting this Socket F (Opteron 2200) TPC-C result of 139,693 TpmC was somehow achieved with 70% headroom to spare on the Hypertransport.

    I may not be able to help with the rest of the discussion, but I can address the Hypertransport headroom issue. Current Opterons use HT 2.0, which supports rates up to 1400 Mtransfers/second. The weird unit is of course because HT connections can be 2 to 32 bits wide. Since, for now, all Opteron chips use 16-bit HT ports, let’s switch to Gigabytes/second. Current Opteron CPUs have three ports in each direction, which run at 2 GBytes/sec each. The HT 2.0 standard allows for up to 2.8 GBytes/second. If the new quad-core Barcelona does support 7x HT connections as expected, that is 40% extra headroom right there.

    If you are talking about the interprocessor bandwidth, that ‘extra’ bandwidth will be necessary with four cores instead of two. Current four-socket Opteron systems max out the IPC bandwidth between cores, while the bandwidth from the CPU to the chipset is nowhere near maximum.

    But there is another magic gotcha. If you compare one four-core Barcelona to two dual-core Opteron 2220s, there is no cache coherency traffic over HT at all. It is all contained within the interprocessor crossbar on the chip. So there could indeed be that much headroom in that particular comparison.

    Finally, it is not clear whether Barcelona will implement HT 3.0. (Right now it appears that it will, but not all HT 3.0 features.) Right now, those extra features of HT 3.0 are promised for 2008 as part of Direct Connect 2.0. This will include four (as opposed to three) 16-bit HT pairs of ports. (Well, really HT connections use a pair of LDT ports…) HT 3.0 also supports frequencies up to 2.8 GigaTransfers/second. Why? Well…the Opterons scheduled for 2008 will be able to split those four pairs of ports into eight. These will have 2.8 GBytes/second capacity same as above, but will allow eight CPU sockets to be connected with one HT hop between them. (With eight eight-bit HT port pairs left over for I/O.)
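The bandwidth arithmetic in the comment above can be checked with a short Python sketch. It takes the commenter's figures at face value; the 1000 Mtransfers/second rate for current links is not stated in the comment but inferred from the quoted 2 GBytes/sec on a 16-bit port, and the function name is hypothetical.

```python
def link_gbytes_per_sec(mega_transfers_per_sec, width_bits):
    """Per-direction bandwidth of one link: transfers per second times bytes per transfer."""
    bytes_per_transfer = width_bits / 8
    return mega_transfers_per_sec * bytes_per_transfer / 1000


# Current 16-bit ports: 1000 MT/s is inferred from the comment's 2 GBytes/sec figure.
current = link_gbytes_per_sec(1000, 16)

# HT 2.0 ceiling as quoted in the comment: 1400 MT/s on a 16-bit port.
ht2_max = link_gbytes_per_sec(1400, 16)

# The split 8-bit ports described for 2008 parts, at the quoted 2.8 GT/s.
split_port = link_gbytes_per_sec(2800, 8)

headroom = ht2_max / current - 1

print(f"current link:  {current:.1f} GB/s")
print(f"HT 2.0 max:    {ht2_max:.1f} GB/s")
print(f"split 8-bit:   {split_port:.1f} GB/s")
print(f"extra headroom at the HT 2.0 max: {headroom:.0%}")
```

Under those figures the per-link ceiling is 40% above the current rate, which is well short of the 70% headroom the post asks about unless the on-die coherency savings described above make up the difference.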

  9. 9 kevinclosson January 28, 2007 at 1:46 am

    But there is another magic gotcha. If you compare one four-core Barcelona to two dual-core Opteron 2220s, there is no cache coherency traffic over HT at all. It is all contained within the interprocessor crossbar on the chip

    Robert, excellent follow up. Clarification please. I think you are saying that there is no coherency traffic on HT for the 4 cores of a Barcelona. That I understand. How about between sockets? From what I see that must surely just be MESI.

  10. 10 Mogens Nørgaard January 28, 2007 at 4:29 am

    I don’t want to de-rail this excellent thread with the following – it’s just meant as a random observation on my part. So promise me to keep at it on the technical side, OK? :-)).

    Here goes:

    Let’s go to another town than Barcelona, namely Harvard. I’ve been reading Harvard Business Review (HBR) for some years, and I might have learned a thing or two. For instance I think I’ve learned that you cannot learn anything from other businesses. I also think I’ve observed that pretty much any new issue of HBR contains an article about a new, corporate type of person you should pay attention to as a leader – the Corporate Pain Absorber, for instance.

    But I do recall one very interesting piece of research published in HBR: They were looking at grand, ole’ companies with market dominance, who were faced with a shift in their “safe” market. The various things they can do in such a situation were discussed, and the big conclusion was that none of the grand, old companies had been able to profit from the shift, no matter what they tried.

    Example: Carlsberg has been faced here in Denmark with a lot of new micro-breweries. So what can Carlsberg do?

    1. More Marketing. But the “Probably the best beer in the World” sounds kind of hollow in that context, so they have had to try new tactics.

    2. Make their ‘standard’ beer more exclusive. They tried. They came out in new bottles with 3 cl (!!) more, they came out with plastic bottles, and they came out with cases that took 24 bottles instead of the traditional 30. Nothing worked. Their ‘standard’ beer is now a discount beer competing with the other discount beers here.

    3. Embrace the new micro-breweries by inviting them to join the association of brewers. Very few have done that. What would they get out of it anyway?

    4. Creating their own micro-brewery (called Jacobsen, which was the name of the original founder of Carlsberg). Well, micro might not be the right word. Its capacity is 200 (two hundred) times the combined capacity of all other micro-breweries in Denmark. Mini-brewery might have been a better label :).

    5. Importing quality beer (Leffe, etc.) from abroad and “allowing” restaurants who have signed all these exclusivity deals with Carlsberg, to sell these at very high prices along with the standard Carlsberg offerings (i.e. introducing the competition themselves).

    It works a little, of course. Nice to have Leffe in the standard beer bars now. Jacobsen can make some good beers now and then. Letting their brewmasters teach the new-comers how to avoid bad beer is good.

    But their market share doesn’t grow, their profit doesn’t either – and the micro-breweries are just becoming more and more popular, since it’s way more cool to drink quality stuff than enzyme-enhanced standard stuff.

    OK, end of Carlsberg rant.

    I wonder if the whole Opteron thing was a similar shift in Intel’s standard market? And can they stop it? I personally doubt it.

  11. 11 kevinclosson January 28, 2007 at 5:59 am

    i.i.king,

    RE:
    “i’ll agree with you that it is time to move on. to do so, i posted a retraction on the IV boards, but remain critical of your original comparison which involved the $1.5M ibm xeon system.”

    You are still mistakenly using your interpretation of what is costly hardware. Read my blog about how much more expensive Oracle licensing is PER CPU than the normal commodity server is as a whole. You are wrong about the cost of that IBM System x 3950 TPC-C system. You state 1.5mm, but that is total cost including software and storage. See the FDR at http://tpc.org/results/individual_results/IBM/IBM-x3950-3.5-061215.es.pdf and you’ll see the hardware was $135,650. Yes that is expensive, but a normal IT shop would then heap on $40,000 List for Enterprise Edition or $60,000 for RAC…and that is Per CPU. How many CPUs? A core is .5 so you have 4 licenses or $240,000…Oracle is a whole different league.

    Remember that the 7% delta is an existing Clovertown number compared to the original AMD projection of Opteron2200+70%..and that is 6 months or so from now. At that time the competition will be Tigerton.

    There is nothing flawed about my methodology. I take AMD’s projections and compare them to existing published TPC numbers from production servers and always with a keen eye for what that means to an Oracle shop…I don’t care about MySQL, Doom or PhotoShop Deluxe economics.

  12. 12 BaronMatrix February 2, 2007 at 5:42 pm

    This was kind of a nice read but for a person who is so into AMD you seem to not understand that Barcelona is designed for Soket 1207+, but will be backward compatible with Socket F.

    By simply looking at Core 2 and Core 2 Quad you can see that even with the same FSB the improvements are noticeable. The same will happen with Opteron.

    It will be I think more pronounced because Opteron has always enabled more bandwidth and this will definitely soak up more of it.

    Why is it so hard to believe AMD’s numbers? Lying about them is not going to help anything.

    K8 is already known as the transaction processor, Barcelona will take it to the next level.

    I guess we’ll see numbers soon.

  13. 13 kevinclosson February 2, 2007 at 7:55 pm

    BaronMatrix,

    Yes I’m an AMD fan. AMD specifically states that Barcelona is an in-place upgrade for Socket F systems. Socket F is a 1207-pin socket compatible with Opteron 2200/8200 and Barcelona. You state:

    “This was kind of a nice read but for a person who is so into AMD you seem to not understand that Barcelona is designed for Soket 1207+, but will be backward compatible with Socket F.”

    Since Barcelona is an in-place upgrade for Socket F I don’t think it is reasonable to state that it is designed for “Soket 1207+”. It is designed to be compatible. Do you think AMD wants people to hold off in-place upgrades from Opteron 2200/8200 to Barcelona just because you think Barcelona is designed for Soket 1207+? Fact is, Soket 1207+ is not available until 2008. Do you have any reference where AMD projects their 70% increase over Opteron 2200 or 40% over Xeon Clovertown using Soket 1207+? Do you think AMD is getting people excited about a new chip only to tell them to wait for a future Hypertransport? SMP Hypertransport always lags the single socket support. It will be the same for HT3.0 vis a vis Altair, Budapest.

    I’m guessing you are a desktop guy and are thinking of the desktop quad-core called Altair or its server cousin called Budapest, which will require Socket F+ and are single-socket players. Now it’s time to talk about Socket F+. Socket F+ is HT3.0 compatible and from an SMP server perspective is relevant in the Shanghai rev, not Barcelona. If you stick a Barcelona in a Socket F (HT2.0) it plays HT2.0. That is how HT works. When Socket F+/HT 3.0 comes out in 2008, things will be different.

    By the way, after calling me a liar, I’m only letting your comment through in case there are others who have the same mistaken interpretation of what this blog thread is about. I’m not lying about anything. There are no measurements to lie about for heaven’s sake! AMD doesn’t have any, I don’t have any. They have projections though, and I’m using them.

    I’m taking AMD’s projections (which are Barcelona NOT Budapest Socket F+ HT 3.0) and putting them into the context of Oracle licensing. No more, no less. People are generally not aware that when you add cores you have to amortize the cost of Oracle licensing over significantly improved performance. If the performance gain is not significant then Oracle licensing is going to make Barcelona less attractive. It is clear you did not read, or did not understand, the whole thread.

    This is a blog about Oracle.

  14. 14 Barcelona Blog June 30, 2008 at 1:20 pm

    Keep up the good work guys, this is surely one of the better Oracle blogs on the net.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
