A reader posted an interesting comment on the latest installment in my thread about Oracle licensing on the upcoming AMD Barcelona processor. The comment, posted on my blog article entitled AMD Quad-Core “Barcelona” Processor For Oracle (Part IV) and the Web 2.0 Trolls, states:
The problem with your numbers is that they are based on old AMD marketing materials. AMD has had a chance to run their engineering samples at their second stepping (they are now gearing up full production for late Q2 delivery – 12 weeks from wafer starts) and they are currently claiming a 40% advantage on Clovertown versus the 70% over the Opteron 2200 from their pre-A0 stepping marketing material.
The AMD claim was covered in this ZDNet article which quotes AMD Vice President Randy Allen as follows:
“We expect across a wide variety of workloads for Barcelona to outperform Clovertown by 40 percent,” Allen said. The quad-core chip also will outperform AMD’s current dual-core Opterons on “floating point” mathematical calculations by a factor of 3.6 at the same clock rate, he said.
That is a significantly different set of projections than I covered in my article entitled AMD Quad-core “Barcelona” Processor For Oracle (Part II). That article covers AMD’s initial projection of a 70% OLTP improvement on a per-processor (socket) basis over Opteron 2200. These new projections are astounding, and I would love to see them hold up for the sake of competition. Let’s take a closer look.
Hypertransport Bandwidth
I’m glad AMD has set expectations by stating the 40% uplift over Clovertown would be realized for “a wide variety of workloads.” However, since this is an Oracle blog, I would have much preferred to see OLTP mentioned specifically. The numbers are hard to imagine, because it is all about feeding the processor, not the processor itself. The Barcelona processor is socket-compatible with Socket F. Any improvement over Opteron 2200/8200 would require existing headroom on the Hypertransport for workloads like OLTP. A lot of headroom. Let’s look at the numbers.
The Socket F baseline that the original AMD projections were based on was 139,693 TpmC. If OLTP is included in the “wide variety of workloads”, then the projected OLTP throughput would be Clovertown’s 222,117 TpmC x 1.4, or 310,963 TpmC, all things being equal. This represents 2.2 times the throughput from the same Socket F/Hypertransport setup. Time for a show of hands: how many folks out there think that the Opteron 2200 OLTP result of 139,693 TpmC was achieved with more than 50% headroom to spare on the Hypertransports? I would love to see Barcelona come in with this sort of OLTP throughput, but folks, systems are not built with more than twice the bus bandwidth the processors need. I’m not very hopeful.
Bear in mind that today’s Tulsa processor as packaged in the IBM System x 3950 is capable of 331,087 TpmC with 8 cores. So, let’s factor our Oracle licensing in and see what the numbers look like if AMD’s projections apply to OLTP:
Opteron 2200 4 core: 139,693 TpmC, 2 licenses = 69,846 per license
Clovertown 8 core: 222,117 TpmC, 4 licenses = 55,529 per license
AMD Old Projection 8 core: 237,478 TpmC, 4 licenses = 59,369 per license
AMD New Projection 8 core: 310,963 TpmC, 4 licenses = 77,740 per license
Tulsa 8 core: 331,087 TpmC, 4 licenses = 82,771 per license
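For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the figures above. It assumes the 0.5 licenses-per-core ratio implied by the list (4 cores = 2 licenses, 8 cores = 4 licenses), and it applies AMD's 1.7x and 1.4x multipliers as if they carried over to OLTP unchanged, which is exactly the open question. The function and variable names are mine, purely for illustration.

```python
# Assumption: 0.5 licenses per core, inferred from the list above.
CORE_FACTOR = 0.5

# Published TpmC results quoted in this post.
OPTERON_2200_TPMC = 139_693   # 4 cores
CLOVERTOWN_TPMC = 222_117     # 8 cores
TULSA_TPMC = 331_087          # 8 cores


def licenses(cores):
    """Number of Oracle licenses for a given core count."""
    return cores * CORE_FACTOR


def tpmc_per_license(tpmc, cores):
    """Throughput delivered per Oracle license."""
    return tpmc / licenses(cores)


# AMD's projections, treated as if they applied to OLTP.
old_projection = OPTERON_2200_TPMC * 1.7   # 70% over Opteron 2200
new_projection = CLOVERTOWN_TPMC * 1.4     # 40% over Clovertown

# How much more throughput the same Socket F platform would have to feed.
uplift_over_socket_f = new_projection / OPTERON_2200_TPMC

systems = [
    ("Opteron 2200", 4, OPTERON_2200_TPMC),
    ("Clovertown", 8, CLOVERTOWN_TPMC),
    ("AMD Old Projection", 8, old_projection),
    ("AMD New Projection", 8, new_projection),
    ("Tulsa", 8, TULSA_TPMC),
]

for name, cores, tpmc in systems:
    per_license = tpmc_per_license(tpmc, cores)
    print(f"{name} {cores} core: {int(tpmc):,} TpmC, "
          f"{int(licenses(cores))} licenses = {int(per_license):,} per license")

print(f"New projection vs. Socket F baseline: {uplift_over_socket_f:.1f}x")
```

The per-license figures are truncated to whole numbers to match the list above.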
Barcelona Floating Point
FPU performance doesn’t matter to Oracle as I point out in this blog entry.
Clock Speed
The news about the expected 40% jump over Clovertown was accompanied by the news that Barcelona will clock in at a lower speed than Opteron 2200/8200 processors. I haven’t mentioned that aspect because, with Oracle, it really doesn’t matter much. The amount of work Oracle gets done in cache is essentially nil. I’ll blog about clock speed with Opterons very soon.
Kevin,
I wonder if you might comment on the memory system in the Sun X4600 and its potential for use as a mid-sized (5-10TB) Oracle Warehouse server. In particular, is the Hypertransport likely to be the bottleneck in this machine when fully stuffed with 8 x 2 core Opteron processors?
Any thoughts on Linux vs Solaris X64 on such a box?
thanks for your input.
–Peter
Peter,
If by warehousing you mean PQO, I think it would likely be a great system. Remember that it is a “2-hop” box so SGA work will be 50% 2-hop remote references and 87.5% remote memory overall. So without some NUMA awareness in Oracle I think this would likely not get the bang for your buck on the OLTP side. Now, PQO on the other hand, hammers PGA mostly and you should be able to get PGA memory allocated out of local memory. I’m about to discuss how to do that with Red Hat Linux, but I’d have to ask someone how to do that with Solaris. If there is no way to force a process to allocate heap from node-local memory with Solaris, then I’d be pretty pessimistic.
Please keep an eye out for the upcoming blog entry I’ll be making on forcing node-local allocation by hand. I’ll try to put a section in there about how the same thing can, or can’t, be done with Solaris.
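The 87.5% remote-memory figure above is simple arithmetic, sketched here in Python. The assumption is that SGA pages are spread evenly across all 8 sockets' memory, so a process on any one socket finds only 1/8 of them locally. The 50% 2-hop share depends on the X4600's specific Hypertransport topology and is not modeled here. The function name is hypothetical.

```python
def remote_fraction(nodes):
    """Fraction of memory references that land on another node,
    assuming memory is spread evenly across all nodes."""
    return (nodes - 1) / nodes


# 8 sockets: only 1/8 of evenly spread memory is node-local.
print(f"{remote_fraction(8):.1%} remote")
```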
We join our hero, one year later ….
Have you had an opportunity to compare Oracle performance of the AMD 23xx “Barcelona” and Intel E54xx “Harpertown” now that these CPUs are in shipping servers?
Is the throughput measurement shown here a meaningful indicator of an AMD advantage for some aspects of running 10gR2 or 11g? http://www.hardware.info/en-UK/articles/amdnZWppZGWa/New_quad_core_server_CPUs_AMD_Barcelona_vs_Intel_Harpertown/18
Hi Rob,
I have not tested them myself. No, that URL wouldn’t hint at an advantage for Oracle Database. On the other hand, look at page 16 of that report (SPEC int). Now that… is more in line with Oracle. Remember, Oracle is a load-and-store, integer-rich workload.
I wish they had done that testing with Linux though.