Recent Oracle 8-Socket Xeon E7 TPC-C Result. Big NUMA Box, No NUMA Parameters.

I’ve read through the full disclosure report (FDR) from Oracle’s January 2012 TPC-C result and found that the result was obtained without setting any NUMA init.ora parameters (e.g., _enable_NUMA_support). The storage was a collection of Sun x64 servers running COMSTAR to serve up F5100 flash storage. The storage connectivity was 8 Gb Fibre Channel (8GFC). This was a non-RAC result on an 8s80c160t (8 sockets, 80 cores, 160 threads) Xeon E7 server. The only things that stand out to me are:

  1. The setting of disk_asynch_io=TRUE. This was ASM on raw disk so I should think asynchronous I/O would be the default. Interesting.
  2. Overriding the default number of DBWR processes by setting db_writer_processes. The default number of DBWR processes would be 20, so the benchmark team increased that by 60% (to 32). Since sockets are NUMA “nodes” on this architecture, the default of 20 would render 2.5 DBWR processes per “node.” In my experience it is beneficial to make the number of DBWR processes a whole multiple of the number of sockets (NUMA nodes), so if the benchmark team was thinking the way I think, they went with 4x the socket count.
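
The DBWR arithmetic in item 2 can be sketched in a few lines of bash. This is only an illustration: the helper names default_dbwr and evenly_spread are hypothetical, and the default is assumed to follow the documented rule of 1 or CPU_COUNT/8, whichever is greater, which matches the 20 quoted above for 160 threads.

```shell
#!/bin/bash
# Figures from the post: 8 sockets (NUMA nodes), 160 hardware threads.
sockets=8
threads=160

# Assumption: default db_writer_processes is max(1, CPU_COUNT / 8).
default_dbwr() {
  local n=$(( $1 / 8 ))
  (( n < 1 )) && n=1
  echo "$n"
}

# Succeeds (exit 0) when the DBWR count divides evenly across the nodes.
evenly_spread() {
  (( $1 % $2 == 0 ))
}

def=$(default_dbwr "$threads")
chosen=$(( 4 * sockets ))
echo "default=$def chosen=$chosen increase=$(( (chosen - def) * 100 / def ))%"
evenly_spread "$def" "$sockets" || echo "default: remainder of $(( def % sockets )) across $sockets nodes"
evenly_spread "$chosen" "$sockets" && echo "chosen: $(( chosen / sockets )) per node"
```

With the figures above, the default of 20 does not divide evenly across 8 nodes, while 32 gives 4 per node.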

The FDR is here: http://c970058.r58.cf2.rackcdn.com/fdr/tpcc/Oracle_X4800-M2_TPCC_OL-UEK-FDR_011712.pdf

For more information about the missing _enable_NUMA_support parameter see: Meet _enable_NUMA_support: The if-then-else Oracle Database 11g Release 2 Initialization Parameter.

For a lot more about NUMA as it pertains to Oracle, please visit: QPI-Based Systems Related Topics (e.g., Nehalem EP/EX, Westmere EP, etc)

On the topic of increasing DBWR processes I’d like to point out that doing so isn’t one of those “some is good so more must be better” situations. For more reading on that matter I recommend:

Over-Configuring DBWR Processes Part I

Over-Configuring DBWR Processes Part II

Over-Configuring DBWR Processes Part III

Over-Configuring DBWR Processes Part IV

The parameters:

Got A Big NUMA Box For Running Oracle? Take Care To Get Interrupt Handling Spread Across The Sockets Evenly
Page 310 of the FDR shows the following script used to arrange good affinity between the FC HBA device drivers and the sockets. I had to do the same sort of thing with the x4800 (aka Exadata X2-8) back before I left Oracle’s Exadata development organization. This sort of thing is standard but I wanted to bring the concept to your attention:


#!/bin/bash
service irqbalance stop
last_node=-1
declare -i count=0
declare -i cpu cpu1 cpu2 cpu3 cpu4
for dir in /sys/bus/pci/drivers/qla2xxx/0000*
do
  node=`cat $dir/numa_node`
  irqs=`cat $dir/msi_irqs`
  if [ "`echo $irqs | wc -w`" != "2" ] ; then
    echo >&2 "script expects 2 interrupts per device"
    exit 1
  fi
  first_cpu=`sed 's/-.*//' < $dir/local_cpulist`
  echo $node $irqs $first_cpu $dir
done | sort | while read node irq1 irq2 cpu1 dir
do
  cpu2=$cpu1+10
  cpu3=$cpu1+80
  cpu4=$cpu1+90
  if [ "$node" != "$last_node" ]
  then
    count=1
    cpu=$cpu1
  else
    count=$count+1
    case $count in
      2) cpu=$cpu2;;
      3) cpu=$cpu3;;
      4) cpu=$cpu4;;
      *) echo "more devices than expected on node $node"
         count=1
         cpu=$cpu1;;
    esac
  fi
  last_node=$node
  echo "#$dir"
  echo "echo $cpu > /proc/irq/$irq1/smp_affinity_list"
  echo "echo $cpu > /proc/irq/$irq2/smp_affinity_list"
  echo
  echo $cpu > /proc/irq/$irq1/smp_affinity_list
  echo $cpu > /proc/irq/$irq2/smp_affinity_list
done
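
After pinning interrupts this way it is worth reading the affinity back to confirm it took. The following is a minimal sketch, not part of the FDR: show_irq_affinity is a hypothetical helper, and the proc root is passed as an argument only so the function can be tried against a scratch directory. On a live system you would pass /proc.

```shell
#!/bin/bash
# Print the CPU list each given IRQ is currently allowed to run on.
# Usage: show_irq_affinity PROC_ROOT IRQ [IRQ...]
show_irq_affinity() {
  local proc_root=$1; shift
  local irq f
  for irq in "$@"; do
    f="$proc_root/irq/$irq/smp_affinity_list"
    if [ -r "$f" ]; then
      echo "irq $irq -> cpus $(cat "$f")"
    else
      # Report missing IRQs on stderr so stdout stays parseable.
      echo "irq $irq -> (no smp_affinity_list)" >&2
    fi
  done
}

# Example (assumes the HBA interrupts show up as qla2xxx in /proc/interrupts):
# show_irq_affinity /proc $(awk '/qla2xxx/ {sub(":","",$1); print $1}' /proc/interrupts)
```

If irqbalance is restarted later it may rewrite these settings, so the check is most useful right before a test run.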

9 Responses to “Recent Oracle 8-Socket Xeon E7 TPC-C Result. Big NUMA Box, No NUMA Parameters.”


  1. 1 goryszewskig March 2, 2012 at 1:17 am

    That’s strange. I see pga_aggregate_target = 0 and parallel_max_servers=0. Why is that?

    • 2 kevinclosson March 2, 2012 at 7:03 am

      It’s TPC-C so none of the Parallel Query Option related settings are needed. It’s a common benchmark technique to shut off all the lights that aren’t needed, so to speak 🙂

  2. 3 Freek March 2, 2012 at 3:57 am

    OK, at the risk of being stupid: why is it interesting that the disk_asynch_io parameter is set to true?
    Is this parameter not always default true and here just being listed as one of the parameters (just like undo_management)?

    • 4 kevinclosson March 2, 2012 at 7:10 am

      Hi Freek,

      I’m not trying to raise red flags. I just pointed out the only settings that looked odd. There are hundreds of settings in the instance that were left to default (not specifically set). It just seemed odd to me that this one would be necessary to set. Full disclosure: I haven’t taken the time to see what the default for that setting is. I should think that is the default.

  3. 5 Jason March 2, 2012 at 11:57 am

    Could you have the script at the bottom of your post either in a downloadable form, or maybe embedded in a syntax highlighting plugin (like wp-syntax) so it can be read more easily? The long line is really goofing up the rendering.

    –Jason

  4. 6 Matt March 2, 2012 at 4:56 pm

    no NUMA support because of a clueless architecture vs. say HP’s prema architecture.
    The cost of a miss might not be that expensive on a x4800 with evenly distributed IRQs and the use of an OS that is much more stringent with IO/Processor affinity.

    Still seems like Oracle would want to run a TPC-C with Exadata though.
    Maybe they don’t want to because it will be replaced with the SUN SuperCluster platform…..

    • 7 kevinclosson March 3, 2012 at 6:38 am

      It is certainly NUMA, Matt. The lack of “glue” (à la PREMA or X5) doesn’t disqualify the x4800 from being able to exploit NUMA (by enabling the built-in Oracle NUMA support via _enable_NUMA_support). It’s quite the opposite in fact. The tighter the NUMA architecture, the less need for application software NUMA awareness. I’m quite surprised the x4800 was able to do a decent TPC-C without getting the age-old _enable_NUMA_support underpinnings functioning well in the database code and enabled.

      Now, having said all that, it may just be the case that the Oracle Database bits they used simply enable the NUMA awareness code by default based on socket-count discovery. I don’t know because I no longer have access to the source for Oracle Database.

      Finally, while the 4,803,718 TpmC is a huge number, on a per-core basis it still shows that big machines are usually not fast machines as evidenced by the fact that 4.8 million on this platform is 60,000 TpmC/core whereas the recent Cisco UCS+Oracle result of 1,053,100 on 12 WSM-EP cores comes out to roughly 88,000 TpmC/core. The 4.8 million result requires a database that is about 47% larger than the Cisco UCS result as per rules of scale in TPC-C. That’s a shame because it would be really interesting to see what the x4800 could do running the scale used on the 12 core Cisco UCS. When you buy software on a per-core basis it’s nice to compare apples to apples on a per-core basis.

  5. 8 rreynolds March 19, 2012 at 1:24 pm

    Thanks for pointing this report out. One thing that jumps out at me is this, from page 47:
    “Product Availability – Date:
    Oracle Database 11g Release 2 with Partitioning for OEL
    6.1 – June 26, 2012”

    I’m new to the Oracle on Linux world, as I’m prepping to move off AIX. I may have missed something, but I’ve heard no dates for 11gR2 certification on OEL 6. Is this the first word we’ve heard from Oracle that they’re finally adding RHEL/OEL 6 to the certification matrix?
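
The per-core comparison in comment 7 can be reproduced with integer arithmetic. A minimal sketch using only the figures quoted there (4,803,718 TpmC on 80 cores and 1,053,100 TpmC on 12 cores); the function name tpmc_per_core is hypothetical:

```shell
#!/bin/bash
# Integer TpmC per core; bash arithmetic truncates toward zero.
tpmc_per_core() {
  echo $(( $1 / $2 ))
}

echo "x4800 (80 cores):     $(tpmc_per_core 4803718 80) TpmC/core"
echo "Cisco UCS (12 cores): $(tpmc_per_core 1053100 12) TpmC/core"
```

The truncated results line up with the rounded 60,000 and 88,000 TpmC/core figures in the comment.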


DISCLAIMER

I work for Amazon Web Services but all of the words on this blog are purely my own. Not a single word on this blog is to be mistaken as originating from any Amazon spokesperson. You are reading my words on this webpage and, while I work at Amazon, all of the words on this webpage reflect my own opinions and findings. To put it another way, "I work at Amazon, but this is my own opinion." To conclude, this is not an official Amazon information outlet. There are no words on this blog that should be mistaken as official Amazon messaging. Every single character of text on this blog originates in my head and I am not an Amazon spokesperson.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
