I’ve read through the full disclosure report (FDR) from Oracle’s January 2012 TPC-C. I found that the result was obtained without using any NUMA init.ora parameters (e.g., _enable_NUMA_support). The storage was a collection of Sun x64 servers running COMSTAR to serve up F5100 flash storage, connected via 8GFC Fibre Channel. This was a non-RAC result on an 8-socket, 80-core, 160-thread (8s80c160t) Xeon E7 server. The only things that stand out to me are:
- The setting of disk_asynch_io=TRUE. This was ASM on raw disk, so I should think async I/O would be the default. Interesting.
- Overriding the default number of DBWR processes by setting db_writer_processes. The default number of DBWR processes would be 20, so the benchmark team increased that by 60%. Since sockets are NUMA “nodes” on this architecture, the default of 20 would render 2.5 DBWR processes per “node.” In my experience it is beneficial to have the number of DBWR processes be an even multiple of the number of sockets (NUMA nodes), so if the benchmark team was thinking the way I think, they went with 4x the socket count (a quick sketch of that arithmetic follows this list).
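A quick sketch of that arithmetic (the socket, core, and thread counts come from the system description above; the 4-writers-per-socket multiplier is my own rule of thumb, not something stated in the FDR):

#!/bin/bash
# Back-of-the-envelope DBWR arithmetic for the 8-socket, 160-thread x4800 (assumed counts).
sockets=8
threads=160
default_dbwr=$(( threads / 8 ))            # default db_writer_processes, i.e. 20, or 2.5 per socket
per_socket=4                               # assumed target of 4 DBWR per socket (NUMA node)
tuned_dbwr=$(( sockets * per_socket ))     # 32, a 60% increase over the default of 20
echo "default=${default_dbwr} tuned=${tuned_dbwr}"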
The FDR is here: http://c970058.r58.cf2.rackcdn.com/fdr/tpcc/Oracle_X4800-M2_TPCC_OL-UEK-FDR_011712.pdf
For more information about the missing _enable_NUMA_support parameter see: Meet _enable_NUMA_support: The if-then-else Oracle Database 11g Release 2 Initialization Parameter.
For a lot more about NUMA as it pertains to Oracle, please visit: QPI-Based Systems Related Topics (e.g., Nehalem EP/EX, Westmere EP, etc)
On the topic of increasing DBWR processes I’d like to point out that doing so isn’t one of those “some is good so more must be better” situations. For more reading on that matter I recommend:
Over-Configuring DBWR Processes Part I
Over-Configuring DBWR Processes Part II
Over-Configuring DBWR Processes Part III
Over-Configuring DBWR Processes Part IV
The parameters:
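The FDR contains the complete init.ora listing. What follows is only a minimal sketch of the two settings called out above, reconstructed from the discussion, not copied from the FDR page:

# Illustrative init.ora fragment only -- see the FDR for the actual, complete listing.
disk_asynch_io=TRUE        # explicitly set, although it should already be the default for ASM on raw devices
db_writer_processes=32     # 4 DBWR per socket on the 8-socket server, versus the default of 20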
Got A Big NUMA Box For Running Oracle? Take Care To Get Interrupt Handling Spread Across The Sockets Evenly
Page 310 of the FDR shows the following script used to arrange good affinity between the FC HBA device drivers and the sockets. I had to do the same sort of thing with the x4800 (aka Exadata X2-8) back before I left Oracle’s Exadata development organization. This sort of thing is standard but I wanted to bring the concept to your attention:
#!/bin/bash
service irqbalance stop
last_node=-1
declare -i count=0
declare -i cpu cpu1 cpu2 cpu3 cpu4
for dir in /sys/bus/pci/drivers/qla2xxx/0000*
do
        node=`cat $dir/numa_node`
        irqs=`cat $dir/msi_irqs`
        if [ "`echo $irqs | wc -w`" != "2" ] ; then
                echo >&2 "script expects 2 interrupts per device"
                exit 1
        fi
        first_cpu=`sed 's/-.*//' < $dir/local_cpulist`
        echo $node $irqs $first_cpu $dir
done | sort | while read node irq1 irq2 cpu1 dir
do
        cpu2=$cpu1+10
        cpu3=$cpu1+80
        cpu4=$cpu1+90
        if [ "$node" != "$last_node" ]
        then
                count=1
                cpu=$cpu1
        else
                count=$count+1
                case $count in
                2) cpu=$cpu2;;
                3) cpu=$cpu3;;
                4) cpu=$cpu4;;
                *) echo "more devices than expected on node $node"
                   count=1
                   cpu=$cpu1;;
                esac
        fi
        last_node=$node
        echo "#$dir"
        echo "echo $cpu > /proc/irq/$irq1/smp_affinity_list"
        echo "echo $cpu > /proc/irq/$irq2/smp_affinity_list"
        echo
        echo $cpu > /proc/irq/$irq1/smp_affinity_list
        echo $cpu > /proc/irq/$irq2/smp_affinity_list
done
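Not part of the FDR, but a quick way to spot-check what a script like the one above did is to walk the same qla2xxx sysfs entries and print the NUMA node and resulting IRQ affinity for each HBA function:

#!/bin/bash
# Hypothetical verification helper (not from the FDR): show where each FC HBA's
# MSI interrupts ended up relative to the device's NUMA node.
for dir in /sys/bus/pci/drivers/qla2xxx/0000*
do
        node=`cat $dir/numa_node`
        for irq in `cat $dir/msi_irqs`
        do
                echo "$dir node=$node irq=$irq cpus=`cat /proc/irq/$irq/smp_affinity_list`"
        done
done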
That’s strange, I see pga_aggregate_target=0 and parallel_max_servers=0. Why is that?
It’s TPC-C, so none of the Parallel Query Option related settings are needed. It’s a common benchmark technique to shut off all the lights that aren’t needed, so to speak 🙂
OK, at the risk of being stupid: why is it interesting that the disk_asynch_io parameter is set to true?
Isn’t this parameter always true by default, and here just being listed as one of the parameters (just like undo_management)?
Hi Freek,
I’m not trying to raise red flags. I just pointed out the only settings that looked odd. There are hundreds of settings in the instance that were left to default (not specifically set). It just seemed odd to me that this one would be necessary to set. Full disclosure: I haven’t taken the time to see what the default for that setting is. I should think TRUE is the default.
Could you have the script at the bottom of your post either in a downloadable form, or maybe embedded in a syntax highlighting plugin (like wp-syntax) so it can be read more easily? The long line is really goofing up the rendering.
–Jason
No NUMA support because of a glueless architecture vs., say, HP’s PREMA architecture.
The cost of a miss might not be that expensive on an x4800 with evenly distributed IRQs and the use of an OS that is much more stringent with I/O/processor affinity.
Still seems like Oracle would want to run a TPC-C with Exadata, though.
Maybe they don’t want to because it will be replaced with the SUN SuperCluster platform…..
It is certainly NUMA, Matt. The lack of “glue” (a la PREMA or X5) doesn’t disqualify the x4800 from being able to exploit NUMA (by enabling built-in Oracle NUMA support via _enable_NUMA_support). It’s quite the opposite, in fact: the tighter the NUMA architecture, the less need for application software NUMA awareness. I’m quite surprised the x4800 was able to do a decent TPC-C without getting the age-old _enable_NUMA_support underpinnings functioning well in the database code and enabled.
Now, having said all that, it may just be the case that the Oracle database bits they used enable the NUMA awareness code by default based on socket-count discovery. I don’t know, because I no longer have access to the source for Oracle Database.
Finally, while the 4,803,718 TpmC is a huge number, on a per-core basis it still shows that big machines are usually not fast machines: 4.8 million on this 80-core platform works out to roughly 60,000 TpmC/core, whereas the recent Cisco UCS+Oracle result of 1,053,100 on 12 WSM-EP cores comes out to roughly 88,000 TpmC/core. The 4.8 million result also requires a database roughly 4.6 times larger than the Cisco UCS result, as per the TPC-C rules of scale (throughput, and thus minimum database scale, differs by 4,803,718/1,053,100, or about 4.56x). That’s a shame, because it would be really interesting to see what the x4800 could do running the scale used on the 12-core Cisco UCS. When you buy software on a per-core basis it’s nice to compare apples to apples on a per-core basis.
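For the curious, here is the per-core and scale arithmetic behind that comparison, using only the core counts and TpmC figures quoted above (bc just does the division; nothing here comes from the FDRs beyond those numbers):

#!/bin/bash
# Per-core throughput and relative database scale for the two results discussed above.
echo "x4800 per core:   $(echo 'scale=0; 4803718 / 80' | bc) TpmC/core"     # roughly 60,000
echo "UCS per core:     $(echo 'scale=0; 1053100 / 12' | bc) TpmC/core"     # roughly 88,000
echo "Throughput ratio: $(echo 'scale=2; 4803718 / 1053100' | bc)x"         # ~4.56x, hence ~4.6x the database scale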
Thanks for pointing this report out. One thing that jumps out at me is this, from page 47:
“Product Availability Date:
Oracle Database 11g Release 2 with Partitioning for OEL 6.1 – June 26, 2012”
I’m new to the Oracle on Linux world, as I’m prepping to move off AIX. I may have missed something, but I’ve heard no dates for 11gR2 certification on OEL 6. Is this the first word we’ve heard from Oracle that they’re finally adding RHEL/OEL 6 to the certification matrix?
Interesting find, Ryan. The old trick is to pull the result before the six-month availability window runs out if things don’t meet dates. That’s fair play. We’ll see.