But my, oh my, how I’ve tried. OK, I guess my new name is Fan Boy. I know for a fact that I’ve been pretty relentless on this particular server for over 100 days of its current 215-day life.
-sh-3.00$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
-sh-3.00$ uptime
14:41:17 up 215 days, 14:32, 15 users, load average: 37.85, 37.48, 25.89
And, top(1):
top - 14:40:44 up 215 days, 14:31, 15 users,  load average: 40.91, 38.05, 25.62
Tasks: 309 total,  30 running, 278 sleeping,   0 stopped,   1 zombie
Cpu0 : 92.8% us,  7.2% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 90.1% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 89.3% us,  9.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.9% hi, 0.0% si
Cpu3 : 90.1% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu4 : 89.2% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.9% hi, 0.0% si
Cpu5 : 89.1% us, 10.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu6 : 92.8% us,  7.2% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu7 : 93.7% us,  6.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  10393736k total,  9347616k used,  1046120k free,     1892k buffers
Swap: 10288440k total,   838236k used,  9450204k free,  6264396k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14919 kclosson  15   0  120m  84m 7076 S 30.3  0.8   0:17.67 sqlldr
14942 kclosson  15   0  119m  84m 7068 S 29.4  0.8   0:17.75 sqlldr
14940 kclosson  15   0  120m  84m 7068 S 28.6  0.8   0:16.21 sqlldr
15008 kclosson  16   0  668m  35m  29m R 28.6  0.3   0:16.48 oracle
14924 kclosson  15   0  119m  84m 7076 R 26.8  0.8   0:16.39 sqlldr
14932 kclosson  16   0  120m  84m 7068 R 26.8  0.8   0:17.07 sqlldr
14959 kclosson  15   0  668m  34m  29m S 25.9  0.3   0:15.96 oracle
14961 kclosson  16   0  668m  34m  29m R 25.9  0.3   0:14.90 oracle
14945 kclosson  15   0  119m  84m 7076 S 25.0  0.8   0:16.07 sqlldr
14980 kclosson  15   0  668m  34m  29m S 25.0  0.3   0:15.09 oracle
14935 kclosson  16   0  119m  84m 7068 S 24.1  0.8   0:15.05 sqlldr
14947 kclosson  16   0  119m  84m 7072 R 24.1  0.8   0:15.90 sqlldr
14943 kclosson  15   0  119m  84m 7076 R 23.2  0.8   0:14.75 sqlldr
14938 kclosson  16   0  120m  84m 7068 S 22.3  0.8   0:14.35 sqlldr
14941 kclosson  15   0  119m  84m 7076 R 22.3  0.8   0:15.96 sqlldr
14951 kclosson  15   0  120m  84m 7068 S 22.3  0.8   0:16.96 sqlldr
14921 kclosson  16   0  120m  84m 7068 R 21.4  0.8   0:17.84 sqlldr
14934 kclosson  15   0  120m  84m 7076 S 21.4  0.8   0:16.13 sqlldr
14929 kclosson  15   0  119m  84m 7076 R 20.5  0.8   0:17.70 sqlldr
14950 kclosson  16   0  119m  84m 7068 R 20.5  0.8   0:13.63 sqlldr
14922 kclosson  15   0  120m  84m 7068 S 19.6  0.8   0:17.40 sqlldr
14977 kclosson  15   0  668m  34m  29m R 18.7  0.3   0:16.38 oracle
15002 kclosson  16   0  668m  34m  29m R 18.7  0.3   0:15.00 oracle
14920 kclosson  16   0  119m  84m 7076 R 17.8  0.8   0:17.97 sqlldr
14923 kclosson  16   0  119m  84m 7068 R 17.0  0.8   0:13.44 sqlldr
14925 kclosson  16   0  120m  84m 7068 S 17.0  0.8   0:13.06 sqlldr
14927 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.05 sqlldr
14931 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.18 sqlldr
14957 kclosson  15   0  668m  34m  28m S 17.0  0.3   0:14.16 oracle
14930 kclosson  16   0  120m  84m 7068 R 16.1  0.8   0:15.31 sqlldr
14986 kclosson  15   0  668m  34m  29m R 16.1  0.3   0:14.37 oracle
14936 kclosson  15   0  119m  84m 7068 S 15.2  0.8   0:15.58 sqlldr
14964 kclosson  15   0  668m  34m  29m S 15.2  0.3   0:17.10 oracle
15014 kclosson  15   0  668m  34m  28m S 12.5  0.3   0:12.83 oracle
14949 kclosson  16   0  120m  84m 7076 S  7.1  0.8   0:15.70 sqlldr
14955 kclosson  16   0  666m  35m  31m R  4.5  0.4   0:03.11 oracle
14966 kclosson  16   0  666m  35m  31m R  4.5  0.3   0:02.80 oracle
14998 kclosson  15   0  666m  35m  31m S  4.5  0.3   0:02.68 oracle
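For anyone curious how a load like this gets generated: the mix above is just many concurrent SQL*Loader streams. A minimal sketch of driving N streams from the shell follows; the scott/tiger credentials, the cards.ctl control file, and the per-stream data/log file names are hypothetical placeholders, not the actual harness behind the numbers above.

#!/bin/sh
# Fire off 16 concurrent SQL*Loader streams, then wait for all of them.
STREAMS=16
i=1
while [ $i -le $STREAMS ]
do
    sqlldr userid=scott/tiger control=cards.ctl \
           data=cards_${i}.dat log=cards_${i}.log silent=all &
    i=`expr $i + 1`
done
wait
echo "all $STREAMS loader streams complete"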
We have a 6-way RAC cluster of 16-core AMD servers running RHAS4 x64 across SilverStorm IB in production. While we had memory errors on two of the servers (thus demonstrating the failover capabilities), the original three have been real troopers for us.
16:43:20 up 288 days, 16:46, 3 users, load average: 11.49, 10.45, 8.63
That load average does not even give a taste of how hard they are driven during our peak times (load over 30 for hours at a time). Glad to hear of others having the same success we are having with this platform.
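If you want a record of stretches like that, sampling /proc/loadavg on an interval is enough; a minimal sketch (the 60-second interval and the log path are arbitrary choices):

#!/bin/sh
# Append a timestamped load-average sample (1/5/15 min) once a minute.
while true
do
    echo "`date '+%Y-%m-%d %H:%M:%S'` `cat /proc/loadavg`" >> /var/tmp/loadavg.log
    sleep 60
done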
Wow, CSAB, 16 cores? That must be the Sun x4600? And just so I’m straight, you mean a 6-node RAC cluster?
Correct. 6 x4600s, each with 8 dual-core AMD boards and 64GB of RAM, all running RAC. And yes, this is production, not a concept or benchmark system.
CPUs of the first servers purchased:
processor : 15
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8218
stepping : 2
cpu MHz : 2600.027
cache size : 1024 KB
physical id : 7
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5199.32
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
CPUs of the second servers purchased:
processor : 15
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 2800.053
cache size : 1024 KB
physical id : 7
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5599.25
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
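As a sanity check, the socket/core arithmetic in these dumps can be confirmed from /proc/cpuinfo itself. On an 8-socket, dual-core box the three commands below should report 8, “cpu cores : 2”, and 16, respectively:

grep 'physical id' /proc/cpuinfo | sort -u | wc -l    # unique sockets
grep 'cpu cores' /proc/cpuinfo | sort -u              # cores per socket
grep -c '^processor' /proc/cpuinfo                    # total logical CPUs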
CSAB,
Cool. I’ve never “talked” to anyone who has these x4600s in production. I always presumed they’d work just fine, depending on workload of course.
I’m anxiously awaiting the 32-core DL785… we’re using HP gear (currently DL585s) for our RAC clusters, so it’s only a matter of time… 🙂
no_treble,
Do you intend to condense down? That is, do you intend to reduce the degree of horizontal scalability, or will you stay with the node count you are at and “fatten” the nodes?
Currently our group only has a 2-node cluster in production, with four other 2-node clusters in testing and development, and one 3-node cluster in testing. So we’d probably just leverage the extra hardware and stay at 2 or 3 nodes.
In my SA/SE career I’ve had much greater success with many smaller systems making up a cluster (non-RAC, OS-level clusters), not only for greater overall uptime but for the “invisibility factor” of one node dropping out when there’s trouble or for maintenance. But the way our people want to use RAC here, the focus seems to be more on the HA benefits than on HPC. So if we’re only going to roll out two nodes for HA, it seems like it would be better to fatten them up.
We are currently running a mission-critical 11i system with the database on a 20-CPU dual-core/1.5 GHz SunFire 20K server. This is a pretty costly solution for us. I am looking at the possibility of using a few dual-core 8-way AMD-based x4600 servers via RAC. It would definitely help us reduce cost without sacrificing performance. Would someone like to comment on this?
When choosing an upgrade among Rev F CPUs, the 2222s (3.0 GHz core/1000 MHz HT) have the best write performance of any processor in the family (including the 3.2 GHz parts). With the core clock an integral multiple of the HT base clock, bus cycles are used efficiently.
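To make the arithmetic explicit (assuming the 1000 MHz HyperTransport base clock the parenthetical implies): 3.0 GHz / 1.0 GHz = 3, an integer ratio, whereas 3.2 GHz / 1.0 GHz = 3.2, the fractional ratio this comment blames for wasted bus cycles.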
Hi Kevin,
Can you please let me know what kind of load you were running on the 4-socket AMD?
We have been testing a RAC cluster of Dell 2950s (two quad-core CPUs each), and these systems were unable to sustain a run queue of 15+. While neither Oracle nor Linux crashes, the system is so painfully slow that you cannot use it for any practical purpose. The loads we were running were primarily complex data warehouse extracts.
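For reference, the run queue mentioned here is the first column (“r”) of vmstat output, so sustained pressure is easy to watch; a minimal sketch, with an arbitrary 5-second interval and sample count:

vmstat 5 120                                  # column 1 ("r") is the run queue
vmstat 5 120 | awk 'NR > 2 { print $1 }'      # just the run-queue depth, headers skipped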
I would also like your opinion on parallel_instance_groups and splitting loads across nodes. Both Oracle and Dell recommend not splitting loads across nodes.
Thanks
Krishna
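Since the parallel_instance_groups question comes up again below, here is a minimal sketch of the relevant 10g knobs; the instance names dw1/dw2, the group names, and the etl_user login are hypothetical, and this is not a recommendation either way on splitting:

# init.ora/spfile entries, one per instance (INSTANCE_GROUPS is static,
# so a restart is needed after setting it):
#   dw1.instance_groups='etl'
#   dw2.instance_groups='reporting'
# An extract session can then confine its parallel query slaves to the
# instances in one group, keeping a single parallel operation on one node:
sqlplus etl_user/etl_pass <<'EOF'
ALTER SESSION SET parallel_instance_group = 'etl';
-- ... run the extract here; its PX slaves stay within group 'etl' ...
EOF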
The workload was software development and test….
What OS are you running on this slow 2950? Is your DW extraction split across the cluster?
Hi Kevin,
The OS is Red Hat AS 4.0 with Veritas Cluster File System. Splitting the extracts across nodes has become a rather contentious topic, as both Oracle and Dell have recommended not splitting across nodes.
We are going to be testing it anyway.
Thanks
Krishna