But my, oh my, how I’ve tried. OK, I guess my new name is Fan Boy. I know for a fact that I’ve been pretty relentless on this particular server for over 100 days of its current 215-day life.
-sh-3.00$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
-sh-3.00$ uptime
14:41:17 up 215 days, 14:32, 15 users, load average: 37.85, 37.48, 25.89
And, top(1):
top - 14:40:44 up 215 days, 14:31, 15 users,  load average: 40.91, 38.05, 25.62
Tasks: 309 total,  30 running, 278 sleeping,   0 stopped,   1 zombie
Cpu0 : 92.8% us,  7.2% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 90.1% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 89.3% us,  9.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.9% hi, 0.0% si
Cpu3 : 90.1% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu4 : 89.2% us,  9.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.9% hi, 0.0% si
Cpu5 : 89.1% us, 10.9% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu6 : 92.8% us,  7.2% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu7 : 93.7% us,  6.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  10393736k total,  9347616k used,  1046120k free,     1892k buffers
Swap: 10288440k total,   838236k used,  9450204k free,  6264396k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14919 kclosson  15   0  120m  84m 7076 S 30.3  0.8   0:17.67 sqlldr
14942 kclosson  15   0  119m  84m 7068 S 29.4  0.8   0:17.75 sqlldr
14940 kclosson  15   0  120m  84m 7068 S 28.6  0.8   0:16.21 sqlldr
15008 kclosson  16   0  668m  35m  29m R 28.6  0.3   0:16.48 oracle
14924 kclosson  15   0  119m  84m 7076 R 26.8  0.8   0:16.39 sqlldr
14932 kclosson  16   0  120m  84m 7068 R 26.8  0.8   0:17.07 sqlldr
14959 kclosson  15   0  668m  34m  29m S 25.9  0.3   0:15.96 oracle
14961 kclosson  16   0  668m  34m  29m R 25.9  0.3   0:14.90 oracle
14945 kclosson  15   0  119m  84m 7076 S 25.0  0.8   0:16.07 sqlldr
14980 kclosson  15   0  668m  34m  29m S 25.0  0.3   0:15.09 oracle
14935 kclosson  16   0  119m  84m 7068 S 24.1  0.8   0:15.05 sqlldr
14947 kclosson  16   0  119m  84m 7072 R 24.1  0.8   0:15.90 sqlldr
14943 kclosson  15   0  119m  84m 7076 R 23.2  0.8   0:14.75 sqlldr
14938 kclosson  16   0  120m  84m 7068 S 22.3  0.8   0:14.35 sqlldr
14941 kclosson  15   0  119m  84m 7076 R 22.3  0.8   0:15.96 sqlldr
14951 kclosson  15   0  120m  84m 7068 S 22.3  0.8   0:16.96 sqlldr
14921 kclosson  16   0  120m  84m 7068 R 21.4  0.8   0:17.84 sqlldr
14934 kclosson  15   0  120m  84m 7076 S 21.4  0.8   0:16.13 sqlldr
14929 kclosson  15   0  119m  84m 7076 R 20.5  0.8   0:17.70 sqlldr
14950 kclosson  16   0  119m  84m 7068 R 20.5  0.8   0:13.63 sqlldr
14922 kclosson  15   0  120m  84m 7068 S 19.6  0.8   0:17.40 sqlldr
14977 kclosson  15   0  668m  34m  29m R 18.7  0.3   0:16.38 oracle
15002 kclosson  16   0  668m  34m  29m R 18.7  0.3   0:15.00 oracle
14920 kclosson  16   0  119m  84m 7076 R 17.8  0.8   0:17.97 sqlldr
14923 kclosson  16   0  119m  84m 7068 R 17.0  0.8   0:13.44 sqlldr
14925 kclosson  16   0  120m  84m 7068 S 17.0  0.8   0:13.06 sqlldr
14927 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.05 sqlldr
14931 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.18 sqlldr
14957 kclosson  15   0  668m  34m  28m S 17.0  0.3   0:14.16 oracle
14930 kclosson  16   0  120m  84m 7068 R 16.1  0.8   0:15.31 sqlldr
14986 kclosson  15   0  668m  34m  29m R 16.1  0.3   0:14.37 oracle
14936 kclosson  15   0  119m  84m 7068 S 15.2  0.8   0:15.58 sqlldr
14964 kclosson  15   0  668m  34m  29m S 15.2  0.3   0:17.10 oracle
15014 kclosson  15   0  668m  34m  28m S 12.5  0.3   0:12.83 oracle
14949 kclosson  16   0  120m  84m 7076 S  7.1  0.8   0:15.70 sqlldr
14955 kclosson  16   0  666m  35m  31m R  4.5  0.4   0:03.11 oracle
14966 kclosson  16   0  666m  35m  31m R  4.5  0.3   0:02.80 oracle
14998 kclosson  15   0  666m  35m  31m S  4.5  0.3   0:02.68 oracle
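For anyone curious how a load like this gets generated: the mix above is just many concurrent SQL*Loader streams. A minimal sketch of driving N streams from the shell follows; the scott/tiger credentials, the cards.ctl control file, and the per-stream data/log file names are hypothetical placeholders, not the actual harness behind the numbers above.

#!/bin/sh
# Fire off 16 concurrent SQL*Loader streams, then wait for all of them.
STREAMS=16
i=1
while [ $i -le $STREAMS ]
do
    sqlldr userid=scott/tiger control=cards.ctl \
           data=cards_${i}.dat log=cards_${i}.log silent=all &
    i=`expr $i + 1`
done
wait
echo "all $STREAMS loader streams complete"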
We have a 6-way RAC cluster of 16-core AMD servers running RHAS4 x64 across SilverStorm IB in production. While we had memory errors on two of the servers (thus demonstrating the failover capabilities), the original three have been real troopers for us.
16:43:20 up 288 days, 16:46, 3 users, load average: 11.49, 10.45, 8.63
That load average does not even give a taste of how hard they are driven during our peak times (load over 30 for hours at a time). Glad to hear of others having the same success we are having with this platform.
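If you want a record of stretches like that, sampling /proc/loadavg on an interval is enough; a minimal sketch (the 60-second interval and the log path are arbitrary choices):

#!/bin/sh
# Append a timestamped load-average sample (1/5/15 min) once a minute.
while true
do
    echo "`date '+%Y-%m-%d %H:%M:%S'` `cat /proc/loadavg`" >> /var/tmp/loadavg.log
    sleep 60
done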
Wow, CSAB, 16 cores? That must be the Sun x4600? And just so I’m straight, you mean a 6-node RAC cluster?
Correct. 6 x4600s, each with 8 dual-core AMD boards and 64GB of RAM, all running RAC. And yes, this is production, not a concept or benchmark system.
CPUs of the first servers purchased:
processor : 15
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8218
stepping : 2
cpu MHz : 2600.027
cache size : 1024 KB
physical id : 7
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5199.32
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
CPUs of the second servers purchased:
processor : 15
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 2800.053
cache size : 1024 KB
physical id : 7
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5599.25
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
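As a sanity check, the socket/core arithmetic in these dumps can be confirmed from /proc/cpuinfo itself. On an 8-socket, dual-core box the three commands below should report 8, “cpu cores : 2”, and 16, respectively:

grep 'physical id' /proc/cpuinfo | sort -u | wc -l    # unique sockets
grep 'cpu cores' /proc/cpuinfo | sort -u              # cores per socket
grep -c '^processor' /proc/cpuinfo                    # total logical CPUs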
CSAB,
Cool. I’ve never “talked” to anyone who has these x4600s in production. I always presumed they’d work just fine, depending on workload of course.
I’m anxiously awaiting the 32-core DL785… we’re using HP gear (currently DL585s) for our RAC clusters, so it’s only a matter of time… 🙂
no_treble,
Do you intend to condense down? That is, do you intend to reduce the degree of horizontal scalability, or will you stay with the node count you are at and “fatten” the nodes?
Currently our group only has a 2-node cluster in production, with four other 2-node clusters in testing and development, and one 3-node cluster in testing. So we’d probably just leverage the extra hardware and stay at 2 or 3 nodes.
In my SA/SE career I’ve had much greater success with many smaller systems making up a cluster (non-RAC, OS-level clusters), not only for greater overall uptime but for the “invisibility factor” of one node dropping out when there’s trouble or for maintenance. But the way our people want to use RAC here, the focus seems to be more on the HA benefits than on HPC. So if we’re only going to roll out two nodes for HA, it seems like it would be better to fatten them up.
We are currently running a mission-critical 11i system with the database on a 20-CPU dual-core/1.5 GHz SunFire 20K server. This is a pretty costly solution for us. I am looking at the possibility of using a few dual-core 8-way AMD-based x4600 servers via RAC. It would definitely help us reduce cost without sacrificing performance. Would someone like to comment on this?
When choosing an upgrade among Rev F CPUs, the 2222s (3.0 GHz core/1000 MHz HT) have the best write performance of any processor in the family (including the 3.2 GHz parts). With the core clock an integral multiple of the HT base clock, bus cycles are used efficiently.
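To make the arithmetic explicit (assuming the 1000 MHz HyperTransport base clock the parenthetical implies): 3.0 GHz / 1.0 GHz = 3, an integer ratio, whereas 3.2 GHz / 1.0 GHz = 3.2, the fractional ratio this comment blames for wasted bus cycles.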
Hi Kevin,
Can you please let me know what kind of load you were running on the 4-socket AMD?
We have been testing a RAC cluster of Dell 2950s (two quad-core CPUs each), and these systems were unable to sustain a run queue of 15+. While neither Oracle nor Linux crashes, the system is so painfully slow that you cannot use it for any practical purpose. The loads we were running were primarily complex data warehouse extracts.
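For reference, the run queue mentioned here is the first column (“r”) of vmstat output, so sustained pressure is easy to watch; a minimal sketch, with an arbitrary 5-second interval and sample count:

vmstat 5 120                                  # column 1 ("r") is the run queue
vmstat 5 120 | awk 'NR > 2 { print $1 }'      # just the run-queue depth, headers skipped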
I would also like your opinion on parallel_instance_groups and splitting loads across nodes. Both Oracle and Dell recommend not splitting loads across nodes.
Thanks
Krishna
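Since the parallel_instance_groups question comes up again below, here is a minimal sketch of the relevant 10g knobs; the instance names dw1/dw2, the group names, and the etl_user login are hypothetical, and this is not a recommendation either way on splitting:

# init.ora/spfile entries, one per instance (INSTANCE_GROUPS is static,
# so a restart is needed after setting it):
#   dw1.instance_groups='etl'
#   dw2.instance_groups='reporting'
# An extract session can then confine its parallel query slaves to the
# instances in one group, keeping a single parallel operation on one node:
sqlplus etl_user/etl_pass <<'EOF'
ALTER SESSION SET parallel_instance_group = 'etl';
-- ... run the extract here; its PX slaves stay within group 'etl' ...
EOF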
The workload was software development and test….
What OS are you running on this slow 2950? Is your DW extraction split across the cluster?
Hi Kevin,
The OS is Red Hat AS 4.0 with Veritas Cluster File System. Splitting the extracts across nodes has become a rather contentious topic, as both Oracle and Dell have recommended not splitting across nodes.
We are going to be testing it anyway.
Thanks
Krishna