Attempted Murder of a 4-Socket AMD Opteron Server with RHEL4. Oracle Can’t Kill It.

But my, oh my, how I’ve tried. OK, I guess my new name is Fan Boy. I know for a fact that I’ve been pretty relentless on this particular server for over 100 days of its current 215-day life.

-sh-3.00$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 3)

-sh-3.00$ uptime
 14:41:17 up 215 days, 14:32, 15 users,  load average: 37.85, 37.48, 25.89

And, top(1):

top - 14:40:44 up 215 days, 14:31, 15 users,  load average: 40.91, 38.05, 25.62
Tasks: 309 total,  30 running, 278 sleeping,   0 stopped,   1 zombie
Cpu0  : 92.8% us,  7.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  : 90.1% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  : 89.3% us,  9.8% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.9% hi,  0.0% si
Cpu3  : 90.1% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu4  : 89.2% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.9% hi,  0.0% si
Cpu5  : 89.1% us, 10.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu6  : 92.8% us,  7.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu7  : 93.7% us,  6.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:  10393736k total,  9347616k used,  1046120k free,     1892k buffers
Swap: 10288440k total,   838236k used,  9450204k free,  6264396k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14919 kclosson  15   0  120m  84m 7076 S 30.3  0.8   0:17.67 sqlldr
14942 kclosson  15   0  119m  84m 7068 S 29.4  0.8   0:17.75 sqlldr
14940 kclosson  15   0  120m  84m 7068 S 28.6  0.8   0:16.21 sqlldr
15008 kclosson  16   0  668m  35m  29m R 28.6  0.3   0:16.48 oracle
14924 kclosson  15   0  119m  84m 7076 R 26.8  0.8   0:16.39 sqlldr
14932 kclosson  16   0  120m  84m 7068 R 26.8  0.8   0:17.07 sqlldr
14959 kclosson  15   0  668m  34m  29m S 25.9  0.3   0:15.96 oracle
14961 kclosson  16   0  668m  34m  29m R 25.9  0.3   0:14.90 oracle
14945 kclosson  15   0  119m  84m 7076 S 25.0  0.8   0:16.07 sqlldr
14980 kclosson  15   0  668m  34m  29m S 25.0  0.3   0:15.09 oracle
14935 kclosson  16   0  119m  84m 7068 S 24.1  0.8   0:15.05 sqlldr
14947 kclosson  16   0  119m  84m 7072 R 24.1  0.8   0:15.90 sqlldr
14943 kclosson  15   0  119m  84m 7076 R 23.2  0.8   0:14.75 sqlldr
14938 kclosson  16   0  120m  84m 7068 S 22.3  0.8   0:14.35 sqlldr
14941 kclosson  15   0  119m  84m 7076 R 22.3  0.8   0:15.96 sqlldr
14951 kclosson  15   0  120m  84m 7068 S 22.3  0.8   0:16.96 sqlldr
14921 kclosson  16   0  120m  84m 7068 R 21.4  0.8   0:17.84 sqlldr
14934 kclosson  15   0  120m  84m 7076 S 21.4  0.8   0:16.13 sqlldr
14929 kclosson  15   0  119m  84m 7076 R 20.5  0.8   0:17.70 sqlldr
14950 kclosson  16   0  119m  84m 7068 R 20.5  0.8   0:13.63 sqlldr
14922 kclosson  15   0  120m  84m 7068 S 19.6  0.8   0:17.40 sqlldr
14977 kclosson  15   0  668m  34m  29m R 18.7  0.3   0:16.38 oracle
15002 kclosson  16   0  668m  34m  29m R 18.7  0.3   0:15.00 oracle
14920 kclosson  16   0  119m  84m 7076 R 17.8  0.8   0:17.97 sqlldr
14923 kclosson  16   0  119m  84m 7068 R 17.0  0.8   0:13.44 sqlldr
14925 kclosson  16   0  120m  84m 7068 S 17.0  0.8   0:13.06 sqlldr
14927 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.05 sqlldr
14931 kclosson  16   0  119m  84m 7076 R 17.0  0.8   0:15.18 sqlldr
14957 kclosson  15   0  668m  34m  28m S 17.0  0.3   0:14.16 oracle
14930 kclosson  16   0  120m  84m 7068 R 16.1  0.8   0:15.31 sqlldr
14986 kclosson  15   0  668m  34m  29m R 16.1  0.3   0:14.37 oracle
14936 kclosson  15   0  119m  84m 7068 S 15.2  0.8   0:15.58 sqlldr
14964 kclosson  15   0  668m  34m  29m S 15.2  0.3   0:17.10 oracle
15014 kclosson  15   0  668m  34m  28m S 12.5  0.3   0:12.83 oracle
14949 kclosson  16   0  120m  84m 7076 S  7.1  0.8   0:15.70 sqlldr
14955 kclosson  16   0  666m  35m  31m R  4.5  0.4   0:03.11 oracle
14966 kclosson  16   0  666m  35m  31m R  4.5  0.3   0:02.80 oracle
14998 kclosson  15   0  666m  35m  31m S  4.5  0.3   0:02.68 oracle
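
To put a rough number on that snapshot, here is a minimal Python sketch (not something that was run on this server) that counts the CpuN lines in a top(1) header and divides the 1-minute load average by that count. The function name load_per_cpu and the regular expressions are hypothetical, and the sketch assumes the RHEL4-era top layout shown above; the sample text is copied from that output.

```python
import re


def load_per_cpu(top_text):
    """Return (cpu_count, one_minute_load, load_per_cpu) from top(1) header text.

    Assumes per-CPU summary lines of the form "Cpu0  : ..." as printed above.
    """
    # Count the per-CPU summary lines ("Cpu0  :", "Cpu1  :", ...).
    cpu_count = len(re.findall(r"^Cpu\d+\s*:", top_text, flags=re.MULTILINE))
    # The first number after "load average:" is the 1-minute average.
    match = re.search(r"load average:\s*([\d.]+)", top_text)
    if match is None or cpu_count == 0:
        raise ValueError("expected a top(1) header with per-CPU lines")
    one_minute = float(match.group(1))
    return cpu_count, one_minute, one_minute / cpu_count


# Header lines copied from the top(1) output above.
sample = """\
top - 14:40:44 up 215 days, 14:31, 15 users,  load average: 40.91, 38.05, 25.62
Tasks: 309 total,  30 running, 278 sleeping,   0 stopped,   1 zombie
Cpu0  : 92.8% us,  7.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  : 90.1% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  : 89.3% us,  9.8% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.9% hi,  0.0% si
Cpu3  : 90.1% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu4  : 89.2% us,  9.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.9% hi,  0.0% si
Cpu5  : 89.1% us, 10.9% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu6  : 92.8% us,  7.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu7  : 93.7% us,  6.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
"""

cpus, load1, per_cpu = load_per_cpu(sample)
print(f"{cpus} CPUs, 1-minute load {load1}, {per_cpu:.2f} per CPU")
```

With eight CPUs and a 1-minute load average of 40.91, that works out to roughly five runnable tasks per CPU, with every CPU showing 0.0% idle.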

12 Responses to “Attempted Murder of a 4-Socket AMD Opteron Server with RHEL4. Oracle Can’t Kill It.”


  1. CSAB March 24, 2008 at 8:53 pm

    We have a 6-way RAC cluster of 16-core AMD servers in production, running RHAS4 x64 across SilverStorm IB. While we had memory errors on two of the servers (thus demonstrating the failover capabilities), the original three have been real troopers for us.

    16:43:20 up 288 days, 16:46, 3 users, load average: 11.49, 10.45, 8.63

    The load average does not even give a taste of how well they run during our peak times (load over 30 for hours at a time). Glad to hear of others having the same success as we are with this platform.

  2. kevinclosson March 24, 2008 at 9:23 pm

    Wow, CSAB, 16-core? That must be the Sun 4600? And just so I’m straight, you mean a 6-node RAC cluster?

  3. CSAB March 24, 2008 at 10:26 pm

    Correct. 6 @ x4600 each w/ 8 boards AMD dual core and 64GB of RAM. All running RAC. And yes, this is production and not a concept or benchmark system.

    CPUs of the first servers purchased:

    processor : 15
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 65
    model name : Dual-Core AMD Opteron(tm) Processor 8218
    stepping : 2
    cpu MHz : 2600.027
    cache size : 1024 KB
    physical id : 7
    siblings : 2
    core id : 1
    cpu cores : 2
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
    bogomips : 5199.32
    TLB size : 1088 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp tm stc

    CPUs of the second set of servers purchased:

    processor : 15
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 65
    model name : Dual-Core AMD Opteron(tm) Processor 8220
    stepping : 3
    cpu MHz : 2800.053
    cache size : 1024 KB
    physical id : 7
    siblings : 2
    core id : 1
    cpu cores : 2
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
    bogomips : 5599.25
    TLB size : 1088 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp tm stc

  4. kevinclosson March 25, 2008 at 4:24 am

    CSAB,

    Cool. I’ve never “talked” to anyone who has these 4600s in production. I always presumed they’d work just fine, depending on workload of course.

  5. no_treble March 27, 2008 at 2:41 am

    I’m anxiously awaiting the 32-core DL785… we’re using HP gear (currently DL585s) for our RAC clusters, so it’s only a matter of time… 🙂

  6. kevinclosson March 27, 2008 at 2:47 pm

    no_treble,

    Do you intend to condense down? That is, do you intend to reduce the degree of horizontal scalability or will you tend to stay with the node count you are at and “fatten” the nodes?

  7. no_treble March 28, 2008 at 12:00 am

    Currently our group only has a 2-node cluster in production, with four other 2-nodes in testing and development, and one 3-node in testing. So we’d probably just leverage the extra hardware and stay 2 or 3 node.

    In my SA/SE career I’ve had much greater success with many smaller systems making up a cluster (non-RAC, OS-level clusters), not only for greater overall uptime, but for the “invisibility-factor” of one node dropping out if there’s trouble or for maintenance. But the way our people want to use RAC here, the focus seems to be more on the HA benefits than HPC. So if we’re only going to be rolling out two nodes for HA, it seems like it would be better to fatten them up.

  8. Amir Hameed April 2, 2008 at 7:35 pm

    We are currently running a mission-critical 11i system with the database running on a 20 CPU dual-core/1.5 GHz SunFire 20k server. This is a pretty costly solution for us. I am looking at the possibility of using a few dual-core 8-way AMD-based 4600 servers via RAC. It will definitely help us reduce the cost without sacrificing the performance. Would someone like to comment on this?

  9. jeff needham April 17, 2008 at 4:51 am

    When choosing to upgrade RevF CPUs, the 2222s (3.0/1000) have the best write performance of any processor in the family (including the 3.2 GHz parts). With the core being an integral number of the HT baseband, bus cycles are used efficiently.

  10. Krishna Manoharan June 9, 2008 at 9:38 pm

    Hi Kevin,

    Can you please let me know what kind of load you were running on the 4-socket AMD?

    We have been testing a RAC cluster with Dell 2950s (2 Quad Core) and these systems were unable to sustain a run-queue of 15+. While neither Oracle nor Linux crashes, the system is so painfully slow that you cannot use it for any practical purposes. The loads we were running were primarily complex data warehouse extracts.

    Also, I would like your opinion on parallel_instance_groups and splitting loads across nodes. Everyone from Oracle and Dell recommends not splitting loads across nodes.

    Thanks
    Krishna

  11. kevinclosson June 11, 2008 at 12:37 am

    The workload was software development and test….

    What OS are you running on this slow 2950? Is your DW extraction split across the cluster?

  12. Krishna Manoharan June 11, 2008 at 3:04 am

    Hi Kevin,

    The OS is Red Hat AS 4.0 with Veritas Cluster Filesystem. Splitting the extracts across nodes has become a rather contentious topic, as both Oracle and Dell have recommended not splitting across nodes.

    We are going to be testing it anyway.

    Thanks
    Krishna


DISCLAIMER

I work for Amazon Web Services but all of the words on this blog are purely my own. Not a single word on this blog is to be mistaken as originating from any Amazon spokesperson. You are reading my words on this webpage and, while I work at Amazon, all of the words on this webpage reflect my own opinions and findings. To put it another way, "I work at Amazon, but this is my own opinion." To conclude, this is not an official Amazon information outlet. There are no words on this blog that should be mistaken as official Amazon messaging. Every single character of text on this blog originates in my head and I am not an Amazon spokesperson.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.