I was recently exchanging email with a fellow participant on the oracle-l mailing list about a problem he was having evaluating Oracle on a SAN. It seems his evaluation was completely blown out by non-stop SAN madness. I wonder if their Storage Administrator is also their Unstructured Data Administrator? I know one thing for certain: this is a shop that needs to join up with the forces of BAARF. When I asked how things were going, he responded with:
Poorly. The installation/setup of Oracle was fine. The problem was with the SAN the SA gave me. It was a [brand name removed]. I know he set it up as RAID 5 but I don’t know the specifics after that. All I know is it was 5x slower than anything we currently have, and they’ve made us use RAID 5 everywhere. We use a lot of T3s with a few SANs sprinkled in. I think 15 days of the 30 day trial were spent hooking up the SAN.
SAN madness indeed! I understand how frustrating infrastructure issues are when they hinder the effort to actually perform the required testing. It isn’t all the fault of the SAN in this situation, though, as it seems there were a few 8i->10g Cost Based Optimizer (CBO) hurdles to overcome as well. He wrote:
I also spent a lot of time just trying to get explain plans to match between my 8i database and the new 10g, in order to compare apples to apples.
I hope he didn’t feel responsible for that. I am convinced that one of the only humans who really knows the Cost Based Optimizer in practical application is fellow OakTable member Jonathan Lewis.
The email continued by questioning whether a T2000 could replace a 16 CPU E6500. He wrote:
But after all that I still couldn’t see why they thought we could replace our 16 cpu Sun E6500 running full tilt with one T2000. Anyways, that project has been shelved.
SAN Madness
How sad it is that an entire project can get shelved because of SAN madness, but I am not surprised. On the other hand, the idea that a T2000 can supplant an E6500 is not that far out of line. As my old friend Glenn Fawcett points out, the bandwidth comparison is not even close: the T2K has a 20GB/s backplane whereas the E6500 has only 9.6GB/s. For OLTP, the T2000 would most likely beat a fully loaded Starfire UE10K, which is backplane limited to 12.8GB/s. And you might even have enough HVAC for it!
Hi Kevin,
First off – great blog! I always find your posts very interesting. I was interested to note your comments about the T2K as we are about to investigate whether to dump our Redhat Linux HP servers for a couple of T2Ks or even T1Ks.
I have heard, though, that the CoolThreads processors are not always great at supporting databases because they have only a single floating point unit? Would you see this as a problem in either an OLTP or DSS environment that doesn’t have any requirement for calculations involving floating point?
We are going to get a T2K and maybe a V440/490 or similar to do some benchmarking, but I was interested to hear whether you have an informed view?
Oh yeah, it will be sitting on a NetApp SAN 🙂
Best regards,
stuart
With respect to the SAN-madness saga; did the vendor suggest going with RAID 5 configuration?
No, I don’t think so. It seems that was their internal policy perhaps.
“did the vendor suggest going with RAID 5 configuration?”
Is RAID 5 (or RAID 50) bad for production, even with modern SAN vendors (like 3PAR) claiming that they have a faster RAID 5 which overcomes the inherent write penalty of RAID 5? Since their disks rarely fail, they say you can save a lot of money by deploying on RAID 5 instead of RAID 10.
Is this just marketing, or is there some technical merit to their claim?
Hi Amit,
If the array cache is 100% effective, then RAID 5 is not a factor. If array cache is less than 100% effective, then RAID 5 becomes a factor. The less effective array cache is, the more pain RAID 5 causes.
RAID 5 is really only a train wreck for modify-intensive RDBMS workloads. A read-mostly database can often do well with RAID 5. That is the general principle.
RAID 5 is generally fine for unstructured data, which I have already blogged about as being the new important type of data. RDBMS data will continue to become less critical going forward.
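To put rough numbers on the principle above, here is a back-of-the-envelope sketch of how array cache effectiveness interacts with the RAID 5 small-write penalty. The figures are assumptions for illustration, not from the post: roughly 150 random IOPS per spindle, a read-modify-write costing 4 back-end I/Os on RAID 5 versus 2 on RAID 10, and the cache absorbing some fraction of writes.

```python
# Back-of-the-envelope model of the RAID 5 small-write penalty.
# Assumed (not from the post): ~150 IOPS per spindle, 4 back-end I/Os
# per random small write on RAID 5 (read-modify-write) vs. 2 on RAID 10.

def random_write_iops(spindles, per_disk_iops, ios_per_write, cache_hit):
    """Host-visible random-write IOPS for an array of `spindles` disks."""
    raw = spindles * per_disk_iops          # aggregate back-end IOPS
    effective = raw / ios_per_write         # writes the disks can absorb
    # Writes satisfied by array cache never hit the disks; only the
    # remaining (1 - cache_hit) fraction pays the back-end cost.
    return effective / (1.0 - cache_hit) if cache_hit < 1.0 else float("inf")

SPINDLES, DISK_IOPS = 10, 150
for cache_hit in (0.0, 0.5, 0.9):
    r5 = random_write_iops(SPINDLES, DISK_IOPS, 4, cache_hit)   # RAID 5
    r10 = random_write_iops(SPINDLES, DISK_IOPS, 2, cache_hit)  # RAID 10
    print(f"cache {cache_hit:.0%}: RAID 5 ~{r5:.0f}, RAID 10 ~{r10:.0f} write IOPS")
```

With 100% effective cache the distinction vanishes, exactly as stated above; as cache effectiveness drops, RAID 5 delivers half the host-visible write throughput of RAID 10 on the same spindles.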
A T2000 should make a fine upgrade from a 16-way E6500, provided no improvement in response time is needed, offering about 2X the throughput at about the same response time. If the E6500 is currently CPU-pegged with a sustained run-queue population (vmstat’s 1st column), response times could improve significantly due to the extra throughput available.
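The run-queue check described above can be scripted. Here is a minimal sketch that pulls the `r` column (runnable threads waiting for CPU) out of vmstat output and flags a sustained backlog; the sample output is hard-coded for illustration rather than captured from a live host.

```python
# Minimal sketch: decide whether a host shows a *sustained* run-queue
# backlog from `vmstat <interval>` output. On Solaris and Linux the
# first column, `r`, counts runnable threads waiting for a CPU.
# The sample below is fabricated for illustration.

SAMPLE = """\
 r b w   swap  free  re  mf pi po fr de sr  cpu
18 0 0 102400 51200   0   5  0  0  0  0  0  99
22 0 0 102400 51180   0   4  0  0  0  0  0 100
17 0 0 102400 51150   0   6  0  0  0  0  0 100
"""

def runq_values(vmstat_text):
    """Pull the first column (`r`) from each data line of vmstat output."""
    vals = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():   # skips the header line
            vals.append(int(fields[0]))
    return vals

def cpu_pegged(vmstat_text, ncpu):
    """True if every sample shows more runnable threads than CPUs."""
    vals = runq_values(vmstat_text)
    return bool(vals) and all(r > ncpu for r in vals)

print(cpu_pegged(SAMPLE, 16))  # a 16-way box with runq 17-22 is pegged
```

A sustained `r` above the CPU count, as in the sample, is the signature the comment describes: work is queuing for processors, so extra throughput from a T2000 would translate directly into better response times.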
Well, actually the E6500 is not 9.6GB/s as claimed (I wish it were). When clocked at 84MHz (vs. 100MHz, at which it would be 3.2GB/s) it yields only 2.68GB/s. I had an E6500 with 28 CPUs (400MHz USIIs) and in that configuration it could only run at 84MHz.

As far as the T2000 replacing the E6500 goes, from experience (Oracle Applications 11i as a benchmark), the T2000 can blow away the E6500 with a big margin to spare. At the time of testing, I had a T2000 with a 1GHz UST1 (8 cores) and 16GB of RAM, and it outperformed the 28-processor E6500 by 35%. With the 1.4GHz version (which became available recently) and 64GB of RAM, I would not doubt it could be as much as 50%.

The key to running anything on a T2000 is a balanced architecture with a decent I/O subsystem (in my case an older FC4700 with 40 drives was used) and an application that ‘knows’ what parallelism is (e.g. Oracle RDBMS). It is also true that the T2000 requires 2 CPU licenses (8 cores x 0.25) according to Oracle’s global price list. Thus, do not believe FUD; just do serious benchmarking before jumping to any conclusions.
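The 2-license figure quoted above follows from Oracle's processor-licensing arithmetic: cores times the per-architecture core factor, rounded up to a whole license. A quick sketch of that calculation, using the 0.25 factor the commenter cites for the UltraSPARC T1 (the 1.0 factor shown for the E6500's single-core USII chips is my assumption for contrast):

```python
import math

def oracle_cpu_licenses(cores, core_factor):
    """Processor licenses = cores x core factor, rounded up to a whole
    license, per the arithmetic quoted in the comment above."""
    return math.ceil(cores * core_factor)

# The comment's case: 8-core UltraSPARC T1 at a 0.25 core factor.
print(oracle_cpu_licenses(8, 0.25))   # -> 2
# A 16-CPU E6500 (single-core USII; 1.0 factor is an assumption here).
print(oracle_cpu_licenses(16, 1.0))   # -> 16
```

So even at a sizeable throughput win, the licensing delta alone (2 vs. 16 processor licenses under these assumptions) is a large part of the T2000's appeal.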