SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!

I was catching up on my mojo reading when I caught a little snippet I’d like to blog about. Oh, by the way, have I mentioned recently that StorageMojo is one of my favorite blogs?

In Robin Harris’ latest installment about ZFS action at Apple, he let out a glimpse of one of his other apparent morbid curiosities—flash. Just joking, I don’t think ZFS on Mac or flash technology are morbid, it just sounded catchy. Anyway, he says:

I’ve been delving deep into flash disks. Can you say “weird”? My take now is that flash drives are to disk drives what quantum mechanics is to Newtonian physics. I’m planning to have something out next week.

I look forward to what he has to say. I too have a great interest in flash.

Now, folks, just because we are Oracle-types and Jim Gray was/is a Microsoft researcher, you cannot overlook the sorts of things Jim was/is interested in. Jim’s work has had a huge impact on technology over the years, and it turns out that Jim took/takes an interest in flash technology with servers in mind. Just the abstract of that paper makes it a natural must-read for Oracle performance-minded individuals. Why? Because it states (with emphasis added by me):

Executive summary: Future flash-based disks could provide breakthroughs in IOps, power, reliability, and volumetric capacity when compared to conventional disks.

 

Yes, IOps! Nothing else really matters where the Oracle database is concerned. How can I say that? Folks, round-brown spinning things do sequential I/O just fine, naturally. What they don’t do is random I/O. To make it worse, most SAN array controllers (you know, that late-1990s technology) pile on overhead that further chokes off random I/O performance. Combine all that with the standard IT blunder of allocating space for Oracle on a pure capacity basis and you get the classic OakTable Network response:

Attention DBAs, it’s time for some déjà vu. I’ll state with belligerent repetition, redundantly, over and over, monotonously reiterating this one very important recurrent bit of advice: Do everything you can to get spindles from your storage group—not just capacity.
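
To put some purely illustrative numbers behind that advice, here is a minimal Python sketch; the 146GB / ~180 random IOps drive specs and the workload figures are my assumptions for illustration, not anything from a datasheet:

    # A minimal sketch of the spindles-versus-capacity arithmetic.
    # All figures are illustrative assumptions, not vendor data.
    import math

    DRIVE_GB = 146            # assumed usable capacity per 15K RPM spindle
    DRIVE_RANDOM_IOPS = 180   # assumed random IOps per spindle

    db_size_gb = 1000         # hypothetical 1TB database
    required_iops = 12000     # hypothetical peak random I/O demand

    spindles_for_capacity = math.ceil(db_size_gb / DRIVE_GB)           # 7 drives
    spindles_for_iops = math.ceil(required_iops / DRIVE_RANDOM_IOPS)   # 67 drives

    print(f"spindles needed for capacity alone : {spindles_for_capacity}")
    print(f"spindles needed for the random IOps: {spindles_for_iops}")

The roughly ten-fold gap between those two answers is the whole point of asking for spindles, not just capacity.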

 

Flash
Yes, that’s right, it won’t be long (in relative terms) until you see flash memory storage fit for Oracle databases. The aspect of this likely future trend that I can’t predict, however, is what impact such technology would have on the entrenched SAN array providers. Will it make it more difficult to keep margins at the levels they demand, or will flash be the final straw that commoditizes enterprise storage? Then again, as Jim Gray points out in that paper, flash density isn’t even being driven by the PC ecosystem, and most certainly not by enterprise storage. The density is being driven by consumer and mobile applications. Hey, I want my MTV. Um, like all of it, crammed into my credit-card-sized MPEG player too.

When?
When it gets cheaper and offers higher capacity, of course. Well, it’s not exactly that simple. I went spelunking for that Samsung 1.8” 32GB SSD and found two providers with a street price of roughly USD $700.00 for 32GB here and here. In fact, upon further investigation, Ritek may soon offer a 32GB device at some $8 per GB. But let’s stick with current product for the moment. At roughly $22 per GB, we’re not exactly talking SATA, which runs more on the order of $0.35 per GB. But then we are talking enterprise applications here, so a better comparison would be to Fibre Channel drives, which go for about $3-$4 per GB.
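
For what it’s worth, here is the quick $/GB arithmetic behind that comparison, as a small sketch using the street prices quoted above (the $3.50 Fibre Channel figure is simply the midpoint of the $3-$4 range):

    # Quick $/GB arithmetic behind the comparison above (street prices
    # as quoted in the post; rough mid-2007 figures).
    ssd_price_usd, ssd_gb = 700.00, 32      # Samsung 1.8" 32GB SSD
    ssd_per_gb = ssd_price_usd / ssd_gb     # ~$21.9/GB

    fc_per_gb = 3.50                        # midpoint of the $3-$4/GB Fibre Channel range
    sata_per_gb = 0.35                      # commodity SATA

    print(f"SSD at ${ssd_per_gb:.2f}/GB is roughly "
          f"{ssd_per_gb / fc_per_gb:.0f}x Fibre Channel and "
          f"{ssd_per_gb / sata_per_gb:.0f}x SATA on price per GB")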

Now that is interesting, since Jim Gray pointed out that in spite of some industry predictions setting the stage for NAND to double every year, NAND had in fact gained 16-fold in 4 years, off by a year. If that pace continues, could we really expect 512GB 1.8″ SSD devices in the next 4 years? And would the price stay relatively constant, yielding a cost of something like $1.35 per GB? Remember, even the current state of the art (e.g., the Samsung 1.8″ 32GB SSD) delivers on the order of 130,000 random single-sector IOps; that’s approximately 7usec latency for a random I/O. At least that is what Samsung’s literature claims. Jim’s paper, on the other hand, reports grim current-art performance when measured with DskSpd.exe:

The story for random IOs is more complex – and disappointing. For the typical 4-deep 8KB random request, read performance is a spectacular 2,800 requests/second but write performance is a disappointing 27 requests/second.
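
Before going on, a quick back-of-the-envelope check of the latency and density arithmetic above; every input is a figure quoted in this post, so treat it as a sketch, not a measurement:

    # Back-of-the-envelope check of the figures above; every input is a
    # number quoted in the post, not a measurement of mine.
    claimed_iops = 130_000
    latency_us = 1e6 / claimed_iops                # ~7.7 usec per random I/O

    gb_today, price_today = 32, 700.0
    years = 4                                      # "doubles every year", four more years
    gb_future = gb_today * 2 ** years              # 512GB
    price_per_gb_future = price_today / gb_future  # ~$1.37/GB if the unit price holds

    print(f"latency implied by {claimed_iops:,} IOps: {latency_us:.1f} usec")
    print(f"projection: {gb_future}GB at ${price_per_gb_future:.2f}/GB in {years} years")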

The technology is young and technically superior, but there is work to do in getting the most out of NSSD, as the paper reports. Jim suspects that short-term quick fixes could bring the random I/O performance for 8KB transfers on today’s NSSD technology up to about 1,700 IOps split evenly between read and write. Consider, however, that real-world applications seldom exhibit a read:write ratio of 50:50. Jim generalized on the TPC-C workload as a case in point. It seems that with “some re-engineering” (Jim’s words) even today’s SSD would be a great replacement for hard drives for typical Oracle OLTP workloads, since in the real world you’ll more often see 70:30 read:write ratios. And what about sequential writes? Well, there again, even today’s technology can handle some 35MB/s of sequential writes, so direct path writes (e.g., sort spills) and redo log writes would be well taken care of. But alas, the $$/GB is still off. Time will fix that problem, and when it does, NSSD will be a great fit for databases.
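
Here is a rough sketch of why the read:write mix matters so much, using a simple serve-one-request-at-a-time model; the ~1,220 IOps write rate is my assumption, back-solved so that a 50:50 mix lands near the 1,700 IOps figure above:

    # Effective IOps for a mixed read/write workload, modeling the device
    # as serving one request at a time (weighted harmonic mean of the
    # separately measured read and write rates).
    def mixed_iops(read_iops, write_iops, read_fraction):
        return 1.0 / (read_fraction / read_iops + (1.0 - read_fraction) / write_iops)

    # Today's device, per the figures quoted from Jim Gray's paper:
    print(f"today, 50:50 mix: {mixed_iops(2800, 27, 0.50):7,.0f} IOps")    # ~54

    # Assumed post-quick-fix write rate (~1,220 IOps), back-solved so that
    # a 50:50 mix lands near the ~1,700 IOps figure cited above:
    print(f"fixed, 50:50 mix: {mixed_iops(2800, 1220, 0.50):7,.0f} IOps")  # ~1,700
    print(f"fixed, 70:30 mix: {mixed_iops(2800, 1220, 0.70):7,.0f} IOps")  # ~2,000

Real devices overlap requests (the quoted numbers are at a queue depth of 4), so treat this as a floor rather than a forecast; the point is simply that a 70:30 mix looks a good deal better than 50:50 once writes stop being the bottleneck.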

Don’t think for a moment that Oracle Corporation is going to pass up enabling customers to exploit that sort of performance, with or without the major SAN vendors.

But flash burns out, right? Well, yes and no. The thing that matters is how long the device lasts as a whole, the sum of its parts. MTBF numbers are crude, but Samsung puts a 1,000,000-hour MTBF on this little jewel. How cool is that?
Well, I’ve got the cart well ahead of the horse here for sure, because this stuff is still too expensive. But put it on the back burner: we aren’t using Betamax now, and I expect we’ll be using fewer round-brown spinning things within the span of our careers.

13 Responses to “SAN Admins: Please Give Me As Much Capacity From As Few Spindles As Possible!”


  1. Alex Gorbachev June 12, 2007 at 5:43 am

    Hm… It’s a surprise that you still compare prices using $$/GB, an almost abstract metric for Oracle OLTP databases. What about $$/IOps?

  2. kevinclosson June 12, 2007 at 2:54 pm

    “What about $$/IOps?”

    Alex,

    Yes, leave it to you to pick that out! 🙂 I’ll have to think about that a bit.

  3. Alex Gorbachev June 13, 2007 at 2:57 am

    Connor, I actually tried to get some of those drives for a test drive with Oracle database and the vendor representative, initially optimistic, dimmed on the other end of the phone line.

  4. David Aldridge June 13, 2007 at 11:12 am

    Yes, disks are good for sequential I/O, but again the technology conspires against us with schedulers that won’t give the application more than, say, 1MB of sequential I/O before wandering off to satisfy a different request and burdening us with head-movement latency. Maybe I’ll go talk to the SAN admins about that … see what their box of tricks has to offer in helping with that problem.

  5. kevinclosson June 13, 2007 at 2:45 pm

    David,

    Which Operating System? Sounds like Linux. If so, what scheduler are you using?

  6. Paul June 15, 2007 at 10:19 pm

    Hi Kevin,

    For the performance savvy DBAs out there who request spindles over just capacity, how much capacity ends up not being used?

    Now, if a disk existed that could do LOTS of random read/write IOPS (>25,000), what is the minimum capacity point at which it could replace those spindles that are probably being short-stroked, and thus wasting the majority of their capacity, for the added performance of extra spindles?

  7. kevinclosson June 16, 2007 at 1:07 am

    Paul,
    I wouldn’t use more than the first 60% of each drive for the database…carve out the other portion of each drive for use as backup space or such. Or, try to get those 18GB drives back 🙂

  8. Alex Gorbachev June 16, 2007 at 3:30 am

    carve out the other portion of each drive for use as backup space or such
    Well, a performance-savvy DBA (or, rather, a stability-savvy one) wouldn’t do that.
    Imagine a backup or some I/O-intensive task kicks in on one database and, for “no apparent reason,” another database suffers. I’ve seen that quite a few times, and it usually takes quite a bit of time to track down.

    And I also wouldn’t throw out 60% as a rule of thumb. Every requirement is different, and usage anywhere between 5 and 100% can be appropriate. Mostly it boils down to IOPS capacity. Right?

  9. kevinclosson June 16, 2007 at 5:29 am

    Alex,

    Like I said in a previous comment thread, you and I agree to disagree more than we agree…and that is fine.

    Paul asked about wasted capacity and I suggested using it for backups. I didn’t suggest using it for backups so that raging, bumbling baboons could fire off backups during peak production and slam throughput…wasn’t my idea….

    As for the 60% generalization of what portions of the drive to use, uh, the geometry lends itself to using that as a safe start point. Personally I’d prefer to only use the very center of the drive and at that rate only as much as the track buffer can cover, but then I’m a prima donna I suppose.

  10. Paul June 18, 2007 at 3:30 pm

    I think IOPS, and therefore spindles, are definitely a critical factor in database performance, so I wouldn’t want to jeopardize that performance by putting anything else on those spindles. Are most DBAs OK with discarding that capacity to ensure max performance? What percentage of Oracle installs do you guys figure utilize short-stroking?

    So with reference to the SSDs mentioned in the post, there should be a point where it is more cost effective to use an SSD with smaller capacity and lots of IOPS than lots of spindles whose capacity is going to waste. The usual suspects for hogging I/O are the logs, TEMP tablespace, and indexes, correct? So, if just those files were put on SSD, would that dramatically lower the capacity required? This gets back to the rule of thumb: is there a typical capacity that would be large enough to hold the “usual suspects”?

    Sorry for all the questions, I’d be happy to take this offline if needed.

    Thanks.

  11. Alex Gorbachev June 18, 2007 at 10:10 pm

    Are most DBAs OK with discarding that capacity to ensure max performance?

    Well, I bet they are if storage admins and, especially, managers let them do so.

    When I tested solid-state storage, I found that redo logs were an excellent candidate for it. TEMP: I couldn’t see much benefit compared to a high-end EMC box; temp I/O came in quite large chunks and the EMC box was delivering good results. UNDO: couldn’t see much benefit either; changes are cached and DBWR is quite effective with delayed flushing to disk.

    In some cases, datafiles that were very active on writes (including undo) played nicely on SSD and gave a lot of relief to the EMC cache by no longer saturating it with writes.

    Indexes: I found that increasing the SGA is much more effective and usually cheaper.

    In the end, I think it depends a lot on your application.
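
To frame Alex’s $$/IOps question from the comments above in rough numbers, here is a hypothetical sketch; the 1,700 IOps figure is the “re-engineered” mixed 8KB rate discussed in the post, while the $500 price and ~180 random IOps for a Fibre Channel spindle are assumptions for illustration only:

    # Hypothetical $$/IOps comparison. The SSD figures are the price and
    # "re-engineered" mixed 8KB IOps quoted in the post; the Fibre Channel
    # drive price and IOps are assumptions for illustration only.
    ssd_price, ssd_iops = 700.0, 1700
    fc_price, fc_iops = 500.0, 180

    print(f"SSD: ${ssd_price / ssd_iops:.2f} per random IOps")   # ~$0.41
    print(f"FC : ${fc_price / fc_iops:.2f} per random IOps")     # ~$2.78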


