Disk Drives: They’re Not as Slow as You Think! Got Junk Science?

I was just taking a look at Curt Monash’s TDWI slide set entitled How to Select an Analytic DBMS when I got to slide 5 and noticed something peculiar. Consider the following quote:

Transistors/chip:  >100,000 since 1971

Disk density: >100,000,000 since 1956

Disk speed: 12.5 since 1956

Disk Speed == Rotational Speed?
The slide was offering a comparison of “disk speed” from 1956 and CPU transistor count from 1971 to the present. I accept the notion that processors have outpaced disk capabilities in that time period, no doubt! However, I think too much emphasis is placed on disk rotational speed and not enough on the 100 million-fold increase in density. The topic at hand is DW/BI, and I don’t think rotational delay deserves that much attention there. I’m not trying to read too much into Curt’s message, since I wasn’t in the presentation, but it does give food for thought. Are disks really that slow?

Are Disks Really That Slow?
Instead of comparing modern drives to the prehistoric “Winchester” drive of 1956, I think a better comparison would be to the ST-506, which is the father of modern disks. The ST-506 of 1984 would have found itself paired with an Intel 80286 in the PC of the day. Comparing transistor counts from the 80286 to a “Harpertown” Xeon yields an increase of 3280-fold, and a clock speed improvement of 212-fold. The ST-506 (circa 1984) had a throughput capability of 625KB/s. Modern 450GB SAS drives can scan at 150MB/s, an improvement of roughly 245-fold. When considered in these terms, hard drive throughput and CPU clock speed have seen surprisingly similar increases in capability. Of course Intel is cramming 3280x more transistors into a processor these days, but read on.
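
If it helps to see that arithmetic in one place, here is a quick sketch in Python. The 12.5MHz 80286 and ~2.66GHz Harpertown clocks are my assumptions about which parts are being compared; the throughput figures are the ones above.

```python
# Back-of-the-envelope ratios for the 80286/ST-506 versus Harpertown/450GB-SAS comparison.
# Assumed parts: a 12.5MHz 80286 and a ~2.66GHz Harpertown Xeon; throughput per the text.

st506_kb_per_s = 625           # ST-506 sequential throughput, ~625 KB/s
sas_kb_per_s   = 150 * 1000    # modern 450GB SAS drive, ~150 MB/s

i286_mhz = 12.5                # assumed 80286 clock speed
xeon_mhz = 2660.0              # assumed Harpertown clock speed

print(f"Disk throughput gain: {sas_kb_per_s / st506_kb_per_s:.0f}x")  # ~240x
print(f"CPU clock gain:       {xeon_mhz / i286_mhz:.0f}x")            # ~213x
# Transistor count gain is the cited ~3280x -- the outlier, addressed below.
```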

The point I’m trying to make is that disks haven’t lagged as far behind CPU as I feel is sometimes portrayed. In fact, I think the refrigerator-cabinet array manufacturers disingenuously draw attention to things like rotational delay in order to distract attention from the real bottleneck, which is the flow of data from the platters through the storage processors to the host. This bottleneck is built into modern storage arrays and felt all the way through the host bus adaptors. Let’s not punish ourselves by mentioning the plumbing complexities of storage networking models like Fibre Channel.

Focus on Flow of Data, Not Spinning Speed.
Oracle Exadata Storage Server, in the HP Oracle Database Machine offering, configures 1.05 processor cores per hard drive (176:168). Even if I lump Flash SSD into the mix (about a 60% increase in scan throughput over round, brown spinning disks), it doesn’t really change much (i.e., not by orders of magnitude).
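
The ratio and the flash comparison work out as follows (a small sketch; the 150MB/s spinning-disk baseline is an assumption carried over from above):

```python
# HP Oracle Database Machine: Exadata Storage Server cores versus hard drives.
cores, drives = 176, 168
print(f"cores per drive: {cores / drives:.2f}")                        # ~1.05

# Granting Flash SSD roughly 60% more scan throughput than a spinning disk,
# the balance shifts by a similar small factor -- not by orders of magnitude.
hdd_scan_mb_s   = 150                          # assumed spinning-disk scan rate
flash_scan_mb_s = hdd_scan_mb_s * 1.6
print(f"flash vs. disk scan: {flash_scan_mb_s / hdd_scan_mb_s:.1f}x")  # 1.6x
```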

Junk Science? Maybe.
So, am I just throwing out the 3280-fold transistor count gain I mentioned? No, but when we compare the richness of processing applied to data coming off disk in today’s world (e.g., DW/BI) to that of the 80286/ST-506 days (e.g., VisiCalc, a 26KB executable), the transistor count gets factored out. We are left with 245-fold disk throughput gains and 212-fold CPU clock gains. So, is it a total coincidence that a good ratio of DW/BI CPU to disk is about 1:1? Maybe not. Or maybe this is all just junk science. If so, we should all continue connecting as many disks to the back of our conventional storage arrays as they will support.

Summary
Stop bottlenecking your disk drives. Then, and only then, will you be able to see just how fast they are and whether you have a reasonable ratio of CPU to disk for your DW/BI workload.

10 Responses to “Disk Drives: They’re Not as Slow as You Think! Got Junk Science?”


  1. Curt Monash, February 26, 2009 at 10:18 pm

    Kevin,

    The point of that slide is this — the average seek time of a disk can never be less than 1/2 the time it takes to make one rotation. Hence doing lots of random reads is guaranteed to slow you down.

    CAM

    • kevinclosson, February 27, 2009 at 12:06 am

      Curt,

      I get it. I wasn’t taking pot shots at the slide either. It just got me thinking in terms of sequential throughput (and how far that has come since the days of the ST506+80286) because that is all I care about these days.
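
      For the record, the half-rotation bound Curt mentions works out as follows for common spindle speeds (a quick sketch; the RPM values are just examples):

```python
# Average rotational latency is half a revolution -- a floor on random-read
# latency no matter how fast the media transfer is once the head lands.
for rpm in (7200, 10_000, 15_000):
    ms_per_rev = 60_000 / rpm                 # milliseconds per revolution
    print(f"{rpm:>6} RPM: {ms_per_rev / 2:.2f} ms average rotational latency")
# 7200 -> ~4.17 ms, 10K -> 3.00 ms, 15K -> 2.00 ms
```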

  2. marc farley, February 27, 2009 at 2:16 am

    You wrote: “I think the refrigerator-cabinet array manufacturers disingenuously draw attention to things like rotational delay in order to distract attention from the real bottleneck, which is the flow of data from the platters through the storage processors to the host”

    I’m not sure what you are trying to say here, but it seems to be a bit out of context if not out of whack. Curt’s comment nailed it – rotational latencies do matter – as do seek times for most refrigerator array implementations. That’s because most arrays are designed to support mixed workloads and that means there are assumptions about the impact of mechanical latencies. Calling this disingenuous is a bit of a smear, but that’s blogging.

    That said, there are different array architectures – some more effective than others in dealing with the bottlenecks that impact certain applications (e.g., DW/BI). For instance, a provisioning approach that wide-stripes data across all drives and constructs volumes horizontally (as opposed to provisioning large partitions or whole disks) is more capable of dealing with long sequential reads than disk-based provisioning.
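
    To make that concrete, here is a rough sketch of the difference (the drive count and chunk size are arbitrary assumptions, not any particular array’s design):

```python
# Wide striping: a volume's chunks are laid round-robin across every drive in the
# pool, so a long sequential read engages all spindles instead of one or two.
POOL = [f"drive{d:02d}" for d in range(16)]   # assumed 16-drive pool
CHUNK_MB = 1                                  # assumed 1MB chunk size

def wide_striped_drive(chunk_index: int) -> str:
    """Nth chunk of the volume lands on the next drive, round-robin."""
    return POOL[chunk_index % len(POOL)]

def whole_disk_drive(chunk_index: int) -> str:
    """Whole-disk (or large-partition) provisioning: every chunk on one drive."""
    return POOL[0]

# A 64MB sequential read touches every drive when wide-striped, one drive otherwise.
touched = {wide_striped_drive(i) for i in range(64 // CHUNK_MB)}
print(f"wide striping touches {len(touched)} drives; whole-disk provisioning touches 1")
```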

    • kevinclosson, February 27, 2009 at 3:44 am

      Marc,

      Don’t take it personally. If your array can provide me 100% scan bandwidth (conservatively 80MB/s per drive) from, say, 168 15K RPM disks, then it doesn’t seem like a bottleneck.

      Yes, it is blogging. That’s why I’m letting your comment through.

  3. marc farley, February 27, 2009 at 5:53 am

    Kevin, I didn’t take it personally. Was there something I wrote that made you think so?

    • kevinclosson, February 27, 2009 at 5:32 pm

      Marc,

      You used the word “smear” to describe my opinion about how unfit monolithic storage arrays are for DW/BI. I didn’t smear anything. Ask any of the accounts you are trying to sell into whether the biggest names in the storage array business don’t in fact try to tell Oracle DBAs that physical drive characteristics can be mitigated with fat caches. They have been saying this for ages. Your outfit might not… that’s great.

      It’s simple in my mind, and sort of like those red lines they have near carnival rides so kids know if they are tall enough to ride. That “red line” for DW/BI workloads is: the array has to service my concurrent complex queries with at least 80MB/s per disk drive to the host, and it has to deliver 14GB/s in aggregate. And the point of this whole blog thread is that I don’t care about rotational delay, so choose 7.2, 10 or 15K RPM. As long as they spin off the minimum of 80MB/s, there are certain drive characteristics that I insist can be ignored when DW/BI workloads are on the table.

      Now, once the data is flying into the host unrestricted by the I/O plumbing, you have to leave room in the budget for 1:1 CPU-to-disk processing power, which sounds like a 21-node RAC cluster of 2s4c servers with 35 4GFC HBAs (400*35 == 14GB/s). But then you have to configure balanced plumbing for all hosts, so that is more like 42 HBAs. That means you need 2 FC switches of at least 82 ports each in case one fails. Why 82? Well, we need unrestricted flow through both sides of the switch. Once all that is hooked up, the configuration will look a wee bit like an HP Oracle Database Machine on disk and CPU count, but since the data protocol is fat (FCP) and the wires thin (4GFC), such a configuration will not process that data as efficiently.
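
      Laying that arithmetic out in one place (a sketch; the 82-port switch figure is taken as stated above rather than derived):

```python
# Back-of-the-envelope for the "red line" configuration described above.
drives        = 168
mb_per_drive  = 80                        # minimum per-drive scan rate to the host
array_mb_s    = drives * mb_per_drive     # 13,440 MB/s, i.e. roughly 14 GB/s

hba_mb_s      = 400                       # usable throughput of one 4GFC HBA
hbas_for_bw   = 35                        # 35 * 400 MB/s == 14,000 MB/s
nodes         = 21                        # 2-socket, 4-core RAC nodes for ~1:1 CPU:disk
hbas_balanced = nodes * 2                 # two HBAs per host for balanced plumbing

print(f"aggregate scan bandwidth : {array_mb_s / 1000:.1f} GB/s")
print(f"HBAs for raw bandwidth   : {hbas_for_bw} ({hbas_for_bw * hba_mb_s / 1000:.0f} GB/s)")
print(f"HBAs for balance         : {hbas_balanced} across {nodes} hosts")
# Plus two FC switches of at least 82 ports each (per the comment) so that losing
# one switch still leaves an unrestricted path from array to hosts.
```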

  4. Krishna Manoharan, February 27, 2009 at 8:52 pm

    Hi Kevin,

    “the array has to service my concurrent complex queries with at least 80MB/s per disk drive to the host”

    When you say 80MB/sec of throughput – are you referring to a workload of large random reads/writes or large sequential reads/writes?

    Thanks
    Krishna

    • kevinclosson, February 27, 2009 at 10:19 pm

      Krishna,

      That is a good question. The types of drives I’m talking about can produce data at 150MB/s with pure sequential outer-track scans. Since I focus on concurrent queries, there has to be tolerance for some amount of randomization. However, the access patterns for concurrent DW/BI are such that each head movement is followed by a large (4MB+) read, thus the seemingly low-ball 80MB/s. In the end, however, 80MB/s per drive is a fast path to head saturation, loop saturation, or both on all conventional storage arrays.
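
      If it helps, here is the simple model behind that number: amortize the mechanical overhead over a large read. The seek and latency figures are my assumptions for a 15K RPM drive under concurrent queries, not measurements.

```python
# Effective per-drive throughput when every head movement is followed by a large read.
seq_mb_s = 150.0     # outer-track sequential scan rate
read_mb  = 4.0       # contiguous data moved per head position (4MB+)
seek_ms  = 6.0       # assumed average seek under concurrent queries
rot_ms   = 2.0       # average rotational latency at 15K RPM (half a revolution)

transfer_ms = read_mb / seq_mb_s * 1000.0                # ~26.7 ms to move 4MB
effective   = read_mb / ((seek_ms + rot_ms + transfer_ms) / 1000.0)
print(f"effective throughput: {effective:.0f} MB/s")     # ~115 MB/s with these inputs

# Even with generous mechanical overhead per 4MB read, the drive stays well above
# 80 MB/s -- which is why 80 MB/s per drive is a conservative floor, and why feeding
# that much through a conventional array's heads and loops is the real problem.
```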

  5. accidentalSQL, February 27, 2009 at 10:56 pm

    “However, the access patterns for concurrent DW/BI are such that each head movement is followed by a large (4MB+) read …”

    Does that assume that much of the data was bulk loaded in the first place and not trickle fed, so large chunks of it are placed contiguously on the disk? Does it make any assumptions about pre-allocating extents in the tablespace holding the data?

    I was just thinking that striping scatters data all over the place on the disk and the head would need to jump around to the various pieces of the datafile. I didn’t realize that typical contiguous reads were that big for concurrent DW/BI queries.

    • kevinclosson, February 27, 2009 at 11:50 pm

      “However, the access patterns for concurrent DW/BI are such that each head movement is followed by a large (4MB+) read …”

      Oops! Sorry. That was a typo. What I meant to say (and this is specific to ASM) is that once the head is positioned there are 4 adjacent, async 1MB reads. Honestly though, with modern drives and NCQ, track buffering and all those goodies, it is pretty much the same as positioning and scraping 4MB with one I/O. And that 4MB is a reflection of the ASM allocation unit. That is, after all, the stripe width for ASM, so the next ASM allocation unit will be on another drive somewhere. And ASM supports other AU sizes. Hitting the drives this way with concurrent queries (which often causes fore-and-aft seeks) leads me to count on about 80MB/s. As with everything technology-wise it varies, but these are the principles involved.
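
      Schematically, the access pattern looks like this (a sketch only; the disk-group size is arbitrary, and the 4MB AU / 1MB read sizes are the ones discussed above):

```python
# ASM-style coarse striping: 4MB allocation units laid round-robin across the disk
# group, each AU scanned with four adjacent asynchronous 1MB reads.
AU_MB, READ_MB, DISKS = 4, 1, 8               # 8-disk group is an arbitrary assumption

def au_to_disk(au_index: int) -> int:
    """Each successive allocation unit lands on the next disk in the group."""
    return au_index % DISKS

def reads_for_au(au_index: int) -> list:
    """One head position, then four adjacent 1MB async reads within the AU."""
    base_mb = (au_index // DISKS) * AU_MB     # MB offset of this AU on its disk
    return [base_mb + i * READ_MB for i in range(AU_MB // READ_MB)]

# Scanning a 32MB extent touches 8 AUs spread across all 8 disks in turn.
for au in range(32 // AU_MB):
    print(f"AU {au}: disk {au_to_disk(au)}, 1MB reads at MB offsets {reads_for_au(au)}")
```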

