I Know Nothing About Data Warehouse Appliances and Now, So Won’t You – Part II. DATAllegro Supercharges Fibre Channel Performance.

BLOG CORRECTION: The next-to-last paragraph has been edited to offer more clarity on which components impose limits on I/O transfer sizes.

I’m going to tell you something nobody else knows. You’ve heard it here first. Ready? Here’s the deal: no more than 800 MB/s can pass through two 4 Gb Fibre Channel HBAs into any host system’s memory. It’s that simple. If you want more than 800 MB/s available for your CPUs, you have to add more 4 Gb HBAs, go with 8 Gb Fibre, or drop FCP altogether and go with something that can deliver at that level. But this isn’t a plug for the Manly Man Series on Fibre Channel Technology. I’m blogging about Data Warehouse Appliance technology, specifically DATAllegro.
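For anyone who wants to check the 800 MB/s figure, here is a minimal sketch of the arithmetic. The 4.25 GBaud signalling rate, the 8b/10b encoding and the 400 MB/s nominal per-port rate are standard 4 Gb Fibre Channel figures that I am assuming here, not numbers from DATAllegro, and each HBA is assumed to be a single-port card. The function names are made up for illustration.

```python
# Rough ceiling for 4 Gb Fibre Channel, per direction.
# Assumptions (not from the post): 4GFC signals at 4.25 GBaud with 8b/10b
# encoding, and the commonly quoted usable rate is 400 MB/s per port.
LINE_RATE_BAUD = 4_250_000_000
NOMINAL_MB_PER_PORT = 400


def encoded_ceiling_mb(line_rate_baud: int) -> float:
    """MB/s left after 8b/10b encoding, before any framing overhead."""
    data_bits_per_s = line_rate_baud * 8 // 10  # 8 data bits per 10 line bits
    return data_bits_per_s / 8 / 1_000_000      # bits -> bytes -> MB


def host_ceiling_mb(num_hbas: int, per_port_mb: int = NOMINAL_MB_PER_PORT) -> int:
    """Aggregate inbound ceiling for a host with single-port 4 Gb HBAs."""
    return num_hbas * per_port_mb


def hbas_needed(target_mb: int, per_port_mb: int = NOMINAL_MB_PER_PORT) -> int:
    """Smallest number of single-port HBAs whose nominal rate covers target_mb."""
    return -(-target_mb // per_port_mb)  # ceiling division


print(f"per-port ceiling after encoding: {encoded_ceiling_mb(LINE_RATE_BAUD):.0f} MB/s")
print(f"two HBAs, nominal: {host_ceiling_mb(2)} MB/s")
print(f"HBAs needed for 1200 MB/s: {hbas_needed(1200)}")
```

Even the encoded line rate, before framing overhead, leaves well under 1.2 GB/s for two ports.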

Exit Conventional Wisdom, and Electronics!

Here is a graphic of the V3 DATAllegro building block. It’s two Dell 2950s (a.k.a., Compute Nodes) each plumbed with two 4 Gb Fibre Channel HBAs to a small EMC CX3 array. According to this piece on DATAllegro’s website, they are the only people on the planet to push more than is electronically possible through two 4 Gb HBAs, I quote:

Data for each compute node is partitioned into six files on dedicated disks with a shared storage node. Multi-core allows each of these six partitions to be read in parallel. Data is streamed off these partitions using DATAllegro Direct Data Streaming™ (DDS) technology that maximizes sequential reads from each disk in the array. DDS ensures the appliance architecture is not I/O bound and therefore pegged by the rate of improvement of storage technology. As a result, read rates of over 1.2 GBps per compute node are possible.

That’s right. I wasn’t going to point out that each compute node is fed by six disks, because if I did I’d also have to tell you they are 7200 RPM SATA drives, mirrored. Supposedly we are to believe that the pixie dust known as Direct Data Streaming™ can, uh, pull data at what rate per spindle? Yes, that’s right, they say 200 MB/s per drive (1.2 GB/s spread over six drives)! Folks, I’ve got 7200 RPM LFF SATA drives all over the place and you can’t get more than 80 MB/s per drive from these things (and that is actually fairly tough to do). Even EMC’s own specification sheet for the CX3 spells out the limit as 31-64 MB/s. I’ll attest that if your code stays out on the outer, say, 10% of the drive you can stream as much as 75-80 MB/s from these things. So with the DATAllegro system, and using my best numbers (not EMC’s published numbers), you’d only expect to get some 480 MB/s from six 7200 RPM SATA drives (6 × 80). Wow, that Direct Data Streaming™ technology must be really cool, albeit totally cloak and dagger. Let’s not stop there.
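The per-spindle numbers above are easy to lay side by side. This is only a sketch of the arithmetic in the paragraph: the drive count and rates come from the text, while the function and dictionary names are my own.

```python
DRIVES_PER_NODE = 6      # six disks feed each compute node
CLAIMED_NODE_MB = 1200   # "over 1.2 GBps per compute node"

# Per-drive streaming rates in MB/s, as cited in the paragraph above.
PER_DRIVE_MB = {
    "EMC CX3 spec sheet, low": 31,
    "EMC CX3 spec sheet, high": 64,
    "outer 10% of the drive, best case": 80,
}


def required_per_drive(node_mb: float, drives: int) -> float:
    """Rate every drive must sustain to deliver node_mb in aggregate."""
    return node_mb / drives


def node_rate(per_drive_mb: float, drives: int) -> float:
    """Aggregate rate if every drive streams at per_drive_mb."""
    return per_drive_mb * drives


print(f"claim requires {required_per_drive(CLAIMED_NODE_MB, DRIVES_PER_NODE):.0f} MB/s per drive")
for label, rate in PER_DRIVE_MB.items():
    print(f"{label}: {node_rate(rate, DRIVES_PER_NODE):.0f} MB/s per node")
```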

What about this 1.2 GB/s per compute node claim? How do you pump that through 2 x 4 Gb FC HBAs? You don’t. Not even DATAllegro with all those Cool Sounding™ technologies. What’s really being said in that DATAllegro overview piece is that their effective ingestion rate is some 1.2 GB/s, I quote:

Compression expands throughput: Within each node, two of the multi-core processors are reserved for software compression. This increases I/O throughput from 800MBps from the shared storage node to over 1.2 GBps for each compute node.
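Put as arithmetic, the two figures in that quote imply an expansion factor of about 1.5 on whatever actually crosses the wire. A minimal sketch, with helper names that are mine and not DATAllegro’s:

```python
PHYSICAL_MB = 800     # "800MBps from the shared storage node"
EFFECTIVE_MB = 1200   # "over 1.2 GBps for each compute node"


def implied_expansion(effective_mb: float, physical_mb: float) -> float:
    """Expansion factor needed to turn the physical rate into the effective one."""
    return effective_mb / physical_mb


def effective_rate(physical_mb: float, expansion: float) -> float:
    """Uncompressed MB/s handed to the CPUs for a given physical rate."""
    return physical_mb * expansion


factor = implied_expansion(EFFECTIVE_MB, PHYSICAL_MB)
print(f"implied expansion factor: {factor}")
print(f"effective rate at that factor: {effective_rate(PHYSICAL_MB, factor):.0f} MB/s")
```

The physical rate through the HBAs never moves in this calculation; only the post-decompression figure grows.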

They could just come out and say it, but they expect you to believe in magic. I’ll quote Stuart Frost (CEO, DATAllegro) on more of this magic, secret sauce:

Another very important aspect of performance is ensuring sequential reads under a complex workload. Traditional databases do not do a good job in this area – even though some of the management tools might tell you that they are! What we typically see is that the combination of RAID arrays and intervening storage infrastructure conspires to break even large reads by the database into very small reads against each disk.

Traditional databases are only victims of what storage arrays do with the I/O requests by way of slicing and dicing. Further, the OS and FC HBA impose limits on the size of large I/O requests. That splitting is not a characteristic of a traditional database system. Even a Totally Rad Non-Traditional RDBMS™ like the one DATAllegro embeds in their compute nodes (spoiler: it’s Ingres, nothing new) will fall prey to what the array controller does with large I/O requests. But more to the point, FC HBAs and the Linux (CentOS for DATAllegro) block I/O layer impose limits on the size of transfers, and that limit is generally 1 MB.
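To make the transfer-size point concrete, here is a toy model of what a cap does to a large read. It is an illustration only, not the Linux block layer’s actual splitting logic (which also depends on things like segment limits and alignment). The 1 MB cap is the figure cited above, and `split_request` is a hypothetical helper.

```python
MAX_TRANSFER = 1024 * 1024  # the 1 MB transfer cap cited in the post


def split_request(offset: int, length: int, max_transfer: int = MAX_TRANSFER):
    """Split one large read into (offset, size) pieces no bigger than max_transfer."""
    if length < 0 or max_transfer <= 0:
        raise ValueError("length must be >= 0 and max_transfer must be > 0")
    pieces = []
    end = offset + length
    while offset < end:
        size = min(max_transfer, end - offset)
        pieces.append((offset, size))
        offset += size
    return pieces


# An 8 MB read issued by the database reaches the array as eight 1 MB transfers.
pieces = split_request(0, 8 * MAX_TRANSFER)
print(len(pieces), "transfers, largest", max(size for _, size in pieces), "bytes")
```

Whatever the database asks for, nothing larger than the cap reaches the array in this model, which is the point being made about the RDBMS not being the culprit.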

If I’m wrong, I expect DATAllegro to educate us, with proof, not more implied Awesomely Fabulicious CoolFlips Technology™. In the end, however, no matter whether they managed to code custom FC HBA drivers and somehow obtained custom firmware for the CX3 to achieve larger transfer sizes than anyone else or not, I’ll bet dollars to donuts they can’t push more than 800 MB/s through dual 4 Gb FCP HBAs, and certainly not from six 7200 RPM SATA drives.

13 Responses to “I Know Nothing About Data Warehouse Appliances and Now, So Won’t You – Part II. DATAllegro Supercharges Fibre Channel Performance.”


  1. David Aldridge, July 7, 2008 at 8:45 pm

    Hmmm, so this means that if Oracle can achieve a compression rate of 5:1 on data warehouse data, its “effective ingestion rate” is … wow!

    Cool.

  2. Noons, July 9, 2008 at 6:30 am

    “Awesomely Fabulicious CoolFlips Technology™”
    LOL! Man, haven’t laughed like this for a looooong time!

    Please, Kevin: let me use this one on the next EMC meeting.
    The CLARiiON guy is trying to convince us that sharing a CX between my DW nodes and all the other file servers, print servers and SQL servers around the place makes a lot of sense…

  3. kevinclosson, July 9, 2008 at 4:14 pm

    Noons,

    You’ve always got carte blanche around here…go for it. Which CX to share by the way?

  4. billy bathgates, July 9, 2008 at 8:18 pm

    That is a somewhat low upper limit to stripe chunk size on that array, if it’s really 256 sectors (128KB, I assume?), but I wonder how much it’s really hurting physical drive performance. I think it’s not really that much; after about 32-64 sectors the sequential transfer rate of most drives starts leveling off a lot. Correct me if I’m wrong. If all segments of a transfer are initiated in parallel this should generally be a win, until the number of outstanding host I/Os gets high enough to cause all the seeking on all those drives to be a problem.

  5. kevinclosson, July 9, 2008 at 10:04 pm

    Actually, I just re-read my post and realize I made a mistake regarding the CX3. Only 4-way mirrors and above invoke the stripe size. Nonetheless, DATAllegro uses CentOS and the odds they have manipulated the HBA driver (e.g., QLogic, Emulex) to push through larger than 1MB I/Os are ever so slim. Further, I have found no evidence that a CX3 supports a transfer larger than 1MB (neither a singleton nor a striped transfer). As for max stripe size…

    They (EMC) actually use the uncommon 520-byte sectors in the CX3. There is nothing strange about bounding stripe width (or RAID op units generally speaking) to 256 sectors. There are many, many arrays that are that way, some smaller. I know that 256 is a very common Engenio limit, as it is on the HP StorageWorks arrays I’ve had experience with. Chaparral arrays are that way. LSI ROC (RAID on a chip), e.g., the HP SmartArray P400, is that way. It is only an issue when you read 1MB (the common, if not universal, Linux 2.6.18 kernel max I/O size on FC, SAS, etc. disks).

    The point is that with what ingredients I see (CentOS, FC HBA, CX3), DATAllegro is hitting physical disks with maximum 1MB transfers…until they point out otherwise (with proof).

  6. Noons, July 10, 2008 at 3:13 am

    CX3-40. Apparently they are Fabulicious and can roast and brew 105 different varieties of coffee while servicing a multi-TB DW…

  7. Greg Rahn, July 18, 2008 at 6:49 pm

    Digging deeper into this I believe that DATAllegro has never actually observed the numbers they claim (at least not all of them). Not only that, they claim different numbers for what appears to be the exact same metric.

    Let’s first look at the claim on this page: “[3:1 compression] increases I/O throughput from 800MBps from the shared storage node to over 1.2GBps for each compute node.” The wording here seems a bit misleading to me: Why are they comparing “from the [single CX3-10] storage node” to “each [of two] compute node[s]”? Since there are two compute nodes per storage node, with 3:1 compression the math at least adds up: 800MBps x 3:1 compression = 2.4GBps = (1.2GBps * 2 nodes). But this claim (800MBps from the storage) is a farce (more on that later).

    Now let’s look at DATAllegro and Teradata: A Node-to-Node Comparison. Here the claim for “max I/O rate per node” is 900MBps.

    Note that the two I/O throughput claims do not even match up!!! The first claims 1.2GBps per node and the second claims 900MBps per node. So which is it?!?!? I would believe the latter (900MBps logical throughput) to actually be physically possible (the first is not!), because an EMC CX3-10 storage processor can only output about 600MBps total (physical) regardless of number of drives or workload. This is obviously much less than the throughput capacity of the 4 x 4Gbps FCP. So if one assumes the 3:1 compression ratio, then the storage is capable of 1800MBps of logical I/O (3 times the physical 600MBps). Since there are two nodes sharing this 1800MBps, they each would be capable of 900MBps. This equates to about 50MBps per HDD (600MBps / 12 HDDs) and thus does not exceed the laws of physics or the spec sheet.

    Obviously if the data compression ratio is less than 3:1, this rate will drop and approach the 600MBps physical max I/O throughput for the CX3-10.

  8. kevinclosson, July 18, 2008 at 7:21 pm

    Greg,

    OK, I didn’t want to touch the fact that the CX3-10 SP array head is a simple Xeon box, but it is. Nothing magical. It is quite unlikely that there would be enough bandwidth in that thing to shuffle back-end to front-end sufficiently to saturate the 4x4Gb FCP out-bound plumbing anyway. Nonetheless, I do hold fast that the maximum theoretical ingest rate to a DATAllegro v3 node is 800MB/s. That is a fact (2x4Gb FCP HBA).

    So, thanks for playing devil’s advocate, Greg. I hadn’t seen that 900MB/s figure before but it is as absurd as the 1.2GB figure because DATAllegro cites it as a bandwidth number. They refer to the 900MB/s as, and I quote, “Max I/O Rate per Node.” That is totally dishonest since their compute nodes are plumbed with 2x4Gb FCP HBAs.

    They need to get honest and call it what it is, “Effective I/O Rate per Node.” And, they need to do that soon because voodoo doesn’t stand up to much scrutiny around here.

    I know I’m sending decent traffic to DATAllegro’s site with this thread. So whoever is over there monitoring this ought to take note that we don’t cotton to such tomfoolery round these parts. Fix your verbiage!

  9. Alex Gorbachev, July 29, 2008 at 3:21 pm

    Oh this is funny… Following the trackback from here covering M$ acquisition of DATAllegro, we learn that it’s bad news for Oracle folks:

    …it’s bad news for Ingres, bad news for Oracle, bad news for IBM, bad news for Teradata and bad news for HP, all for obvious reasons.

  10. kevinclosson, July 29, 2008 at 6:59 pm

    Alex,

    Funny is not the word. I’d say pathetic is more fitting.

  11. L8on, August 6, 2008 at 12:49 pm

    Folks, please, let’s not cloud the marketing with facts!

    We all know that no one ( at least those who write the checks{$} ) checks the numbers. We technical people are supposed to just use the technology and if/when it doesn’t perform as advertised, then it’s obviously our incompetence that has configured it incorrectly. ;)

    This is how over 50% of the poorly performing systems I’ve inherited came to the shops I’ve worked with.

    Just my $0.02.

    Thanks for the due diligence and a dedication to real math.
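For readers following the thread, the arithmetic in Greg Rahn’s comment can be checked with a short sketch. The 600 MB/s storage-processor figure, the 3:1 ratio, the two compute nodes and the 12 drives are his numbers; the function names are hypothetical and the other ratios in the loop are only there to show the trend he describes.

```python
SP_PHYSICAL_MB = 600   # Greg's figure for total CX3-10 physical output
COMPUTE_NODES = 2      # compute nodes sharing one storage node
DRIVES = 12            # HDDs behind the storage node


def logical_per_node(physical_mb: float, ratio: float, nodes: int) -> float:
    """Post-decompression MB/s per compute node for a shared physical rate."""
    return physical_mb * ratio / nodes


def physical_per_drive(physical_mb: float, drives: int) -> float:
    """Physical MB/s each drive must stream to sustain physical_mb."""
    return physical_mb / drives


# The first claim only adds up if the storage really delivers 800 MB/s at 3:1.
print(f"800 MB/s at 3:1: {logical_per_node(800, 3, COMPUTE_NODES):.0f} MB/s per node")

# The second claim fits a 600 MB/s storage processor, and falls as the ratio drops.
for ratio in (3.0, 2.0, 1.5, 1.0):
    rate = logical_per_node(SP_PHYSICAL_MB, ratio, COMPUTE_NODES)
    print(f"600 MB/s at {ratio}:1: {rate:.0f} MB/s per node")

print(f"per drive: {physical_per_drive(SP_PHYSICAL_MB, DRIVES):.0f} MB/s")
```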


  1. Other early coverage of Microsoft/DATAllegro | DBMS2 -- DataBase Management System Services (trackback on July 24, 2008 at 7:20 pm)
  2. Database Customer Benchmarketing Reports | Structured Data (trackback on December 12, 2008 at 5:51 pm)

EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
