I read Curt Monash’s report on the current state of affairs at Dataupia and it got me thinking. I agree with Curt’s position on add-on or external accelerator-type technology. See, one of Dataupia’s value propositions was to accelerate I/O for Oracle Database External Tables. To the best of my knowledge it basically offered high-bandwidth, cached flat-file access.
About this time last year I wrote a somewhat tongue-in-cheek post about Dataupia. A blog reader posted the following comment on that thread:
… This product is very, very real.
It works as an external table in Oracle so it’s transparent to all your BI tools. They do a lot of work with SQL that Oracle passes to make it usable.
You have to re-point your ETL loads at Dataupia directly but they should run with very little alteration.
Speed is 10x Oracle at these volumes (2Tb+).
As most folks know I was deep into HP Oracle Exadata Storage server performance work at that time and couldn’t really go toe-to-toe with any of the DW/BI appliance or accelerator folks. Oracle had not yet released Exadata. The idea of accelerating Oracle ten-fold is certainly no longer all that avant-garde given the proven acceleration Oracle Exadata Storage Server provides.
What I wanted to point out at the time is that accelerating the loading of an Oracle data warehouse is indeed important, but surely not critical enough to warrant bringing in another vendor and working out all the plumbing. I suspected then that the blog reader who posted that comment was not fully aware that the value proposition supposedly went beyond accelerating ETL to offering run-time access to the flat files housed in their Satori server. Yes, running queries against External Tables simply because they offer a lot of cache and a lot of I/O bandwidth. At least that is what I got from reading their datasheet.
Erroneously Accelerating Accelerates What?
The problem with that story is that query throughput from External Tables is very seldom an I/O issue. Scanning External Tables requires conversion from ASCII flat-file text to Oracle data types on the fly, which makes it a CPU-intensive task. For instance, if you load data from an External Table into a data warehouse (internal, true) table and then compare the scan throughput of each, you’ll see that processor saturation impedes the External Table scan. Same-query comparisons commonly show 80% lower throughput when accessing External Tables compared to internal tables, and I’m not talking about an I/O-hobbled External Table comparison. The exact figure, of course, depends on the processor bandwidth available to the test: the less processor bandwidth available, the more the skew favors the internal table. What I’m trying to say is that if you accelerate External Table I/O, say, 10x, you need as much as 10x more processor bandwidth to handle it. So, sure, if you take a totally I/O-bound query and apply this sort of External Table acceleration, you will see a significant performance increase. Conversely, a host processor-bound situation will not benefit from this sort of accelerator at all. Architecture…it’s important.
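To make the CPU-versus-I/O point concrete, here is a minimal Python sketch (my own illustration, not Oracle code, and the row format is invented for the example). It contrasts a scan that must convert ASCII text to native types row by row, as an External Table scan does, with a scan over data already stored in native types. Both produce the same answer; only the text-format scan pays the per-row conversion cost, so handing it faster I/O does nothing once the parser saturates the CPU.

```python
import csv
import time

def scan_external(text_rows):
    """Simulate an External Table scan: parse ASCII into typed columns."""
    total = 0.0
    for line in csv.reader(text_rows):
        # Per-row datatype conversion -- the CPU-intensive step.
        order_id, amount = int(line[0]), float(line[1])
        total += amount
    return total

def scan_internal(typed_rows):
    """Simulate an internal-table scan: data already in native types."""
    return sum(amount for _order_id, amount in typed_rows)

# Build a small sample data set in both representations.
typed = [(i, i * 0.5) for i in range(100_000)]
text = ["%d,%s" % (i, a) for i, a in typed]

t0 = time.perf_counter(); ext_total = scan_external(text); t_ext = time.perf_counter() - t0
t0 = time.perf_counter(); int_total = scan_internal(typed); t_int = time.perf_counter() - t0
print("text-format scan: %.4fs, typed scan: %.4fs" % (t_ext, t_int))
```

On any machine the text-format scan burns far more processor time for the identical result, which is the whole point: boosting its I/O bandwidth just moves the bottleneck squarely onto the host CPUs.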
Tacking on accelerators is just not a reasonable approach. I recall a lot of hoopla back in 2005 or so about another of these external accelerator offerings—Xprime. I don’t hear much about them any more, other than perhaps bits and pieces about intellectual property infringement claims against DATAllegro (Microsoft).
I’m no stranger to the external acceleration game, but I have generally steered clear of such approaches. I have always leaned toward a more native approach: offer a better platform, not an external platform. About the time Xprime was garnering quite a bit of interest, we at my former company, PolyServe, were putting the final touches on product infrastructure that offered scale-out reporting using clustering technology. Of course the story had all the common tag words such as transparent, seamless, scalable, and so on. Unusually, however, the claims were true. But no matter. Nobody cared. I sure thought people would have clamored for up to a 16-fold throughput increase for processor-intensive reporting jobs. Oh well…memories. It was an interesting project to work on, though, as this old paper I wrote suggests.
So What Does This Have To Do With Exadata?
It’s probably high time people stopped getting venture capital to “solve” a “problem” that Oracle Database supposedly has with data warehouse workloads.