Well, I’ve been in Oracle Server Technologies for a whopping week and in that short time I’m reassured of two constants:
- Blogging takes time (that I don’t have)
- Focusing on a single product, and more specifically a feature within a single product suite, makes a guy very pin-point minded
But, alas, I still need to stay well-rounded. So, thanks to a friend (and VP at a company I once worked for) for alerting me to the fact that Michael Stonebraker and others are blogging at The Database Column.
Yes, I’ll be reading that one. I don’t know if I’d recommend it to my readers though. That may sound odd, but most of my readers are practitioners of real, existing technology filling a production purpose. I expect the stuff on that blog to be laced with theory. Now, having said that, I have to remind myself that I believe well over 90% of my blog content is not exactly what one would call “ready to use” information.
All that aside, I just thought I’d poke my head up from the hole I’m in and make a quick blog entry. OK, there, I did…now it is back into the dungeon…but wait.
First Installment
Just because I said I was going to read it doesn’t imply I’m going to take it as some gulp from the fountain of omniscience.
For instance, in Michael Stonebraker’s post about row versus column orientation, he states:
[…] Vertica can be set-up and data loaded, typically in one day. The major vendors require weeks. Hence, the “out of box” experience is much friendlier. Also, Vertica beats all row stores on the planet – typically by a factor of 50. This statement is true for software only row stores as well as row stores with specialized hardware (e.g. Netezza, Teradata, Datallegro). The only engines that come closer are other column stores, which Vertica typically beats by around a factor of 10.
In my opinion that looks cut and pasted from a Vertica data sheet, but I don’t know. One thing I do know is that it seems unwise to say things like “all” and “typically by a factor of” in the same sentence. And this bit about “set-up and loaded” taking “weeks” for the “major vendors” is just plain goofy.
That’s my opinion. No, hold it, that’s my experience. I’ve loaded Oracle databases sized in the tens of terabytes that certainly didn’t take weeks!
Michael Stonebraker is a self-made “expert” who has spent all his productive life pretending that the world gyrates around his belly button and whatever product he’s pregnant with at any given instant.
He did it with Ingres, he’s done it again and again since then and this is nothing more than another iteration of his hot-air theories.
Witness the conflicting claims you rightly pointed out. Witness his claim that DB2, SQL Server and Oracle have roots in Ingres: nothing could be more wrong than that one, for Chrissakes!
If anything, Ingres got shafted in the market because it refused to acknowledge that SQL was the language standard to follow, not Quel.
And because under Michael’s influence, it concentrated on useless bells and whistles instead of solid, dependable code.
Kinda like what is happening to Oracle now.
Ooops, that last one might not pass the censorship…. 😉
Noons,
Eek! Cringe…
It certainly seems to be doing the rounds this one.
As I commented on the place I first saw this and again on Mark Rittman’s blog, it read like a sales pitch for Vertica…and then I realised that Stonebraker worked for/owns them…go figure.
I agree entirely with you picking up on the data loading comment…as always, it comes down to the details…what are you loading, how big is it, how are you loading it…where’s the full disclosure report purleeze! Let’s see what kind of oranges you are comparing with which variety of apple.
There may be circumstances where it’s more effective to choose a column based approach over row based ones, but hot air like that doesn’t help their case one bit. I would like to know more about the technicalities of how these column based approaches work – not just the marketing stuff but the real under the covers stuff – that way, it would be possible to understand the architectural differences, do some benchmarking and then understand where, if anywhere, they offer performance improvements…and also at what cost, if any, in terms of management, availability, scalability etc…sounds like a competitive analysis paper that Oracle should write…now, who do we know that works in Oracle I wonder?
😉
Kevin,
your last sentence “I’ve loaded Oracle databases sized in the tens of databases that certainly didn’t take weeks!” looks a bit strange.
Bence
Kevin,
Nice to know you’re still around.
Thanks for re-issuing the pointer to Stonebraker and column-oriented databases. Still curious to see if it will finally catch on.
LOL @ Noons,
That was, Kinda like, well put.
Bence,
Typo, thanks, fixed.
Jeff wrote:
“It read like a sales pitch for Vertica…and then I realised that Stonebraker worked for/owns them…go figure.”
Yep, I blogged that back in February:
https://kevinclosson.wordpress.com/2007/02/15/database-systems-pioneer-starts-database-company/
A simple look at the TPC-H results shows that you can load at about 1.5TB/hour with Oracle. This is certainly not weeks.
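To put that rate next to the “weeks” claim, here is a back-of-the-envelope sketch. The 1.5TB/hour figure is the one quoted in the comment above; the database sizes are illustrative assumptions only, and the arithmetic covers raw load time, not set-up, indexing or validation.

```python
# Rough load-time arithmetic. The rate comes from the comment above;
# the sizes below are hypothetical examples, not measured results.
def hours_to_load(size_tb, rate_tb_per_hour=1.5):
    """Return the hours needed to load size_tb at a sustained rate."""
    return size_tb / rate_tb_per_hour

for size_tb in (10, 30, 50):  # assumed "tens of terabytes" sizes
    hours = hours_to_load(size_tb)
    print(f"{size_tb} TB at 1.5 TB/hour: {hours:.1f} hours ({hours / 24:.2f} days)")
```

Even at the large end of those assumed sizes, a sustained load at that rate would be measured in hours or a couple of days, not weeks.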
Should we gently point out that relational theory says nothing about storage mechanisms? Declaring relational databases dead because they “store data in rows” is a strange confusion of concept and implementation.
Columnar storage with compression: think “single column bitmap index on every single column in the table (and then forget about the table)”. That tells you where every implementation problem with the approach is going to be.
It’s only a storage method – you put data in in one order, you want to get it out in another. You pay the price somewhere for re-arranging the data.
Regards
Jonathan Lewis
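A toy sketch of the “bitmap index on every column” picture from the comment above may help. This is only an illustration under simplifying assumptions: the names `build_bitmaps` and `fetch_row` are hypothetical, the bitmaps are plain Python integers with no compression, and it does not represent how Vertica, Oracle or any other product actually stores data.

```python
# Toy model: the "table" is replaced by one set of bitmaps per column.
# Each bitmap is an int whose bit N is set when row N holds that value.
def build_bitmaps(rows):
    """Return a list (one entry per column) of dicts: value -> bitmask."""
    bitmaps = [dict() for _ in range(len(rows[0]))]
    for pos, row in enumerate(rows):
        for col, value in enumerate(row):
            bitmaps[col][value] = bitmaps[col].get(value, 0) | (1 << pos)
    return bitmaps

def fetch_row(bitmaps, pos):
    """Rebuild one row by probing every column's bitmaps for position pos."""
    mask = 1 << pos
    return tuple(
        next(value for value, bits in column.items() if bits & mask)
        for column in bitmaps
    )

# Hypothetical sample data: (colour, size)
rows = [("red", "S"), ("blue", "M"), ("red", "M")]
bitmaps = build_bitmaps(rows)

# A filter on two columns is a cheap bitwise AND of two bitmaps...
matches = bitmaps[0]["red"] & bitmaps[1]["M"]
red_and_medium = bin(matches).count("1")

# ...but getting a whole row back means visiting every column.
second_row = fetch_row(bitmaps, 1)
```

The count over two columns is a single AND, while `fetch_row` has to search each column in turn, which is the “you pay the price somewhere for re-arranging the data” point in miniature.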