Nearly Free or Not, GridSQL for EnterpriseDB is Simply Better Than Real Application Clusters. It is Shared-Nothing Architecture After All!

Published August 9, 2007 oracle 42 Comments

According to this businesswire.com piece, EnterpriseDB made quite the splash at LinuxWorld San Francisco 2007 by introducing a clustered version of EnterpriseDB called GridSQL for EnterpriseDB. It’s no surprise that a product based on open source would get honorary mention at LinuxWorld, but I don’t care about that.

What I’m blogging about is the fact that I’m already seeing blogs and press pieces that clump GridSQL in with Oracle Real Application Clusters (RAC). What’s the problem? There is a cluster involved and both products use a cluster so they must be birds of a feather.

Po-tay-toe / Po-tah-toe
Oracle Real Application Clusters is a shared-disk architecture. This GridSQL product is yet another shared-nothing implementation as is IBM’s DB2 UDB EEE (cousin of SP2). Mainframe DB2 is shared-everything of course. Also in the shared-nothing camp are: Teradata, Informix XPS, Microsoft SQL Server with Distributed Partitioned Views (Egad!), Sybase Navigation Server (there’s a blast from the past), Greenplum and the hardware niche guys like DATAllegro and Netezza. Ingres r3 takes a shared everything approach like Oracle RAC and mainframe DB2, but I’m not sure what those folks are up to these days.

Everyone Else Is Doing It
Simply put, the fastest way to get a clustered database out the door is to implement shared-nothing. Take your regular database engine, throw some replication and data shipping in there and poof, you have a shared-nothing database. I think everyone got the memo back in 1986 when Michael Stonebraker put his foot down and said that shared-nothing was the way to go, in spite of the fact that shared-nothing architectures include the necessary evil of data shipping. That Stonebraker paper was written a long time ago and most of the data points in it are OLTP-minded. For instance, I’ll quote Stonebraker:

Consider the number of messages which an SN [shared nothing] system must incur in a typical high transaction processing environment. The example consists of a data base with N objects subject to a load consisting entirely of transactions containing exactly k commands, each affecting only one record. (For TP1 the value of k is 4). For any partitioning of the data base, these k commands remain single-site commands. Suppose that there exists a partitioning of the data base into non-overlapping collections of objects such that all transactions are locally sufficient [WONG83]. Such a data base problem will be termed delightful. Most data base applications are nearly delightful. For example, the TP1 in [ANON84] has 85% delightful transactions.

Using TP1 (a batch mode ATM transaction workload) as a case in point for the claim that most applications will get along nicely (thus the term delightful) in a shared-nothing database! Wow, that was then and this is now. Folks, today’s applications are built on large numbers of tables and complex joins. The reason shared-nothing is nothing like RAC is because instead of only shipping functions (or tasks) and lock messages to the clustered nodes, as is the case with RAC, shared-nothing requires the shipping of data. And as soon as you have a problem with too much data shipping you are required to reload and repartition your database to mitigate the problem. Try getting your partitioning right with an application that has 1,000 tables and 1,300 indexes. Ever chew on crushed glass?
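
To make the data-shipping point concrete, here is a minimal sketch in Python. It is not taken from the Stonebraker paper or from any product. It assumes a hypothetical modulo-hash placement rule (`node_for`) and treats a transaction as nothing more than a list of record keys. A transaction counts as locally sufficient (delightful, in Stonebraker’s terms) only when every record it touches lives on one node; anything else means work or data has to cross nodes.

```python
import random

NUM_NODES = 4        # hypothetical cluster size
N_OBJECTS = 100_000  # hypothetical number of records
K = 4                # commands per transaction, as in the TP1 example


def node_for(key: int, num_nodes: int) -> int:
    """Hypothetical placement rule: plain modulo hash partitioning."""
    return key % num_nodes


def is_locally_sufficient(txn_keys, num_nodes: int) -> bool:
    """True when every record the transaction touches lives on one node."""
    return len({node_for(k, num_nodes) for k in txn_keys}) == 1


def delightful_fraction(transactions, num_nodes: int) -> float:
    """Share of transactions that need no cross-node traffic."""
    hits = sum(is_locally_sufficient(t, num_nodes) for t in transactions)
    return hits / len(transactions)


rng = random.Random(42)

# Transactions that touch K records chosen with no regard to the partitioning.
random_txns = [
    [rng.randrange(N_OBJECTS) for _ in range(K)] for _ in range(10_000)
]

# Transactions whose keys were deliberately chosen to share a node
# (each key is a multiple of NUM_NODES away from the first one).
colocated_txns = []
for _ in range(10_000):
    base = rng.randrange(N_OBJECTS - K * NUM_NODES)
    colocated_txns.append([base + i * NUM_NODES for i in range(K)])

random_share = delightful_fraction(random_txns, NUM_NODES)
colocated_share = delightful_fraction(colocated_txns, NUM_NODES)
print(f"random access:     {random_share:.1%} delightful")
print(f"co-located access: {colocated_share:.1%} delightful")
```

With four nodes and four randomly chosen records per transaction, only about 1 in 64 transactions should land on a single node under this placement rule. The co-located set is fully delightful only because its keys were picked to fit the partitioning, which is exactly the luxury a 1,000-table application does not have.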

OK, let me try it this way. Let’s say shared-nothing is in fact the best approach for data warehousing. Greenplum puts it this way :

Most of today’s general-purpose relational database management systems are designed for Online Transaction Processing (OLTP) applications. By default, business intelligence applications have inherited this less than optimal architecture. The reality is that BI workloads are fundamentally different from OLTP transaction workloads and therefore require a profoundly different architecture.

So in this view it is one or the other. DSS or OLTP, you choose.

So humor me for a moment. Let’s say for instance that a shared-nothing architecture product on the same hardware outperforms RAC by a flat 50% across all measurements. Nifty! Now let me ask, where is your OLTP? If some shared-nothing product does in fact outperform RAC for data warehousing, that benefit gets canceled out by the fact that the same shared-nothing architecture cannot do OLTP at all. At least not ERP. I won’t even touch the fact that most ERP apps are a bit of an amalgam of both OLTP and DSS-style accesses.

If anyone was going to have phenomenal success with shared nothing it would be IBM with DB2 UDB EEE given its long heritage (SP2) of shared nothing. What do the top 10 audited clustered TPC-H results tell us?

100GB Scale. Oracle chooses not to put much effort in this scale of the benchmark. Neither does IBM really since their last entry there was a 2003 result.
300GB Scale. Of the top 10, Oracle holds spots 1-6. In fact, Oracle’s RAC result in the 6th position has stood since September 2005 so IBM’s shared-nothing product has had ample time to trump at least that result. Instead, the most recent DB2 result at this scale was in 2004 which oddly didn’t even beat their own prior 2003 result in position 7 of the top ten! How bizarre, especially for a fundamentally better architecture for DSS-style workloads! Oracle has held the top spot in this scale since December 2006 so there is clearly no leap-frogging going on. The 300GB scale is a clear lock-down on the part of RAC.
1000GB Scale. Here again RAC has held the top position for many months, 10 months to be exact. That is plenty of time for any product of superior architecture to advance in the list. Oracle holds the number 3 slot at this scale as well.
3000GB Scale. Oracle RAC took the top slot at this scale just 2 months ago with a result that more than doubles the IBM DB2 UDB number at only 15% higher cost!
10000GB Scale. Here IBM holds the top spot, out of only two entries. Two years elapsed from the time Oracle entered this scale of the benchmark until IBM took the lead.

Based upon points 2-4 in the list above it seems shared-nothing is not somehow inherently superior to shared-disk architecture even for the supposed preferred workload which is warehousing!

42 Responses to “Nearly Free or Not, GridSQL for EnterpriseDB is Simply Better Than Real Application Clusters. It is Shared-Nothing Architecture After All!”


1 LewisC August 10, 2007 at 12:18 am

Hi Kevin.

I don’t actually have anything to add to your post. I did want to say, though, that I think you should win two awards. The longest blog title and the longest post titles. 😉

I have my resolution set to 1280×1024 and the blog title wraps twice and the post title wraps three times! I’ve read blogs that had less information in the entire post than yours do in the title.

heh. Just felt the need to point that out.

Take care,

LewisC

2 kevinclosson August 10, 2007 at 12:43 am

Fair shot, Lewis 🙂

Do you mean the blog title wraps when you are viewing it on my blog or if I should cut and paste the URL somewhere (such as, uh uh, on your blog)?

3 snarky August 10, 2007 at 4:58 am

This is a totally weak vaporware thing. They bought a single developer that made a java front end called Extendb that doesn’t even work. Why not check out CJDBC while you are at it.

4 kevinclosson August 10, 2007 at 5:09 am

Snarky,

C-JDBC…very (un)interesting. Thanks for the heads up!

5 Doug Burns August 10, 2007 at 5:44 am

“I do question how anyone could claim support for Oracle compatibility with enterprise customers without lab gear to reproduce the out-cases.”

Now *that’s* a very interesting point, Kevin. I often work with enterprise clients and they would expect this as a key component of any support offering. How can you say how compatible something is without being able to run side-by-side comparisons?

6 snarky August 10, 2007 at 6:15 am

No problem.

If you want to download it (gridsql) go here:
http://freshmeat.net/projects/extendb/

Note that when you click on http://www.extendb.com you are redirected to edb’s website.

I’m so sick of vendor claims without proof, where are the customers?

7 Niall Litchfield August 10, 2007 at 9:10 am

The blog main title wraps, and overwrites the little menu thingies (term of art :)) on your blog viewed at 1400×1050, not that I care much because of the content.

8 Niall Litchfield August 10, 2007 at 9:13 am

Update – the overwriting happens on I.E7, but not on Firefox 2 – the wrapping happens on both to some extent. Lesson learned, don’t read kevin with Microsoft products.

9 Dominic Delmolino August 10, 2007 at 1:31 pm

“I won’t even touch the fact that most ERP apps are a bit of an amalgam of both OLTP and DSS-style accesses.”

My favorite line in the whole post. 🙂

10 kevinclosson August 10, 2007 at 4:02 pm

Dominic,

Hmmm…I best start keeping it short and sweet then 🙂

11 kevinclosson August 10, 2007 at 6:46 pm

Niall, LewisC,

According to statscounter, the minority of readers of this blog (42%) use IE6/7. Lewis apparently is one of those. No matter, by reading the tea leaves I can tell Lewis doesn’t care about seeing the content of the blog correctly anyway.

I just took a look with IE6. I had to view->text size->smaller to view as cleanly as Firefox (which is what I use).

I hope sincere readers using IE6 will be able to cope with setting the text size accordingly. The wordpress motif I use is not the most flexible so we sort of get what we get I guess.

12 kevinclosson August 10, 2007 at 11:53 pm

mordred,

Nice trackback! I wished to comment on your post but comments are closed. I’ll do it here:

Folks, follow that trackback from bonglonglong…it is a good read and a touch of class.

13 Ken Jacobs August 11, 2007 at 12:02 am

Great (but long) post, Kevin.

Another thing to consider (beyond the performance of a specific workload — whether OLTP or DW — running at a particular scale as you implicitly do here) has to do with scalability … of workload and database size. What do you do with a shared nothing system when you need to add processors? Or disk? As you know, it’s simple with RAC (just plug in more disk or servers), but not so simple with a shared nothing system (reload/repartition).
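
A minimal sketch of why growing a shared-nothing system can mean reload/repartition. It assumes naive modulo hash placement (`key % node_count`), which is an illustrative choice and not necessarily how GridSQL or any other product places rows; schemes such as consistent hashing exist precisely to reduce this kind of movement.

```python
def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose home node changes when the node count changes,
    assuming (hypothetically) that a row lives on node key % node_count."""
    moved = sum(
        1 for key in range(num_keys) if key % old_nodes != key % new_nodes
    )
    return moved / num_keys


# Growing a 4-node cluster to 5 nodes under this placement rule.
fraction = moved_fraction(100_000, 4, 5)
print(f"{fraction:.0%} of rows change nodes")  # prints "80% of rows change nodes"
```

Under these assumptions, adding a single node to a four-node system relocates four out of every five rows, whereas a shared-disk system leaves the data where it is.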

And, with respect to workload and data skewing, there is the problem that only some of the processors can do the work required, since with shared nothing, it is not the case that all processors have access to all the data. Workloads and the “interesting” part of the database do shift over time. A statically partitioned/shared nothing system can’t adapt, like RAC can.

And, using a middleware approach for OLTP on shared nothing hardware, like GridSQL apparently does or XKoto or Continuent do, is also problematic as you scale the workload and number of nodes. As the update volume increases, adding nodes doesn’t really help, since the fraction of each node used for doing updates grows too, leaving less and less available for queries.

Hmmm … RAC is a truly amazing technology.

14 kevinclosson August 11, 2007 at 12:17 am

Ken,

Thanks for that. “Good (but long)”, argh. 🙂

Your points are (of course) spot-on about shared nothing. I’ve been there and done that with Informix XPS. I was in Sequent Database Engineering when Informix Online DSA was being developed (Sequent was the dev platform thanks Gary Kelley).

In my experience (based on a lot of side-by-side with both PQO and XPS), if any shared nothing was going to have made it (based upon pure technology) it would have been XPS. That still wouldn’t make XPS better than PQO/RAC however since it couldn’t/can’t do OLTP. The poison pill for shared-nothing is always the inevitable re-partitioning (reload your database) and data shipping (joins, etc). These pitfalls are intrinsic to the architecture and no amount of tender love and care will fix that.

I regularly state that this is not a techno-religious blog, but when it comes to shared-disk versus shared-nothing I usurp my authority (it’s my blog) and break the rules. I’ve been into both technologies (hands-on) deeply.

15 LewisC August 11, 2007 at 2:20 am

Kevin,

I use Firefox 2. I didn’t mean the screen was messed up, just that the titles were very long. It was meant as a humorous comment. That’s all.

Why do you think I don’t like the content? I do like the content. I wouldn’t read it otherwise. Because I also happen to like PostgreSQL and EnterpriseDB?

Databases are not a religion to me. It’s ok to work with multiple databases. It’s ok to like working with multiple databases. If I had more time, I would work with even more databases. I’ve used DB2. It’s an ok database. The latest incarnations are much better than earlier versions. c’est la vie.

I’m not in any manner an expert on clustering. I read your blog (and Doug and Niall and etc) to learn about things I don’t know.

Thanks for your (long) blog. 😉

LewisC

16 kevinclosson August 11, 2007 at 5:07 am

LewisC,

No bad blood. I presumed that since you read a blog entry rich in content yet only commented on the visual appeal you must not be “on board” as they say. As for “other” databases, read my “About” section…you’ll see I’ve been around the block with other products.

Again, no ill intent. Stay with us.

17 Gregory August 11, 2007 at 1:28 pm

Actually there is neither a cluster nor an IBM result for 30,000 GB. You probably meant 10,000 GB or you know something I don’t 😉 !

18 kevinclosson August 11, 2007 at 3:04 pm

Gregory,

Good catch, thanks. Yes, I meant 10,000GB scale and just made that correction. As for 30,000GB scale TPC-H, there are no clustered results. Instead, there is a non-clustered 30,000GB Oracle/HP result.

Thanks again.

19 LewisC August 11, 2007 at 5:57 pm

Kevin,

>>No bad blood.

Cool. I’m glad.

>>As for “other” databases, read my “About” section…you’ll see I’ve
>>been around the block with other products.

Wow. My weakest area is HA. I just haven’t worked in an environment that used RAC or clustering (or even data guard for that matter). I see you’ve done a little bit in that area. heh

>>I presumed that since you read a blog entry rich in content yet
>>only commented on the visual appeal you must not be “on board” as
>>they say.

Nah. I just didn’t have anything intelligent to add. 😉

It’s a detailed, well thought out entry.

Take care,

LewisC

20 Blindman August 13, 2007 at 1:05 pm

So, how does one select an appropriate clustering environment? Oracle is too expensive (for small companies like ours) and MySQL/PostgreSQL are too “small” … what’s the most appropriate in-between solution? Cheaper, shared-nothing solutions seem to stand out despite their flaws.

A.

21 kevinclosson August 13, 2007 at 3:30 pm

Blindman,

I cannot imagine how any company that is looking for a clustered solution could be so small that not even Oracle Standard Edition could fit the budget. I don’t sell the stuff, but the following URL tells me that SE starts out on the named user front at $300 per user with a minimum of five, or $15K per socket. It includes Real Application Clusters. I should think that a company so small as to care about such dollar values would also quite likely be able to fit their workload into a 2 node cluster of single socket (quad core) servers with, for instance, the Intel 5355 processor. Connect it to some inexpensive (but stable) NAS and you have a very simple, solid clustered solution.

Click to access database-11g-standard-edition-datasheet.pdf

22 Blindman August 15, 2007 at 1:36 am

It looks simple on paper for sure. But trying to get a neat and tidy answer out of Oracle like that is quite the challenge. Ever consider doing sales for Oracle?

23 kevinclosson August 15, 2007 at 4:05 am

“It looks simple on paper for sure. But trying to get a neat and tidy answer out of Oracle like that is quite the challenge. Ever consider doing sales for Oracle?”

Blindman,

Thanks for stopping by, but no.

24 blindman August 15, 2007 at 7:11 am

Now how about that … a quick read of your blog and:
a/ I have met the guys at Pythian
b/ Oracle has come back this time with a quote of AUD$3000 for what they quoted $60k last week just by switching to 5 user license model.

Who would have thunk it. Thanks for being you Kevin!

25 kevinclosson August 15, 2007 at 3:15 pm

Blindman,

I’m not sure how to read your last comment. Those sound like kind words, but far be it from me to ever take credit for anything that means less revenue for Oracle. And when I say far, I mean FAR!

26 Lori Nichols August 23, 2007 at 7:19 pm

Your comments on Oracle RAC scaling to 10TB are somewhat valid. However, what RAC does not give you is the ability to parallel process a query unless you are using parallel query with its very heavy processor overhead. In these instances, the likes of Greenplum and DATAllegro may perform better. Certainly over the 10TB mark, Netezza and DATAllegro are likely to outperform Oracle RAC where breaking a query down into parallel simultaneous processing on several engines is appropriate for the application.

27 kevinclosson August 23, 2007 at 9:55 pm

Lori,

Oracle has an audited 30TB result too, but that is nothing. I was involved with a project based on an 80TB Oracle warehouse in the late 90s with Oracle8i. Oracle needs no more Proof of Concept in the VLDB area.

Also, if parallel query has “heavy processor overhead” as you state, that would pale in comparison to the data-shipping shared-nothing approaches incur.

28 Reason Truth August 28, 2007 at 4:04 am

Shared Nothing databases CRUSH Oracle RAC. Oracle doesn’t scale on complex query, plain and simple. Your 80TB database had indexed access to small chunks of data no doubt – try building aggregates or doing ETL transformations with your most powerful Oracle RAC installations and they FAIL. I’ve seen Oracle beaten by 100x on query times overall after the Oracle people try over and over to make it scale and they just can’t do it.

Oracle has a failed architecture for big data and they know it. They’ll fix it after it’s too late and when they do, realize that you’ve been misled all this time.

29 M Smith October 26, 2007 at 3:22 pm

“I won’t argue that it is conceivable that nobody in the entire PostgreSQL (upon which EnterpriseDB is built) development community ever ran instances of PostgreSQL and Oracle (freshly downloaded from OTN) side by side. After all, doing so would violate the OTN license.”

With such deliberately arrogant and restrictive licensing (a tactic shared with Microsoft) surely the time is right for someone like EnterpriseDB to offer significantly cheaper and less restrictive alternatives to Oracle.

30 Chris S October 28, 2007 at 8:01 pm

Kevin, the posters above fall into the trap of seeing it as a large BI/DSS database, whereas RAC acts in ERP, OLTP and BI/DSS alike.

The TPC benchmark was passed. You can download for free how they set it up from the TPC website AFAIR; look for full disclosure.

RAC works

31 Jonathan Moore August 20, 2008 at 5:00 pm

So one of the main points here seems to be that data shipping is bad, but I don’t understand how shipping data across a SAN is worse than shipping data between hosts in the case that both have the same interconnect. It may even be that shipping data between hosts is better as the host can prune the data down before shipping it.

32 PB August 21, 2008 at 6:05 am

Last time I went to Oracle and said, “I’m evaluating a change in database platform for my data warehouse — only about 3TB, largest table with 2.5B rows x 4k/row,” Oracle didn’t even come to the table. We were a sizable Oracle shop. The data warehouse was on DB2 UDB. We went to Teradata. Yes, the shared-nothing architecture does a lot of shipping around of data, but there really are times when the brute force algorithm outperforms all others. Feel free to do the big-O analysis of the algorithms to prove it to yourself.

Oh, and I wouldn’t trust TPC-H benchmarks worth anything. The only benchmarks that I ever trust are the ones using “my data” and “my queries.” I’m just that skeptical.

33 kevinclosson August 21, 2008 at 7:44 pm

Paul,

I agree with everything you just said. What most readers of this post and comment thread are missing is the fact that I hinge-pin my architectural bias to shared disk because you simply cannot do any decent OLTP with shared nothing. I am biased toward an RDBMS architecture that suits both OLTP and DW and, yes, the common Oracle deployment approach for DW (e.g., huge server pushing some SAN arrays that actually suck life **out** of Oracle) can stand for improvement. Yes, even a lot of improvement.

Let me put it this way. Imagine, just for a moment, that Oracle could push scan rates at the same level as an MPP architecture (e.g., DATAllegro) on a per-disk basis. Wouldn’t Oracle then be infinitely better than a DW-only solution since it would be well suited to both/either DW or OLTP? The answer to that is yes.

Finally, Paul, I don’t doubt that what you are saying is true regarding your RFP for a DW configuration that can handle scans of 2.5 billion 4KB rows. But humor me please by answering 2 things:

1) How do you get a table of 2.5 billion rows at 4KB per row into a 3TB warehouse, since that is the better part of 10TB (9.3TB to be exact)? Are you forgetting to mention compression or did you mean fewer than 2.5 billion rows?

2) What is your Teradata full scan rate on this 2.5 billion rows with something light such as avg(some column in the middle of the row)? How many Teradata nodes does it take to produce said scan rate? I should think you wouldn’t accept a scan taking any longer than, say, 60 seconds?
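
The arithmetic behind both questions, as a quick sketch. It assumes 4KB means 4,096 bytes, binary terabytes (2^40 bytes) and no compression; the 60-second target is the one suggested above.

```python
ROWS = 2_500_000_000   # 2.5 billion rows
ROW_BYTES = 4 * 1024   # assumes "4KB" means 4,096 bytes
TIB = 2 ** 40          # binary terabyte
GIB = 2 ** 30          # binary gigabyte
SCAN_SECONDS = 60      # the scan-time ceiling suggested above

# Uncompressed size of the table.
table_bytes = ROWS * ROW_BYTES
table_tib = table_bytes / TIB

# Aggregate scan rate needed to read the whole table within the ceiling.
required_gib_per_sec = table_bytes / GIB / SCAN_SECONDS

print(f"table size: {table_tib:.1f} TiB")  # prints "table size: 9.3 TiB"
print(f"rate for a {SCAN_SECONDS}s full scan: {required_gib_per_sec:.0f} GiB/s")
```

Under these assumptions a 60-second full scan needs roughly 159 GiB/s of aggregate scan throughput, which is why the node count matters.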

You are anonymous and there should be no reason not to give us some details on the power you are getting out of that Teradata in terms such as these…terms we can all appreciate.

Finally, yes, TPC-H is a joke…unless of course it is a great shared-nothing result. No, honestly, it is a joke.

34 Matt September 18, 2008 at 9:38 pm

I like all of the comments –
I agree with Kevin wholeheartedly – it is about balance.
If I am trying to build the fastest Top-Fuel Dragster then I build that car, I tune it for that specific race.

Benchmarks are just the same. TPC-C, TPC-H, I like to use them when I talk to customers. Two reasons: 1) they are a common benchmark and all the vendors build their dragster to win. 2) No one buys and configures H/W in that manner to run real-world applications.

For any specific job there is a tool that will uniquely fit the purpose better, but for a price that may not be justifiable.

The Teradata solutions are not real good at running PeopleSoft Finance.
HP’s Neoview – still cannot find a customer running their supply chain system on that platform.

Oracle provides all customers a flexible solution to answer their needs. And this is where I agree with Kevin, it offers the most flexibility to meet the variable needs of DSS, ODS, OLTP, OLAP, ROLAP etc.


Kevin Closson's Blog: Platforms, Databases and Storage