
“Feel” Your Processor Cache. Oracle Does. Part II.

That’s funny, but I had a sneaking suspicion it was going to happen, so….

In yesterday’s post entitled “Feel” Your Processor Cache. Oracle Does. Part I., I pointed out that the newest arrival in the ever-growing crowd of in-memory open-source database products and accelerators got it a bit wrong when describing what a level-two processor cache is. However, before I made that blog post I took a screen shot of the Csqlcache blog. Notice the description of L2:

csqlcache_before

"Before" Description: L2 Cache Soldered to the Motherboard

I just checked and it looks like they took the hint, but in what some would consider poor blogging style: they simply changed the text rather than making an edit that draws attention to the change. But that’s not why I’m blogging. The fact that someone made a blog correction is not interesting to me. Please see the “after” rendition in the next screen shot:

csclecache_after

"After" Description: L2 On-die.

So, yes, they took the hint that expressing cache latency in wall clock time is messy, but they now cite fixed latencies for L1, L2 and memory. First off, a 5-cycle L1 would be disastrous! And the 10-cycle L2 figure is truly a number pulled out of a hat. But that is not what I’m blogging about.

The new page is citing memory latency at 5-50ns. Oh how I’d love to have a system that chewed up memory at 50ns! But what about that low bound 5ns? Wow, memory latencies at modern L2 cache speed. That would be so cool! I wonder where these Csqlcache folks get their hardware? It is definitely out-of-this-worldly.

It’s All About Cache Lines
I don’t get this bit about “granularity” in that page either. Folks, modern microprocessors map memory into cache in units known as cache lines. All processors that matter (no names, but their initials are x64) use a cache line size of 64 bytes (8 words). In order to access any bits within a 64-byte line of memory, the entire line must be installed in the processor cache. So I think it would be a bit more precise to specify granularity as the base operational chunk the processor deals with, which is a cache line. That’s the point of the Silly Little Benchmark, by the way.

The workhorse of SLB (memhammer) randomly picks a line and writes a word in the line. The control loop is tight and the work loop is otherwise light, so this test creates maximum processor stalls with minimum extraneous cycles. That is, it exhibits a miserably high CPI (cycles per instruction) cost. That’s why it is called memhammer.
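
To make the access pattern concrete, here is a minimal memhammer-style loop sketched in C. This is not the actual SLB source; the argument handling and output format are simplified assumptions (the second argument here is total writes, not writes per page), but the essential behavior is the same: pick a random 64-byte line, write one word in it, and let the line installs dominate the cost.

/* memhammer-style sketch (not the actual SLB source): write one word in a
 * randomly chosen 64-byte line per iteration so nearly every store forces
 * a cache line install. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/time.h>

#define LINE 64  /* cache line size in bytes */

int main(int argc, char **argv)
{
    size_t pages = (argc > 1) ? (size_t)atol(argv[1]) : 1024;     /* 4 KB pages  */
    size_t ops   = (argc > 2) ? (size_t)atol(argv[2]) : 3000000;  /* total writes */
    size_t lines = pages * 4096 / LINE;
    volatile uint64_t *buf = malloc(pages * 4096);
    struct timeval t0, t1;

    if (buf == NULL)
        return 1;

    gettimeofday(&t0, NULL);
    for (size_t i = 0; i < ops; i++) {
        size_t line = (size_t)rand() % lines;          /* pick a random line */
        buf[line * (LINE / sizeof(uint64_t))] = i;     /* write a word in it */
    }
    gettimeofday(&t1, NULL);

    double usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    printf("Total ops %zu  Avg nsec/op %.1f\n", ops, usec * 1000.0 / ops);
    return 0;
}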

I’ve got the “before” screen shot. Let’s see if it silently changes. I hate to sound critical, but these Csqlcache folks are hanging their hat on producing a database accelerator. You have to know a lot about memory and how it works to do that sort of thing well. And, my oh my, the in-memory and in-line database accelerator field is so saturated. That reminds me of the company that my SQL Server-focused counterparts at PolyServe were all excited about back in about 2005 called Xprime. It looked like a holy grail back then. I recall it even took a best-new-product sort of award at a large SQL Server convention.

It didn’t work very well.

“Feel” Your Processor Cache. Oracle Does. Part I.

At about the same time I was reading Curt Monash’s mention of yet another in-memory database offering, my friend Greg Rahn started hammering his Nehalem-based Mac using the Silly Little Benchmark (SLB). Oh, before I forget, there is an updated copy of SLB here.

This blog entry is a quick and dirty two-birds-one-stone piece.

Sure About In-Memory Database? Then Be Sure About Memory Hierarchy
Curt’s post had a reference to this blog entry about levels of cache on the Csqlcache blog. I took a gander at it and immediately started gasping for air. According to the post, level-2 processor cache is:

Level 2 cache – also referred to as secondary cache) uses the same control logic as Level 1 cache and is also implemented in SRAM and soldered onto the motherboard, which is normally located close to the processor.

No, it isn’t. The last of the volume microprocessors to use off-die level-2 cache was the Pentium II and that was 11 years ago. So, no, processors don’t jump off-die to access static RAMs glued to the motherboard. Processor L2 caches are in the processor (silicon) and, in fact, visible to other cores within a multi-core package. That’s helpful for cache-to-cache transfers, which occur at blisteringly high frequencies with an Oracle Database workload since spinlocks (latches) sustaining a high acquire/release rate will usually have another process trying on the latch on one of the other cores. Once the latch is released, it is much more efficient to shuttle the protected memory lines via a cache-to-cache transfer than in the olden days where L2 cache required a bus access. These shared caches dramatically accelerate Oracle concurrency. That’s scalability. But that isn’t what I’m blogging about.

Get Your Stopwatch. How Fast is That L2 Cache?
In the Csqlcache blog it was stated matter-of-factly that L2 cache has a latency of 20ns. Well, ok, sure, there are or have been L2 processor caches with 20ns latency, but that is neither cast in stone, nor the common nomenclature for expressing such a measurement. It also happens to be a very poor L2 latency number, but I digress. Modern microprocessor L2 cache accesses are in phase with the processor clock rate. So, by convention, access times to L2 cache are expressed in CPU clock cycles. For example, consider a processor clocked at 2.93 GHz. At that rate, each cycle is 0.34 nanoseconds. Let’s say further that a read of a clean line in our 2.93 GHz processor requires 11 clock cycles. That would be 3.75 ns. However, expressing it in wall clock terms is not the way to go, especially on modern systems that can throttle the clock rate according to the load placed on the processor. Let’s say, for example, that our 2.93 GHz processor might temporarily be clocked down to 2 GHz. Loading that same memory line would therefore require 5.5 ns.
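
For what it’s worth, the conversion is trivial enough to jot down in a few lines of C. The 11-cycle figure is just the example used above, not a measured latency.

/* Convert a cache latency expressed in core clock cycles to wall-clock
 * nanoseconds at a given clock rate: cycles / GHz = ns. */
#include <stdio.h>

static double cycles_to_ns(double cycles, double ghz)
{
    return cycles / ghz;  /* one cycle at f GHz lasts 1/f ns */
}

int main(void)
{
    printf("11 cycles at 2.93 GHz = %.2f ns\n", cycles_to_ns(11.0, 2.93)); /* ~3.75 */
    printf("11 cycles at 2.00 GHz = %.2f ns\n", cycles_to_ns(11.0, 2.00)); /*  5.50 */
    return 0;
}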

We can use SLB to investigate this topic further. In the following session excerpt I ran SLB on the first core of the second socket of a Xeon 5400-based server. I had SLB (memhammer) allocate 4 MB of memory from which memhammer loops, picking random 64-byte offsets in which to write. It turns out that SLB is the most pathological of workloads because it requires a processor L2 line load prior to every write–except, that is, in the case where I allocate a sufficiently small chunk of memory to fit in the L2 cache. As the session snapshot shows, memhammer was able to write at random locations within the 4 MB chunk at the rate of 68.86 million times per second, or 14.5 ns per L2 cache access.


# cat r
./create_sem
taskset -pc 4 $$
./memhammer $1 $2 &
sleep 1
./trigger

wait

#  ./r 1024 3000000
pid 23384's current affinity list: 0-7
pid 23384's new affinity list: 4
Total ops 3072000000  Avg nsec/op    14.5  gettimeofday usec 44614106 TPUT ops/sec 68857145.8

When I increased the chunk of memory SLB allocated to 64 MB, the rate fell to roughly 9.3 million writes per second (107.8 ns) since the test blew out the L2 cache and was writing to memory.


#  ./r 16384 30000
pid 22919's current affinity list: 0-7
pid 22919's new affinity list: 4
Total ops 491520000  Avg nsec/op   107.8  gettimeofday usec 52970954 TPUT ops/sec 9279047.5

I don’t know anything about Csqlcache. I do know that since they are focused on in-memory databases they ought to know what memory really “looks” like. So, put away your soldering iron and that bag full of SRAM chips. You can’t make your modern microprocessor system faster that way.

“I Still Want My Fibre Channel.” Thus Sayeth Manly Man!

Just a quick alert that one of my installments in the “Manly Man” series has just come alive once again through its comment thread. I find it interesting that 20 months after I wrote it, the post still gets read quite frequently (approximately 150 times per month after the first month following the original posting), as per the WordPress analytics on the post. But the visit rate on the post isn’t what I’m blogging about.

Let’s face it, that post, and the majority of the Manly Man series, are basically an indictment of the storage presentation weaknesses specific to Oracle Real Application Clusters in an FC SAN environment. During that timeframe a lot has transpired that makes me feel a bit prescient on the matter. Consider:

  1. Oracle released Direct NFS in Oracle Database 11g:
    1. Manly Men Only Deploy Oracle with Fibre Channel – Part VI. Introducing Oracle11g Direct NFS!
  2. SCSI RDMA (SRP) became a viable option:
    1. Oracle on Opteron with Linux-The NUMA Angle (Part III). Introducing The Silly Little Benchmark.
    2. HP’s Optimized Warehouse (Blades) Reference Platform ( http://h18004.www1.hp.com/products/blades/oow/index.html )
  3. Switched Serial Attached SCSI became increasingly more popular

…and, of course Oracle Exadata Storage Server and the HP Oracle Database Machine!

Just How Well Do You Know Your Oracle Home Directory Tree?

How Deep is Your…Oracle Home?
This is likely the most trivial-of-pursuits sort of post I’ve made in a long while. Please don’t ask me why, but I had to take a few minutes to inventory directory depth in an Oracle Database 11g Enterprise Edition (with Real Application Clusters) Oracle Home directory tree on Linux. The data in the following box shows each directory depth that exists under my Oracle Home and a tally of how many directories are nested that deeply (a sketch of one way to gather such depth data follows the query output).

Maybe we should all breathe a sigh of relief that there are only 3 directories lying 13 levels deep? That is, after all, only 0.07% of all directories (4208) under a typical 11g Oracle Home! I’m not losing sleep.

Hey, like I said, trivial pursuit. Ho hum.

SQL> select d,count(*) from oh_dirs
  2  group by d
  3  order by 2 desc ;

         D   COUNT(*)
---------- ----------
         8        796
         5        729
         6        663
         4        651
         9        437
         7        369
         3        306
        10        145
         2         70
        11         31
        12          6

         D   COUNT(*)
---------- ----------
        13          3
        14          1
         1          1

14 rows selected.

SQL> select count(*) from oh_dirs;

  COUNT(*)
----------
      4208
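
The post doesn’t show how oh_dirs was populated, and a simple find piped through a script would certainly do the job. For the curious, here is a hedged little C sketch that walks a tree and prints one depth value per directory; the output could be loaded into a table like oh_dirs with SQL*Loader or an external table. Counting the starting directory as depth 1 is an assumption about how the counts above were taken.

/* Walk a directory tree and print the depth of every directory, one per
 * line; the starting directory counts as depth 1 (an assumption about how
 * the counts above were taken). */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static int emit_depth(const char *path, const struct stat *sb,
                      int type, struct FTW *ftw)
{
    if (type == FTW_D)                   /* directories only */
        printf("%d\n", ftw->level + 1);
    return 0;                            /* keep walking */
}

int main(int argc, char **argv)
{
    const char *root = (argc > 1) ? argv[1] : ".";
    /* FTW_PHYS: don't follow symlinks; 32 descriptors is plenty here */
    return nftw(root, emit_depth, 32, FTW_PHYS) == 0 ? 0 : 1;
}

Run it as, say, ./dirdepth $ORACLE_HOME > depths.txt and load the single column into the table’s D column.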

Webcast Announcement: Oracle Exadata Storage Server Technical Deep Dive – Part II.

Oracle Exadata Storage Server Technical Deep Dive – Part II.

Thursday, April 16, 2009 12:00 PM – 1:00 PM CDT

This is the second webinar in the Oracle Exadata Storage Server Technical Deep Dive series. Kevin Closson will offer a recap of his exciting first webinar on Exadata Storage Server and HP Oracle Database Machine internals and performance characteristics. He will revisit Unanswered Questions from Part I and also offer a new segment:

What About All That “Brainy Software Part I”

– Examination of Index Creation

– Index Smart Scan

The session will conclude with Q & A. Given that this is a series, Kevin will try to make Part II feel a bit more “town-hall”-like, giving Q & A a higher priority than time permitted in Part I.

Oracle Exadata Storage Server Technical Deep Dive: Part II. (Registration Link)

Don’t Blog About What You Intend To Blog About!

Well, after that “memory lane” post I just made (about SGI), and feeling a bit hungry, there is an old blog post of mine that comes to mind. I know it is a faux pas to blog-about-what-you-are-going-to-blog-about, but although I’m working on a good technical entry, it isn’t buttoned up quite yet. So, yes, I’m blogging about the fact that I will soon be blogging about something good (it has to do with column ordinality, but the Vertica guys shouldn’t get too excited).

So, like I said, I’m hungry and reminiscing and wishing I had the time tonight to prepare something worthwhile, but I don’t…but I could, and routinely do:

Wildfleisch Ragout mit Champignon (wild game ragout with mushrooms)

When Sun Microsystems Got Their First Big System It Was No April Fool’s Day Joke

I don’t think this (SGI Sold for 25 Million Dollars) is an April Fool’s Day joke either!

Wow, what a wild ride that has been. See, SGI holds a special place in my heart. While working in Sequent Computer Systems’ Advanced Oracle Engineering Group in the mid-1990s, I recall SGI selling the technology assets that included Cray’s CS6400 to Sun Microsystems for what was rumored to be about $50 million. That was Sun’s first big system (a.k.a. the UE 10000), thank you very much. Not that the UE 6000 was a loaf, but the UE 6000 was not about to stand toe-to-toe with a period Sequent NUMA-Q 2000-or hold a candle to it for that matter. Before the UE10K, Sun systems were “quick” but very limited-bandwidth machines. It is fairly well known that Sequent management of the time didn’t think to buy and burn the CS6400 technology like they should have. It was, after all, developed no more than 400 meters from Sequent’s HQ. Figuring out a way to buy and burn that system house would have been a better “waste” of money than “The Dragster.”

If only someone, anyone, besides Sun would have bought that CS6400 division… if only…

Sun Microsystems went on to sell over 1,000 of those CS6400 (UE10K) jobbies per year for an annual take of over a billion dollars.

Memories… but it has all brought NUMA back to the forefront of my thinking today…

Enter, Nehalem.

PS. I need to point out to my Oaktable Network friends that this post was indeed a part of my ex-Sequent 12-step program.

What Good Are “Vendor Blogs” Anyway?

Curt Monash is a prolific writer and analyst who maintains several blogs and routinely contributes to online publications. I try to keep up on his writings at DBMS2 as there is plenty of interesting DW/BI-related content there.

In a recent post on the Text Technologies blog, Curt was making some points about what effect social media might have on the future of the “information ecosystem.” When referring to “Vendor Blogs,” Curt had the following to say:

Presenters of news. Vendors with stories to tell will take increasing responsibility for telling them deeply and well. Their economic motivation is obvious. And sometimes it goes beyond money. One of the most effective vendor blogs is surely Kevin Closson’s, and I know from talking with Kevin’s boss’s boss that Oracle was as surprised as anybody when his blog burst into popularity.

I was quite surprised to see Curt mention my blog seemingly out of thin air because (admittedly to my discredit) he and I have locked antlers a couple of times as a result of my zeal for all things Exadata.

But I’m not blogging about any of that.

Rockets: Red Glare. Blogs: Bursting in Air (or Popularity)
I thought Curt’s quote of my boss’s boss’s surprise was interesting and it got me curious. Has my blog “burst into popularity”?

I started blogging 11 months before I joined Oracle. So I thought I’d check my WordPress statistics to see what the average page viewing traffic was for:

  • The six months leading up to my starting date with Oracle
  • The 12 months prior to the release of Exadata
  • The 30 days following the release of Exadata
  • The last 6 months (counting back from today to the release of Exadata)

I’ll treat the average of the first row in the following table as the baseline and represent all other figures relative to that baseline. The data:

Time Frame                           Average Page View Units
6 Months Prior to Joining Oracle     1.000
6 Months Prior to Exadata Release    0.876
30 Days After Exadata Release        1.460
6 Months After Exadata Release       0.986

The 30 days following the release of Exadata was a roller coaster with a 46% jump in reader activity, but I have to admit that losing 1.4% compared to my pre-Oracle days feels more like waning into obscurity than bursting into popularity 🙂

Vendor Blogs
I don’t really consider my blog a “Vendor Blog” per se, but I’m glad to get the hat-tip from Curt nonetheless. I admit there are technology-related topics I’d love to blog about but feel constrained as a corporate employee. I wonder, does that make me a shill?

Oracle Database 11g with Intel Xeon 5570 TPC-C Result: Beware of the Absurdly Difficult NUMA Software Configuration Requirements!

According to this Business Wire article, the Intel Xeon 5500 (a.k.a. Nehalem) is making a huge splash with an Oracle Database 11g TPC-C result of 631,766 TpmC. At 78,970 TpmC/core, that is an outrageous result! I remember when it was difficult to push a 64-CPU system to the level these CPUs reach with a single processor core! I had to quickly scurry over to the TPC website to dig into the disclosures, but, as of 8:12 PM GMT, it had not been posted yet:

TPC posting

Jumping the Gun
Ah, but by the time I got around to checking again, it had indeed been posted. The first thing I did was check the full disclosure report to see what sort of Oracle NUMA-specific tweaking was done in the init.ora. None. That is very good news to me. The last thing I want to see is a bunch of confusing NUMA-specific tuning. Allow me to quote myself with a saying I’ve been rattling off for years:

The best NUMA system is the best SMP system.

By that I mean it shouldn’t take application software tuning to get your money’s worth out of the platform. Sure, we had to do it back in the mid-to-late 1990s with the pioneer NUMA systems, but that was largely due to the incredible ratio between local memory latency and the latency of highly-contended remote memory (and due to the concept of remote I/O, which does not apply here). Of course the operating system has to be NUMA aware. Period.

Speeds
I know what the ratios are on the Xeon 5500 series, but I can’t recall whether or not the specific number I have in mind is one I obtained under non-disclosure, so I’m not going to go blurting it out. However, it turns out that as long as memory is fairly placed (e.g., not a Cyclops) and the ratio is comfortably below 2:1 (R:L), you’re going to get a real SMP “feel” from the box. Of course, the closer the ratio leans towards 1:1 the better.

Summary
NUMA is a hardware architecture that breaks bottlenecks. It shouldn’t have to break SMP programming principles in the process. The Intel Xeon 5570, it turns out, is the sort of NUMA system you should all be clamoring for. What kind of NUMA system is that? The answer is a NUMA system that is indistinguishable from a flat-memory SMP.

Very cool!

PS. I actually already knew what level of NUMA tuning was used in this TPC-C testing. I just couldn’t blog about it. I also know the precise R:L memory latency ratio for the box. The way I look at it, though, is that since this modern NUMA system gets 78,970 TpmC/core, the R:L ratio is unnecessary minutiae-as are thoughts of NUMA software tuning. I never imagined NUMA would come far enough for me to write that.

Helpful Blogs! Yes, I Read the Documentation, But It Doesn’t Always Sink In.

I was just chatting with my friend Greg Rahn about an External Table related problem I was hitting when he pointed me to a related post on Tim Hall’s ORACLE-BASE website. Once again (as many times before) Tim’s blog proved extremely informative. It is by far one of my favorite blogs. I don’t know if it is a right-brain/left-brain thing, but Tim’s examples always help me smooth over any problems I’m having with documentation complexities. Come on, admit it, all of us have scratched our heads at least once while staring at a convoluted railroad diagram in the documentation and wished there was an example to bring it to life. Well, like I said, Tim always does that!

Come to think of it, I almost forgot to mention that fellow Oaktable Network member Jared Still has been blogging at Jared Still’s Ramblings for a couple of years. It is a good site and I recommend it. I’ll be adding it to my blogroll.

Poll Results: Stop Blogging.

According to the poll on my recent blog anniversary post, 1% of those participating in the poll recommend I stop blogging. There’s proof positive you can’t please everyone. On the other hand, 6% of the participants wanted more blogging about fishing. I am trying to post an occasional photo on my miscellaneous page. I just uploaded a couple of fishing-related photos for you six-percenters.

Bulk Data Loading Rates. Is 7.3 MB/s per CPU Core Fast, or Fast Enough? Part I.

It turns out that my suspicion about late-breaking competitive loading rates versus loading results was not that far off base. As I discussed in my recent post about Exadata “lagging” behind competitors’ bulk-loading proof points, there is a significant difference between citing a loading rate and citing a loading result. While it is true that customers generally don’t stage up a neat, concise set of flat files totaling, say, 1TB and load it with stopwatch in hand, it is important to understand processing dynamics with bulk data loading. Some products can suffer peaks and valleys in throughput as the data is being loaded.

I’m not suspecting any of that where these competitors’ results are concerned. I just want to reiterate the distinction between cited loading rates and loading results. When referring to the Greenplum news of a customer’s 4TB/h loading rate, I wrote:

The Greenplum customer stated that they are “loading at rates of four terabytes an hour, consistently.” […] Is there a chance the customer loads, say, 0.42 or 1.42 terabytes as a timed procedure and normalizes the result to a per-hour rate?

What if it is 20 gigabytes loading in a single minute repeated every few minutes? That too is a 4TB/h rate.

While reading Curt Monash’s blog I found his reference to Eric Lai’s deeper analysis. Eric quotes a Greenplum representative as having said:

The company’s customer, Fox Interactive Media Inc., operator of MySpace.com, can load 2TB of Web usage data in half an hour

That is a good loading rate. But that isn’t what I’m blogging about. Eric continued to quote the Greenplum representative as saying:

To achieve 4TB/hour load speeds requires 40 Greenplum servers

40 Greenplum servers…now we are getting somewhere. To the best of my knowledge, this Greenplum customer would have the Sun Fire X4500-based Greenplum solution. The X4500 (a.k.a. Thumper) sports 2 dual-core AMD processors, so the configuration has 160 processor cores.

While some people choose to quote loading rates in TB/h form, I prefer expressing loading rates in megabytes per second per processor core (MBPS/core). Expressed in MBPS/core, the Greenplum customer is loading data at the rate of 7.28 MBPS/core.
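
If you want to check that arithmetic, here is a quick sketch; the 4 TB/h and 160-core figures are the ones discussed above, and the conversion itself is just unit math.

/* Convert a quoted bulk-load rate in TB/h to MB/s per processor core.
 * 4 TB/h across 160 cores works out to roughly 7.28 MB/s per core. */
#include <stdio.h>

int main(void)
{
    double tb_per_hour = 4.0;
    int    cores       = 160;    /* 40 servers x 2 sockets x 2 cores */
    double mb_per_sec  = tb_per_hour * 1024.0 * 1024.0 / 3600.0;

    printf("%.1f TB/h = %.0f MB/s aggregate = %.2f MB/s per core\n",
           tb_per_hour, mb_per_sec, mb_per_sec / cores);
    return 0;
}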

Summary
Bold statements about loading rates without any configuration information are not interesting.

PS. I almost forgot. I still think Option #2 in this list is absurd.

The HP Oracle Database Machine is Too Large and Too Powerful…Yes, for Some Applications!

Hypothetical Problem Scenario
Imagine your current Oracle data warehouse is performing within, say, 50% of your requirements. You’re a dutiful DBA. You have toiled, and you’ve tuned. Your query plans are in order and everything is running “just fine.” However, the larger BI group you are supporting is showing a significant number of critical queries that are completing in twice the amount of time specified in the original service level agreement. You’ve examined these queries and revisited all the available Oracle Database Data Warehousing features that improve query response time, but you’ve determined that the problem boils down to a plain old storage bottleneck.

Your current system is a two-node Real Application Clusters (RAC) configuration attached to a mid-range storage array (Fibre Channel). Each RAC server has 2 active 4GFC HBA ports (i.e., a single active card). The troublesome queries are scanning tables and indexes at an optimal (for this configuration) rate of 800 MB/s per RAC node for an aggregate throughput of 1.6 GB/s. Your storage group informs you that this particular mid-range array can sustain nearly 3 GB/s, so there is some headroom at that end. However, the troublesome queries are processor-intensive as they don’t merely scan data-they actually think about the data by way of joining, sorting and aggregating. As such, the processor utilization on the hosts inches up to, say, 90% when the “slow” queries are executing.

The 90% utilized hosts have open PCI slots so you could add another one of those dual-port HBAs, but what’s going to happen if you run more “plumbing?” You guessed it. The queries will bottleneck on CPU and will not realize the additional I/O bandwidth.

Life is an unending series of choices:

  • Option 1: Double the number of RAC nodes and provision the 3 GB/s to the 4 nodes. Instead of 1.6 GB/s driving CPU to some 90%, you would see the 3 GB/s drive the new CPU capacity to something like 80% utilization. You’d have a totally I/O-bottlenecked solution, but the queries come closer to making the grade since you’ve increased I/O bandwidth by 88%. CPU is still a problem.
  • Option 2: Totally jump ship. Get the forklift and wheel in entirely foreign technology from one of Oracle’s competitors.
  • Option 3: Wipe out the problem completely by deploying the HP Oracle Database Machine.

The problem with Option 1 is that it is a dead end on I/O, and it isn’t actually sufficient: you needed to double from 1.6 GB/s, but you hit the wall at 3 GB/s. You’re going to have to migrate something somewhere sometime.

Option 2 is very disruptive.

And, in your particular case, Option 3 is a bit “absurd.”

He’s Off His Rocker Now
No, honestly, deploying a 14 GB/s solution (HP Oracle Database Machine) to solve a problem that can be addressed by doubling your 1.6 GB/s throughput is total overkill. This all presumes, of course, that you only have one warehouse (thus no opportunity for consolidation) and a powerful HP Oracle Database Machine would be too much kit.

No, He’s Not Off His Rocker Now
We had to be hush-hush for a bit on this, but I see that Jean-Pierre Dijcks over at The Data Warehouse Insider finally got to let the cat out of the bag. Oracle is now offering a “half-rack” HP Oracle Database Machine.

This configuration offers a 4-node ProLiant DL360 database grid and 7 HP Oracle Exadata Storage Servers. This is, therefore, a 7 GB/s-capable system. To handle the flow of data, there are 88 Xeon “Harpertown” processor cores performing query processing that starts right at the disks, where filtration and projection functions are executed by Exadata Storage Server software.

So, as far as the option list goes, I’d now say Option 3 is perfect for the hypothetical scenario I offered above. Just order, “Glass half empty, please.”

Option 2 is very disruptive.

Webcast Announcement: Oracle Exadata Storage Server Technical Deep Dive. Part I.

Wednesday, March 25, 2009 12:00 PM – 1:00 PM CDT

Kevin Closson will offer an in-depth presentation on Exadata Storage Server and HP Oracle Database Machine internals and performance characteristics. Topics planned for this installment in the series include:

  • Brief Technical Architecture Overview
  • Understanding Producer/Consumer Data Flow Dynamics
  • A “How” and “Why” Comparison of Exadata versus Conventional Storage
  • Storage Join Filters

Link to the Registration Page for the Webcast.

Where’s the Proof? Poof, It’s a Spoof! Exadata Lags Competitor Bulk Data Loading Capability. Are You Sure?

I’ve received a good deal of email following my recent blog entry entitled Winter Corporation Assessment of Exadata Performance: Lopsided! Test it All, or Don’t Test at All? I’m not going to continue the drama that ensued from that blog post, but an email I received the other day on the matter warrants a blog entry. The reader stated:

[…text deleted…] that is why I sort of agree with Dan. It makes no sense to load a huge test database without showing how long it took to load it. Now I see Greenplum has very fast data loading Oracle should wake up or […text deleted…]

That is a good question and it warrants this blog entry. I was quite clear in my post about why the Winter Corporation report didn’t cover every imaginable test, but I want to go into the topic of data loading a bit.

The reader asking this question was referring to a blogger who took Richard Winter’s Exadata performance assessment to task, citing three areas whose absence from the assessment he deemed suspicious. The first of these perceived shortcomings is what the reader was referring to:

High Performance Batch Load – where are the performance numbers of high performance batch load, or of parallel loads executing against the device?  How many parallel BIG batch loads can execute at once before the upper limits of the machine and Oracle are reached?

So the blogger and the reader who submitted this question/comment are in agreement.

Nothing Hidden
The Winter Corporation Exadata performance assessment is quite clear in two areas related to the reader’s question. First, the report shows that the aggregate size of the tables was 14 terabytes. With Automatic Storage Management, this equates to 28 terabytes of physical disk. Second, the report is clear that Exadata Storage Server offers 1 GB/s of (read) disk bandwidth per server, or 14 GB/s for the 14 Exadata Storage Servers in a full-height HP Oracle Database Machine. If Exadata happened to be unlike every other storage architecture by offering parity between read and write bandwidth, loading the 28 TB of mirrored user data would have taken only 2000 seconds, which is a load rate of about 50 TB/h. Believe me, it wasn’t loaded at a 50 TB/h rate. See, Exadata-related literature is very forthcoming about the sustained read I/O rate of 1 GB/s per Exadata Storage Server. It turns out that the sustained write bandwidth of Exadata is roughly 500 MB/s per server, or 7 GB/s for a full-height HP Oracle Database Machine. Is that broken?

Writes are more expensive for all storage architectures. I personally don’t think a write bandwidth equivalent to 50% of the demonstrated read bandwidth is all that bad. But, now the cat is out of the bag. The current maximum theoretical write throughput of an HP Oracle Database Machine is a paltry 7 GB/s (I’m being facetious because, uh, 7 GB/s of write bandwidth in a single 42U configuration is nothing to shake a stick at). At 7 GB/s, the 28 TB of mirrored user data would have been loaded in 4000 seconds (a load rate of about 25 TB/h). But, no, the test data was not loaded at a rate of 25 TB/h either. So what’s my point?

My point is that even if I told you the practical load rates for Exadata Storage Server, I wouldn’t expect you to believe me. After all, it didn’t make the cut for inclusion in the Winter Corporation report, so what credence would you give a blog entry that claims something between 0 TB/h and 25 TB/h? Well, I hope you give at least a wee bit of credence because the real number did in fact fall within those bounds. It had to. As an aside, I have occasionally covered large streaming I/O topics specific to Exadata, but remember that a CTAS operation is much lighter than ingesting ASCII flat-file data and loading it into a database. I digress.
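
Just to make the upper-bound arithmetic above explicit, here is a small C sketch that computes the two hypothetical ceilings: read-parity writes and the roughly 500 MB/s per-server write rate. These are theoretical bounds derived from the figures in the report, not measured load rates.

/* Hypothetical upper bounds only: time to write 28 TB of mirrored data at
 * the aggregate bandwidth of 14 Exadata Storage Servers, first assuming
 * write parity with the 1 GB/s read rate, then the ~500 MB/s write rate. */
#include <stdio.h>

static void bound(const char *label, double gb_per_sec)
{
    double seconds     = 28.0 * 1024.0 / gb_per_sec;   /* 28 TB expressed in GB */
    double tb_per_hour = 28.0 * 3600.0 / seconds;
    printf("%s: %.0f seconds, ~%.0f TB/h\n", label, seconds, tb_per_hour);
}

int main(void)
{
    bound("14 x 1 GB/s (read parity)", 14.0);   /* ~2048 s, ~49 TB/h */
    bound("14 x 0.5 GB/s (write)    ",  7.0);   /* ~4096 s, ~25 TB/h */
    return 0;
}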

This is an Absurd Blog Entry
Is it? Here is how I feel about this. I know the practical data loading rates of Exadata, but I can’t go blurting them out without substantiation. Nonetheless, put all that aside in your minds for a moment.

Until quite recently, bulk data loading claims by DW/BI solution providers such as Netezza and Vertica were not exactly phenomenal. For instance, Netezza advertises 500 GB/hour load rates in this collateral, and Vertica’s claims of 300 MB/minute and 5 GB/minute seemed interesting enough for them to mention in their collateral. As you can tell, data loading rates are all over the map. So, one of two things must be true: either a) Oracle is about to get more vocal about Exadata bulk loading capabilities, or b) Exadata is a solution that offers the best realizable physical disk read rates but an embarrassing bulk data loading rate.

I know the competition is betting on the latter because they have to.

What About the Reader’s Question?
Sorry, I nearly forgot. The reader was very concerned about bulk data loading and mentioned Greenplum. According to this NETWORKWORLD article, Greenplum customer Fox Interactive Media is quoted as having said:

We’re loading at rates of four terabytes an hour, consistently.

Quoting a customer is a respectable proof point in my opinion. The only problem I see is that there is no detail about the claim. For instance, the quote says “rates of four terabytes an hour […]” There is a big difference between stating a loading rate and a loading result. For example, this Greenplum customer cites a rate of 4 TB/h (about 1.1 GB/s) without mention of the configuration. Let’s suppose for a moment that the configuration is a DWAPP-DW40, which is the largest single-rack configuration available (to my knowledge) from Greenplum. I admit I don’t know enough about Greenplum architecture to know, but depending on how RAM is used by the four Sun Fire X4500 servers in the configuration (aggregate 64 GB), it is conceivable that the customer could “load” data at a 40 TB/h rate for an entire minute without touching magnetic media.

I presume Greenplum doesn’t cache bulk-inserted data, but the point I’m trying to make is the difference between a loading rate and a loading result. It is a significant difference. Am I off my rocker? Well, let’s think about this. The Greenplum customer stated that they are “loading at rates of four terabytes an hour, consistently.” Who amongst you thinks the customer stages up exactly 1TB of data in production and then plops it into the database with stopwatch in hand? Is there a chance the customer loads, say, 0.42 or 1.42 terabytes as a timed procedure and normalizes the result to a per-hour rate? Of course that is possible and totally reasonable. So why would it be totally absurd for me to suggest that perhaps the case is an occasional load of, say, 100 gigabytes taking about 350 seconds (also an approximate 4TB/h rate)? What if it is 20 gigabytes loading in a single minute repeated every few minutes? That too is a 4TB/h rate. And, honestly, both would be truthful and valid depending on the site requirements. The point is we don’t know. But I’m not going to totally discount the information because it is missing something I deem essential.

I’m willing to take the information reported by the Greenplum customer and say, “Yes, 4 TB/h is a good load rate!” Thus, I have answered the blog reader’s original question about bulk loading vis-à-vis the recent Greenplum news on the topic.

Under-Exaggeration
The NETWORKWORLD article quotes Ben Werther, director of product marketing at Greenplum, as having said:

This is definitely the fastest in the industry, [ … ]  Netezza for example quotes 500GB an hour, and we have not seen anyone doing more than 1TB an hour.

Well, I think Ben has it wrong. Vertica has a proof point of a loading result of 5.4 terabytes in 57 minutes 21.5 seconds, which is a rate of 5.65 TB/h. This result was independently validated (no doubt a paid engagement, which is no less valid) and is to be trusted. So, Ben’s statement represents a 5X under-exaggeration, which is better than the 100X and 100X+ exaggerations I occasionally rant about.

A Word About the Vertica Bulk-Loading Result
It used TPC-H DBGEN to load data into a 3rd normal form schema. While I’m not going to totally discount a test because it doesn’t embody a tutorial on schema design, many readers may not know this fact about that particular proof point. The proof point is about data loading, not about schema design. The blocks of data were splat upon the round, brown spinning thingies at the reported, validated rate. No questioning that. Schemas have nothing to do with that though and thus it is just fine to use a 3rd normal form schema for such a proof point.

Summary
Greenplum and Vertica have good proof points out there for bulk data loading, and a single-rack HP Oracle Database Machine cannot bulk-load data faster than roughly 25 terabytes per hour. Oracle’s competitors rest on hopes that the actual bulk-loading capability of the HP Oracle Database Machine is a small fraction of that. For the time being, it seems, that will remain the status quo.

I’ve lived prior lives wishing the competition were pigeonholed one way or the other.



