I’ve never met Mike Ault, but some friends of mine, who are fellow OakTable Network members, say he’s a great guy and I believe them. Mike works at Texas Memory Systems and I know some of those guys as well (Hi Woody, Jamon). Pleasantries aside, I have to call out some of the content Mike posted on a recent blog entry about HP Oracle Database Machine and Exadata Storage Server. Just because I blog about someone else’s posted information doesn’t mean I’m “out to get them.” Mike’s post made it clear I need to address a few things. Mike’s post was not vicious anti-Technical Marketing by any means, but it was riddled with inaccuracies that deserve correction.
Errata
While the first two errata I point out may seem trivial to many readers, I think accuracy is important if one intends to contrast one technology offering against another. After all, you won’t find me posting blog entries claiming Texas Memory Systems SSD is based on core memory or green Jell-O.
The first error I need to point out is that Mike refers to Oracle Exadata Storage Server cells as “block” or “blocks” eight times in his post. The correct term is cell.
The second error I need to point out is rooted in the following quote from Mike’s blog entry:
These new storage and database devices offer up to 168 terabytes of raw storage with 368 gigabytes of caching and 64 main CPUs
That is partially true. With the SATA option, the HP Oracle Database Machine does offer 168TB of gross disk capacity. The error is in the “368 gigabytes of caching” bit. The HP Oracle Database Machine does indeed come with 8 Real Application Clusters hosts in the Database grid configured with 32GB RAM each and 14 Exadata Storage Server cells with 8GB each. However, it is entirely erroneous to suggest that the entirety of physical memory across both the Database grid and the Storage grid somehow works in unison as “cache.” It’s not that the gross 368GB (8×32 + 14×8) isn’t usable as cache. It’s more the fact that none of it is used as user-data cache, at least not cache that somehow helps out with DW/BI workloads. The notion that it makes sense to put 368GB of cache in front of, say, a 10TB table scan and somehow boost DW/BI query performance is the madness that Exadata aims to put to rest. Here’s a rule:
If you can’t cache the entirety of a dataset you are scanning, don’t cache at all.
– Kevin Closson
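For the record, the 368GB figure from the errata above is simple arithmetic over the host counts and per-host RAM just described:

```python
# Aggregate physical memory across the HP Oracle Database Machine as
# described above: 8 database hosts at 32GB each, 14 Exadata cells at 8GB.
db_hosts, db_ram_gb = 8, 32
cells, cell_ram_gb = 14, 8

total_gb = db_hosts * db_ram_gb + cells * cell_ram_gb
print(total_gb)  # 368 -- the quoted figure, which is NOT one unified cache
```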
Cache, Gas or a Full Glass. Nobody Rides for Free.
No, we don’t use the 8x32GB physical memory in the Database grid as cache because cycling, say, the results of a 2TB table scan through 368GB aggregate cache would do nothing but impede performance. Caching costs, and if there are no cache hits there is no benefit. Anyone who claims to know Oracle would know that parallel query table scans do not pollute the shared cache of Oracle Database. A more imaginative, and correct, use for the 32GB RAM in each of the hosts in the Database grid would be for sorting, joins (hash, etc) and other such uses. Of course you don’t get the entire 32GB anyway as there is an OS and other overhead on the server. But what about the 8GB RAM on each Oracle Exadata Storage cell?
One of the main value propositions of Oracle Exadata Storage Server is the fact that lower-half query functionality has been offloaded to the cells (e.g., filtering, column projection, etc). Now, consider the fact that we can scan disks in the SAS-based Exadata Storage Server at the rate of 1GB/s. We attack the drives with 1MB physical reads and buffer the read results in a shared cache visible to all threads in the Oracle Exadata Storage Server software. To achieve 1GB/s with 1MB I/O requests requires 1000 physical I/Os per second. OK, now I’m sure all the fully-cached-conventional-array guys are going to point out that 1000 IOPS isn’t worth talking about, and I’d agree. Forget for the moment that 1GB/s is in fact very close to the limit of data transfer many mid-range storage arrays have to offer. No, I’m not trying to get you excited about the 1GB/s because if that isn’t enough you can add more. What I’m pointing out is the fact that the results of those 1000 IOPS (each 1MB in size) must be buffered somewhere while the worker threads rip through the data blocks applying filtration and plucking out the referenced columns. With 8 processor cores per cell, that’s 125 1MB filtration and projection operations per second per core. There is a lot going on and we need ample buffering space to do the offload processing.
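The per-cell arithmetic above can be sketched in a few lines (using the ~1GB/s scan rate and the 8 cores per cell implied by 112 storage-grid cores across 14 cells):

```python
# Back-of-envelope for one SAS-based Exadata cell, per the figures above.
scan_rate_mb_s = 1000      # ~1GB/s scan rate per cell
io_size_mb = 1             # 1MB physical reads
cores_per_cell = 8         # 112 storage-grid cores / 14 cells

iops = scan_rate_mb_s / io_size_mb     # 1000 physical I/Os per second
ops_per_core = iops / cores_per_cell   # 125 1MB filtration/projection ops
print(iops, ops_per_core)              # per second, per core
```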
Mike then moved on to make the following statement:
The Oracle Database Machine was actually designed for large data warehouses but Larry assured us we could use it for OLTP applications as well. Performance improvements of 10X to 50X if you move your application to the Database Machine are promised.
I’m not going to write guarantees, but no matter; that statement only led into the following:
This dramatic improvement over existing data warehouse systems is provided through placing an Oracle provided parallel processing engine on each Exadata building block so instead of passing data blocks, results are returned. How the latency of the drives is being defeated wasn’t fully explained.
Exadata Storage Server Software == Oracle Parallel Query
Folks, the Storage Server software running in the Exadata Storage Server cell is indeed parallel, threaded software; however, it is not entirely correct to state that there is a “parallel processing engine” that returns “results” from Exadata cells. More correctly, we offload scans (a.k.a. Smart Scan) to Exadata cells. Smart Scan technology embodies I/O, filtration, column projection and rudimentary joins. Insinuating otherwise makes Exadata out to be more of a database engine than intelligent storage, and there is more than a subtle difference between the two concepts. So, no, “results” aren’t returned; filtered rows and projected columns are. That is not a nit-pick.
DW/BI and I/O Latency
Mike finished that paragraph with the comment about how Oracle Exadata Storage Server “defeats” (or doesn’t) drive latency. I’ll simply point out that drive latency is not an issue with DW/BI workloads. The problem (addressed by Exadata) is the fact that attaching just a few modern hard drives to a conventional storage array leaves you with a throughput bottleneck. Exadata doesn’t do anything for drive latency because, shucks, the disks are still round, brown spinning thingies. Exadata does, however, make a balanced offering that doesn’t bottleneck the drives.
Mike continued with the following observation:
So in a full configuration you are on the tab for a 64 CPU Oracle and RAC license and 112 Oracle parallel query licenses
Yes, there are 64 processor cores in the Database grid component of the HP Oracle Database Machine, but Mike mentioning the 112 processor cores in the Exadata Storage Server grid is clearly indicative of the rampant misconception that Exadata Storage Server software is either some, most, or all of an Oracle Parallel Query instance. People who have not done their reading quickly jump to this conclusion and it is entirely false. So, mentioning the 112 Exadata Storage Server grid processors and “Oracle parallel query licenses” in the same breath is simple ignorance.
Mike continues with the following assertion:
Targeting the product to OLTP environments is just sloppy marketing as the system will not offer the latency needed in real OLTP transaction intensive shops.
While Larry Ellison and other important people have stated that Exadata fits in OLTP environments as well as DW/BI, I wouldn’t say it has been marketed that way, and certainly not sloppily. Until you folks see our specific OLTP numbers and value propositions I wouldn’t set out to craft any positioning pieces. Let me just say the following about OLTP.
OLTP Needs Huge Storage Cache, Right?
OLTP is I/O latency sensitive, but mostly for writes. Oracle offers a primary cache in the Oracle System Global Area (SGA) disk buffer cache. Applications generally don’t miss SGA blocks and immediately re-read them at a rate that requires sub-millisecond service times. Hot blocks don’t age out of the cache. Oracle SGA cache misses generally access wildly random locations, or result in scanning disk. So, for storage cache to offer read benefit it must cover a reasonable amount of the wildly, randomly accessed blocks. The SGA and intelligent storage arrays share a common characteristic: the same access patterns that blow out the SGA cache also blow out storage array cache. After all, architecturally speaking, the storage array cache serves as a second-level cache behind the SGA. If it is the same size as the SGA it is pretty worthless. If it is, say, 10 times the size of the SGA but only 1/50th the size of the database it is also pretty useless, with the exception of those situations when people use storage array cache to make up for the fact that they are using, say, 1/10th the number of drives they actually need. Under-provisioning spindles is not good, but that is an entirely different topic.
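A crude model shows why a second-level cache covering only a sliver of the database buys little: under uniformly random access (a worst case, but close to the “wildly random” SGA misses described above), the array-cache hit rate is roughly bounded by the fraction of the database it holds. The sizes below are purely illustrative:

```python
# Illustrative second-level (storage array) cache model. Blocks that stay
# hot tend to live in the SGA already, so the array cache mostly sees the
# random remainder; its hit rate is bounded by the database fraction it holds.
db_size_gb = 16_000        # example: a 16TB database
sga_gb = 32                # first-level (SGA) cache
array_cache_gb = 320       # "10 times the SGA", yet ~1/50th of the database

hit_rate = array_cache_gb / (db_size_gb - sga_gb)
print(f"{hit_rate:.1%}")   # 2.0% -- pretty useless, as argued above
```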
I know there are SAN array caches in the terabyte range and Mike speaks of multi-terabyte FLASH SSD disk farms. I suppose these are options, for a very select few.
Most Oracle OLTP deployments will do just fine running against non-bottlenecked storage with a reasonable amount of write-cache. Putting aside the idea of an entirely FLASH SSD deployment for a moment, the argument about storage cache helping OLTP boils down to what percentage of the SGA cache misses can be satisfied in the storage array cache and what overall performance increase that yields.
The Eye of a Needle
Recently, I was looking at the specification sheet for a freshly released mid-range Fibre Channel SAN storage array that supports up to 960 disk drives plumbed through a two-headed controller. The specification sheet shows a maximum of 16GB cache per storage processor (up to two of them). I should think the cache is mirrored to accommodate storage processor failure; maybe it isn’t, I don’t know. If it is mirrored, let’s pretend for a moment that mirroring storage processor cache is free even with modify-intensive workloads (subliminal man says it isn’t). Given this example, I have to ask who thinks 16GB of storage array cache in front of hundreds of drives offers any performance increase? It doesn’t, so let’s put to rest the OLTP storage cache benefit argument.
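To make the point concrete, here is the cache-per-drive arithmetic for that example array (assuming, as supposed above, that the cache is mirrored across the two storage processors):

```python
# Mid-range array example from above: two storage processors, 16GB cache
# each, in front of up to 960 drives.
storage_processors = 2
cache_per_sp_gb = 16
drives = 960

total_cache_gb = storage_processors * cache_per_sp_gb  # 32GB raw
effective_gb = total_cache_gb / 2                      # mirrored -> 16GB unique
mb_per_drive = effective_gb * 1024 / drives
print(round(mb_per_drive, 1))  # ~17MB of cache per drive
```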
But Mike Wasn’t Talking About Storage Array Cache
Right, Mike wasn’t talking about storage array cache benefit in an OLTP environment, but he was talking about nosebleed IOP rates from FLASH SSD. When referring to Exadata, Mike stated (quote):
What might be an alternative? Well, how about keeping your existing hardware, keep your existing licenses, and just purchase solid state disks to supplement your existing technology stack? For that same amount of money you will shortly be able to get the same usable capacity of Texas Memory Systems RamSan devices. By my estimates that will give you 600,000 IOPS, 9 GB/sec bandwidth (using Fibre Channel, more with Infiniband), 48 terabytes of non-volatile flash storage, 384 GB of DDR cache and a speed up of 10-50X depending on the query (based on tests against the TPCH data set using disks and the equivalent RamSan SSD configuration).
OK, there is a lot to dissect in that paragraph. First there is the attractive sounding 600,000 IOPS with sub-millisecond response time. But wait, Mike suggests keeping your existing hardware. Folks, if you have existing hardware that is capable of driving OLTP I/O at the rate of 600,000 IOPS I want to shake your hand. Oracle OLTP doesn’t just issue I/O. It performs transactions that hammer the SGA cache and suffer some cache misses (logical to physical I/O ratio). The CPU cost wrapped around the physical I/O is not trivial. Indeed, the idea is to drive up CPU utilization and reduce physical I/O through schema design and proper SGA caching. Those of you who are current Oracle practitioners are invited to analyze your current production OLTP workload and assess the CPU utilization associated with your demonstrated physical I/O rate. If you have an OLTP workload that is doing more than, say, 5000 IOPS (physical) per processor core and you are not 100% processor-bound, tell us about it.
Yes, there are tricked out transactional benchmarks that shave off real-world features and code path and hammer out as much as 10,000 IOPS per processor core (on very powerful CPUs), but that is not your workload, or anyone else’s workload that reads this blog. So, if real OLTP saturates CPU at, say, 5000 IOPS I have to wonder what your “existing hardware” would look like if it were also able to take advantage of 600,000 IOPS. That would be a very formidable Database grid with something like 120 CPUs. Remember, Mike was talking about using existing hardware to take advantage of SSD instead of Exadata. If you have a 120 CPU Database grid, I suspect it is so critical that you wouldn’t be migrating it to anything. It is simply too critical to mess with. I should hope. Oh, it’s actually more like 2000 IOPS per processor core in real life anyway, but that doesn’t change the point much. And Exadata isn’t really about OLTP.
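The CPU sizing above falls straight out of dividing the advertised IOPS by the per-core rates just discussed:

```python
# Database-grid cores needed to consume 600,000 OLTP IOPS at the per-core
# rates discussed above (benchmark-special, generous, and real-life).
target_iops = 600_000
for iops_per_core in (10_000, 5_000, 2_000):
    cores = target_iops // iops_per_core
    print(iops_per_core, "IOPS/core ->", cores, "cores")
# at 5,000 IOPS/core that's the ~120 CPUs above; at ~2,000 it's 300
```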
Let’s focus more intently on Mike’s supposition that an alternative to Exadata is “keeping your existing hardware” and feeding it 9GB/s from SSD. OK, first, that is 36% less I/O bandwidth than a single HP Oracle Database Machine can do, but let’s think about this for a moment. The Fibre Channel plumbing required for the Database grid to ingest 9GB/s is 23 active 4GFC FC HBAs at max theoretical throughput. That’s a lot of HBAs, and you need systems to plug them into. Remember, this is your “existing system.”
How much CPU does your “existing hardware” require to drive the 23 FC HBAs? Well, it takes a lot. Yes, I know you can use just a blip of CPU to mindlessly issue I/O in such a fashion as Orion or some pure I/O subsystem invigoration like dd if=/dev/sda of=/dev/null bs=1024k, but we are talking about DW/BI and Oracle. Oracle actually does stuff with the data returned from an I/O call. With non-Exadata storage, the CPU cost associated with I/O (e.g., issuing, reaping), filtration, projection, joining, sorting, aggregation, etc. is paid by the Database grid. So your “existing system” has to be powerful enough to do the entirety of SQL processing at a rate of 9GB/s. Let’s pretend for a moment that there existed on the market a 4-socket server that could accommodate 23 FC HBAs. Does anyone think for a moment that the 4 processors (perhaps 8 or 16 cores) can actually do anything reasonable with 9GB/s of I/O bandwidth? A general rule is to associate approximately 4 processor cores with each 4GFC HBA (purposefully ignoring trick benchmark configurations). It looks like “your existing system” needs about 96 processor cores.
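The plumbing-and-cores estimate above is two divisions (taking ~400MB/s as the max theoretical payload of one active 4GFC HBA, per the 23-HBA figure):

```python
# What "existing hardware" must look like to ingest 9GB/s over Fibre Channel.
target_mb_s = 9_000
hba_mb_s = 400            # ~max theoretical payload of one active 4GFC HBA

hbas = -(-target_mb_s // hba_mb_s)   # ceiling division -> 23 HBAs
cores = hbas * 4                     # ~4 cores per 4GFC HBA rule of thumb
print(hbas, cores)                   # 23 HBAs, ~92 (call it 96) cores
```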
A Chump Challenge
I’d put an HP Oracle Database Machine (64-core/14-cell) up against a 96-core/9GB/s FLASH SSD system any day of the week. I’d even give them 128 Database tier CPUs and not worry.
People keep forgetting that scans are offloaded to Exadata with the HP Oracle Database Machine. People shouldn’t craft their position pieces against Exadata by starting at the storage, regardless of the storage speeds and feeds.
It will always take more Database grid horsepower, in a non-Exadata environment, to drive the same scan rates offered by the HP Oracle Database Machine.
FLASH SSD
Did I mention that there is nothing (technically) preventing us from configuring Exadata Storage Server with 3.5″ FLASH SSD drives? I’ll leave that thought for another time.