In my recent post entitled Sun Oracle Database Machine: The Million Oracle Database IOPS Machine or Marketing Hype? Part I, I started a discussion about why this million IOPS-capable platform is interesting to Oracle deployments that don't require quite that much I/O (or anywhere near it). Since I, more or less, dismissed the notion that more than a handful of applications in production require 1 million IOPS, it may seem on the surface as though I am in disagreement with my friend Glenn Fawcett's post regarding the physical drive savings with Sun Oracle Database Machine. Glenn writes:
Consider for a moment the number of drives necessary to match the 1 million IOPS available in the database machine. Assuming you are using the best 15,000 RPM drive, you would be able to do 250 IOPS/drive. So, to get to 1 million IOPS, you would need 4,000 drives! A highly dense 42U storage rack can house anywhere from 300-400 drives. So, you would need 10 racks just for the storage and at least one rack for servers.
I agree with Glenn's math in that post, insofar as the Sun Oracle Database Machine can match the read IOPS of the 4,000 drives Glenn speaks of. However, I still hold fast that any production site with 4,000 spindles provisioned specifically to meet the IOPS requirement of a single production database is a rarity. So, does that mean I am dismissing the value of Sun Oracle Database Machine? No.
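For anyone who wants to sanity-check the arithmetic, here is a minimal sketch in Python using only the numbers Glenn states (250 IOPS per 15K RPM drive, 300-400 drives per dense rack); nothing below comes from a datasheet.

```python
# Back-of-the-envelope check of Glenn's drive-count math.
# Assumptions (from Glenn's post): 250 IOPS per 15,000 RPM drive,
# 300-400 drives per dense 42U storage rack.

target_iops = 1_000_000
iops_per_drive = 250          # best-case 15K RPM drive
drives_per_rack = 400         # high end of the 300-400 range

drives_needed = target_iops // iops_per_drive
racks_needed = -(-drives_needed // drives_per_rack)   # ceiling division

print(f"Drives needed: {drives_needed}")        # 4000
print(f"Storage racks needed: {racks_needed}")  # 10
```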
Consolidation, Again
Both Glenn and I have spoken about applying Sun Oracle Database Machine to database consolidation, for good reason. I think it would be difficult to win CIO mindshare with a million IOPS value proposition when the same CIO is probably quite aware that his entire enterprise data center IOPS load comes nowhere near that amount. Let me see if I understand the Sun Oracle Database Machine, and what goes on in real-life production data centers, well enough to make some sense of what may appear to be an overly capable platform.
Consolidating Chaos Into…Chaos?
The value proposition supporting database consolidation centers on reducing chaos. It all comes down to breaking the relationship of 1 Operating System per Oracle Database. Consider, for example, 32 hypothetical Oracle Database instances deployed in the standard 1:1 (OS:DB) deployment model (let's leave high availability out of the equation for a moment). These hypothetical database instances would require 32 systems to maintain. What if the systems are, say, 3 years old and each consists of 2-socket, multi-core processors of that era? It is quite likely that the same 32 databases could be deployed in a single Sun Oracle Database Machine and experience significant performance improvement. What does this have to do with chaos, IOPS and FLASH, you ask?
Real Grid Infrastructure Makes For Good Consolidation
In my view, consolidating those hypothetical 32 instances into a database grid of 8 hosts (all Real Application Clusters-ready) would reduce a lot of chaos because you wouldn’t have 32 pools of storage to manage.
Disk space in the Sun Oracle Database Machine can simply be provisioned from one easily managed storage pool (an Automatic Storage Management disk group). That seems a lot simpler to me, as does maintaining 24 fewer OS images. Other consolidation considerations include the ability to run both RAC and non-RAC instances in the Database Machine. This differs from provisioning discrete systems, some of which are RAC-ready (e.g., shared storage provisioned, interconnects configured, etc.) and others not. With the Database Machine it is a simple task to switch a database from non-RAC to RAC because all the requirements are in place. Another thing to consider about consolidation of this type is the fact that any of the databases can run on any of the hosts. The hosts serve as a true grid of resources. I know folks speak of grid computing often, but 32 servers with their 32 pools of storage really don't fit the definition of a grid any more than would 32 two-node RAC clusters, each with its own pool of storage. Once the storage is central and shared, and all hosts are interconnected, you have a grid. But what does that have to do with IOPS and FLASH, you ask?
“I Don’t Need One Million IOPS. I Need What I Need When I Need It, But Usually Don’t Get It Anyway”
Let's say you're a lot like the hypothetical 32-host, 32-database DBA. Isn't it quite a task keeping up with which of those databases demand, say, 2,000 IOPS per processor core (e.g., 8,000 for a 4-core server) and which ones are more compute-intensive, demanding only 200 IOPS per processor core? So, what do you do? Do you argue your case for pools of "common denominator storage" that can handle the 8,000 IOPS case, or just suffer through with something in the middle? How much waste does that lead to? How badly under-provisioned are your heavy I/O database instances?
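To put rough numbers on the "common denominator" problem, here is a hedged sketch using the per-core figures from the paragraph above. The split between heavy and light databases is purely hypothetical and is only there to illustrate the waste.

```python
# Hypothetical mix: some databases need 2,000 IOPS/core, others only 200.
# Provisioning every server's storage for the heavy case wastes IOPS;
# provisioning for something "in the middle" starves the heavy databases.

cores_per_server = 4
servers = 32
heavy_servers = 8                       # hypothetical split
light_servers = servers - heavy_servers

heavy_need = heavy_servers * cores_per_server * 2_000   # 64,000 IOPS
light_need = light_servers * cores_per_server * 200     # 19,200 IOPS

provision_for_peak = servers * cores_per_server * 2_000  # 256,000 IOPS
actual_need = heavy_need + light_need                    #  83,200 IOPS

print(f"Provisioned for peak everywhere: {provision_for_peak:,} IOPS")
print(f"Actual aggregate demand:         {actual_need:,} IOPS")
print(f"Waste: {provision_for_peak - actual_need:,} IOPS")
```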
Consolidating databases into a Sun Oracle Database Machine allows IOPS-hungry applications to scale up to 8 nodes and 1 million read IOPS with RAC. Conversely, there are eight roughly 125,000 IOPS-capable units to be provisioned according to multiple database needs. For instance, several compute-light but IOPS-intensive databases could likely be hosted on a single database server in the Database Machine since there is a demonstrated 125,000 IOPS worth of bandwidth available to each host. That's over 15,000 per processor core. Quick, run off to your repository of AWR reports and see how many of your databases demand 15,000+ IOPS per processor core! Now, while I have you thinking about AWR reports, do make sure to ascertain your read:write ratio. As I pointed out in Part I of this series, the datasheets are quite clear on how the Sun Oracle Database Machine can service 1,000,000 read IOPS but is limited to 50,000 gross write IOPS in a full-rack configuration. The Exadata Smart Flash Cache adds no value to writes. Also, from that 50,000 write IOPS comes the overhead of mirrored writes, so a full-rack Sun Oracle Database Machine has the capacity to service 25,000 writes per second, a read:write ratio of 40:1.
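A quick worked check of those figures (the 1,000,000 read and 50,000 gross write IOPS numbers come from the datasheet as cited; the 8-cores-per-host assumption simply matches the 15,000-plus-per-core figure above):

```python
# Full-rack Sun Oracle Database Machine figures discussed above.
read_iops = 1_000_000       # datasheet read IOPS from flash
gross_write_iops = 50_000   # datasheet gross write IOPS
db_hosts = 8
cores_per_host = 8          # assumption consistent with the per-core figure

read_iops_per_host = read_iops / db_hosts                  # 125,000
read_iops_per_core = read_iops_per_host / cores_per_host   # 15,625

# Mirrored writes double each write, so net write IOPS are halved.
net_write_iops = gross_write_iops / 2                      # 25,000

print(f"Per host: {read_iops_per_host:,.0f} read IOPS "
      f"({read_iops_per_core:,.0f} per core); "
      f"net writes: {net_write_iops:,.0f} IOPS; "
      f"read:write ratio {read_iops / net_write_iops:.0f}:1")
```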
IOPS, SchmIOPS. I Care About Latency!
Let's look at this from a different angle. What if you have a database that doesn't require extreme read IOPS but requires very low-latency reads served from a data set of, say, 1 TB? Imagine further that this latency-sensitive database isn't the only database you have. Imagine that! Today's DBA managing more than a single Oracle Database! Well, your world today is full of difficulty. Today you have pools of storage fed to you by the "Storage Group" which may or may not even satisfy the requirements of your run-of-the-mill databases, much less this hypothetical 1 TB latency-sensitive database. What to do? Round it all up and consolidate the whole set of databases into a Database Machine. The lower-class "citizens" can be stacked together inside one or a few database hosts and their I/O demand controlled with Exadata I/O Resource Management (IORM). The latency-sensitive 1 TB database (be it single instance or RAC), on the other hand, can operate entirely out of FLASH because the architecture of Sun Oracle Database Machine is such that the entire aggregate FLASH capacity is available to all databases in the grid. That is, databases don't have to be scaled with RAC to have access to all FLASH capacity. So, that latency-sensitive database can even grow to 5 TB and still be covered by FLASH; further, it can become more processor-intensive as well and scale with RAC to multiple instances without procuring a new cluster.
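As a rough sanity check on that capacity claim, here is a sketch assuming the roughly 5.3 TB of aggregate flash a V2 full rack carries (14 storage cells with four 96 GB flash cards each); treat the cell and card counts as my assumption, not a quote from this post.

```python
# Rough capacity check for the latency-sensitive database scenario above.
# Assumption: 14 storage cells, each with four 96 GB flash cards
# (the ~5.3 TB aggregate flash cache of a full rack).

cells = 14
cards_per_cell = 4
gb_per_card = 96

aggregate_flash_gb = cells * cards_per_cell * gb_per_card   # 5,376 GB
for db_tb in (1, 5):
    fits = db_tb * 1_000 <= aggregate_flash_gb
    print(f"{db_tb} TB database fits entirely in flash cache: {fits}")
```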
As an aside, it is nearly impossible to control the I/O demand of hosted databases in any other consolidation scheme since there is no IORM. In those other schemes you can certainly control the amount of CPU and memory a database is allocated, but it doesn't take much CPU, or memory, to put significant strain on the central I/O subsystem if things go awry (e.g., runaway queries, application server flooding, etc.). If you don't know what I'm talking about, simply examine how little CPU something like Orion requires while obliterating an I/O subsystem.
Fixed FLASH Assets Fix All? Far-Fetched!
There are storage vendors out there stating that you can reach parity with the Database Machine by simply plugging in one of their FLASH devices. I won't dispute that it is possible to work out the system and storage infrastructure necessary for a million IOPS. That rate is feasible even with Fibre Channel, although it would require some 20 active 4GFC paths (given an Oracle data block of 8 KB) spread across some number of hosts.
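The 20-path figure is simple bandwidth arithmetic. Here is a sketch; the usable throughput I assume per 4GFC path (about 425 MB/s) is my own round number, not a vendor specification.

```python
# Bandwidth math behind the "some 20 active 4GFC paths" estimate above.
iops = 1_000_000
block_bytes = 8 * 1024                          # 8 KB Oracle data block
required_mb_s = iops * block_bytes / 1e6        # ~8,192 MB/s

mb_s_per_4gfc_path = 425                        # assumed usable payload per path
paths = -(-int(required_mb_s) // mb_s_per_4gfc_path)   # ceiling division
print(f"~{required_mb_s:,.0f} MB/s requires about {paths} active 4GFC paths")
```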
No, I'm not as excited about the IOPS capability of the Database Machine as I am about the FLASH storage-provisioning model it offers. In spite of all the hype, the Database Machine IOPS story is every bit as much a story of elegance as it is brute force. Allow me to explain.
If you want to apply the power of FLASH with most conventional storage offerings you have to use FLASH as ordinary disks. That requires, for availability's sake, mirroring. Further, you have to figure out which portions of your database contain the high I/O-rate objects and commit them permanently to FLASH. No big deal, I suppose, if you choose a 100% FLASH storage approach, but I think that is unlikely to be the case. What most databases have are hot spots, and those must go into FLASH while other parts of the database remain on spinning disk. Well, there is a problem with that. Hot spots move. So now you find yourself shuffling portions of your database in and out of FLASH disks, manually. That's messy at best. What if you have databases that only occasionally go critical? Do you provision permanent, mirrored FLASH disk to them as well?
The FLASH cache in Sun Oracle Database Machine is dynamic. Data flows through the cache; the hot stuff stays and the cold stuff is evicted (based on capacity). What does this have to do with consolidation? Well, some of your databases can be IOPS-hungry, others not, and those personalities can shift without notice. That sort of dynamism is a real headache with conventional systems. With the Database Machine, on the other hand, the problem sort of solves itself.
There is a reason I use the phrase “Million IOPS Capable.” I think the term “Million IOPS Machine” presumes too much while insinuating far too little.
That’s my opinion.
Kevin,
Do you know or have you heard if the SATA version of Exadata is orderable? They tell us up here in Canada that it is not, and may never be.
I have not heard about when, but I have also certainly not heard "never." I'd ask your Oracle sales representative. Sales should know!
Have configured my first Exadata V2 machine today with the Oracle folks… The I/O throughput seems awesome… Only complaint so far is that creating a DBFS filesystem takes forever
Hi Fairlie,
Creating a DBFS file system is ordinarily a near-instantaneous operation. We should take that up through official support channels, ok?
Thanks much.. Will do..
Hi Kevin,
Just a question: is the Oracle Exadata storage machine sold separately?
I never came across this information; however, in the link below Bill mentions that it is.
http://forums.oracle.com/forums/thread.jspa?threadID=984157&tstart=0
Thanks,
Waseem.
Kevin,
What are the chances of Exadata V2 on Solaris ?
W.
Waseem,
You must surely know I can’t answer that!
Hi Guys,
You must have heard about EMC FAST (Fully Automated Storage Tiering) technology. It is sold as a license and you need some SSD/EFD drives in your storage array.
http://www.emc.com/about/glossary/fast-cache.htm
I can't say whether this is fully comparable with what's included with Exadata, but I wouldn't be so sure that there is nothing on the market that can substitute for Database Smart Flash Cache (PCI Flash Card).
And now the question: why does Oracle support the DSFC only on Solaris and OEL?
Regards,
Oczkov,
Yes, I am aware of FAST and certainly know about all the other technologies you mention. I cannot speak for Oracle Corporation nor their reasoning behind what platforms support what. I’m sorry.
Kevin,
The Exadata X2-2 datasheet states that a Quarter Rack can perform 375K IOPS from Flash Cache. About 125K IOPS for each storage cell, I presume. On the other hand, the following document:
Click to access f20-data-sheet-403555.pdf
states that the card can perform 101K IOPS (4k block) and the following blog
https://blogs.oracle.com/mrbenchmark/entry/inside_the_sun_oracle_database
affirms 75K (8k block).
By that math, each storage cell "should" perform about 300K IOPS (considering 4 flash cards per storage cell).
I am confused: how many reads can each card really perform? Of course, considering an ideal scenario and at peak times.
How many blocks can each card serve in parallel? I presume 16 blocks per card, am I right?
Thanks in advance
Hello @jlskrock
Your analysis of Oracle’s Exadata and F20 data sheets affirms that the flash cards are capable of much more than Exadata is able to drive. All systems have bottlenecks. Separating PCI flash from the host CPUs is in my assessment silly…especially given how many IPC points there are along the way.
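For anyone following along, here is a hedged sketch of the arithmetic in question, using the per-card figure from the blog @jlskrock cites and the quarter-rack datasheet number (the three-cell quarter-rack count is my assumption):

```python
# Raw flash-card capability vs. what Exadata actually drives per cell,
# using the figures cited in the comment above.

f20_iops_8k = 75_000          # per-card 8 KB read IOPS (from the cited blog)
cards_per_cell = 4
card_capability_per_cell = f20_iops_8k * cards_per_cell      # ~300,000

quarter_rack_flash_iops = 375_000
cells_in_quarter_rack = 3                                    # assumed cell count
exadata_per_cell = quarter_rack_flash_iops / cells_in_quarter_rack  # 125,000

print(f"Cards can do ~{card_capability_per_cell:,} IOPS per cell; "
      f"Exadata drives ~{exadata_per_cell:,.0f}")
```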