In my recent post entitled Something to Ponder? What Sort of Powerful Offering Could a Filesystem in Userspace Be?, I threw out what may have seemed to be a totally hypothetical Teaser Post™. However, as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know from their latest software upgrade, a FUSE-based, Oracle-backed file system is a reality. It is called Oracle Database File System (DBFS). DBFS is one of the cornerstones of the data loading infrastructure in the HP Oracle Database Machine environment. For the time being it is a staging area for flat files to be accessed as external tables. The “back-end”, as it were, is not totally new. See, DBFS is built upon Oracle SecureFiles. FUSE is the presentation layer that makes for mountable file systems. Combining FUSE with SecureFiles yields a distributed, coherent file system that scales thanks to Real Application Clusters. This is a file system that is completely managed by Database Administrators.
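For the curious, here is roughly what provisioning looks like. Consider this a hedged sketch rather than a recipe: the tablespace name (dbfs_ts), the schema and password (dbfs_user/dbfs_pass), the TNS alias (DBCONN) and the password file are all hypothetical, while FS1 and the /data mount point match the listings later in this post. The dbfs_create_filesystem.sql script and the dbfs_client utility ship with Oracle Database 11g.

$ sqlplus dbfs_user/dbfs_pass

SQL> -- create a SecureFiles-backed file system named FS1 in tablespace dbfs_ts (names hypothetical)
SQL> @$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql dbfs_ts FS1
SQL> exit

$ # present it through FUSE; dbfs_client reads the schema password on stdin
$ nohup dbfs_client dbfs_user@DBCONN /data < passwd.txt &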
So, I’m sure some folks’ eyes are rolling back in their heads, wondering why we need YAFS (Yet Another File System). Well, as time progresses I think Oracle enthusiasts will come to see just how feature-rich something like DBFS really is.
If it performs, is feature-rich, and is incremental to existing technology, it sounds awfully good to me!
I’ll be discussing DBFS in my Oracle Exadata Technical Deep Dive – Part IV session tomorrow.
Here is a quick snippet of DBFS in action. In the following box you’ll see a DBFS mount of type fuse on /data and a listing of a file called all_card_trans.ul:
$ mount | grep fuse
dbfs on /data type fuse (rw,nosuid,nodev,max_read=1048576,default_permissions,allow_other,user=oracle)
$ pwd
/data/FS1/stage1
$ ls -l all_card_trans.ul
-rw-r--r-- 1 oracle dba 34034910300 Jun 15 15:30 all_card_trans.ul
In the next box you’ll see ssh jumping to 4 of the servers in an HP Oracle Database Machine to run md5sum against the DBFS file system, validating that every node sees the very same file.
$ for n in r1 r2 r3 r4
> do
> ssh $n md5sum `pwd`/all_card_trans.ul &
> done
[5] 3943
[6] 3945
[7] 3946
[8] 3947
$ 1adbff1a36a42253c453c22dd031b48b /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b /data/FS1/stage1/all_card_trans.ul
[5]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[6]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[7]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[8]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
In the next box you’ll see concurrent multi-node throughput. I’ll use one dd process on each of 4 servers in the HP Oracle Database Machine, each sequentially reading the contents of the same DBFS-based file and achieving 876 MB/s of aggregate throughput. And, no, there is no cache involved.
$ for n in r1 r2 r3 r4; do ssh $n time dd if=`pwd`/all_card_trans.ul of=/dev/null bs=1M & done
[5] 13325
[6] 13326
[7] 13327
[8] 13328
$ 32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 154.117 seconds, 221 MB/s

real    2m34.127s
user    0m0.014s
sys     0m3.073s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.113 seconds, 219 MB/s

real    2m35.123s
user    0m0.020s
sys     0m3.127s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.813 seconds, 218 MB/s

real    2m35.821s
user    0m0.026s
sys     0m3.210s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.89 seconds, 218 MB/s

real    2m35.901s
user    0m0.017s
sys     0m3.039s
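For the record, the 876 MB/s aggregate figure is simply the sum of the four per-process dd rates shown above:

$ echo $(( 221 + 219 + 218 + 218 ))
876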
With Exadata in mind, the idea is to offer a comprehensive solution for data warehousing. All too often I see data loading claims that start with the flat files sort of magically appearing ready to be loaded. Oh no, we don’t think that way. The data is outside on a provider system somewhere and has to be staged in advance of ETL/ELT. Since DBFS exploits the insane bandwidth of Exadata, it is an extremely good data staging solution. The data has to be rapidly ingested into the staging area and then rapidly loaded. A bottleneck in either half of that equation becomes your weakest link.
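To make the external table angle concrete, here is a hedged sketch of what consuming a DBFS-staged flat file might look like. The column list, the etl_user schema and password, and the card_trans target table are all made up for illustration; the directory object simply points at the DBFS mount shown earlier.

$ sqlplus / as sysdba <<'EOF'
CREATE OR REPLACE DIRECTORY dbfs_stage AS '/data/FS1/stage1';
GRANT READ ON DIRECTORY dbfs_stage TO etl_user;
EOF
$ sqlplus etl_user/etl_pass <<'EOF'
-- external table over the staged flat file (hypothetical columns)
CREATE TABLE card_trans_ext (
  card_no   VARCHAR2(20),
  trans_amt NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dbfs_stage
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('all_card_trans.ul')
)
PARALLEL;
-- direct-path load into the (hypothetical) target table
INSERT /*+ APPEND */ INTO card_trans SELECT * FROM card_trans_ext;
COMMIT;
EOF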
Just think, no external systems required for data staging. No additional storage connectivity, administration, tuning, etc.
And, yes, it can do more than a single dd process on each node! Much more.
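As a hedged illustration of what a heavier concurrent load might look like (no throughput numbers implied), one could fan out several dd readers per node rather than one:

$ for n in r1 r2 r3 r4
> do
>    for i in 1 2 3 4
>    do
>       ssh $n dd if=/data/FS1/stage1/all_card_trans.ul of=/dev/null bs=1M &
>    done
> done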
Exciting stuff.
Wow! That is very exciting news… a Linux clustered file system on top of SecureFiles… that could be an extremely powerful tool. I can actually think of several projects I’ve seen whose architecture could have been improved with such magic. We sure live in interesting times 🙂
Too bad the webcast will be at nine in the evening here – I have previous engagements…
Ofir,
Have you noticed I don’t call it a “Linux clustered file system”? It is distributed, scalable, and fully coherent in a clustered environment, but I’m not calling it a CFS.
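If you want to convince yourself of the coherency point, a trivial probe is to write a file on one node and immediately read it back on another (coherency_probe.txt is just a made-up name):

$ ssh r1 'date > /data/FS1/stage1/coherency_probe.txt'   # write on one node
$ ssh r2 'cat /data/FS1/stage1/coherency_probe.txt'      # read it on another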
Sorry to post on what is likely to be a somewhat dead blog entry, but we recently started research into using DBFS for a shared application tier for EBS. We came across this blog entry, and while I appreciate there are a lot of limitations with regard to DBFS (scripts, etc.), do you foresee DBFS morphing into a full-fledged CFS anytime soon… or will that be up to ACFS?
The downside of ACFS seems to be that it’s specifically for the database tier only, whereas DBFS can be mounted on any Linux box that can make a client connection to the database.
Thanks
I don’t know how “full fledged” or, if you will, how general-purpose DBFS might become, but I do think DBFS can prove very helpful in certain use cases.
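On your point about mounting from any Linux box, a generic client mount is roughly the following; DBCONN is a hypothetical TNS alias, and the allow_other option requires user_allow_other in /etc/fuse.conf:

$ dbfs_client dbfs_user@DBCONN -o allow_other,direct_io /mnt/dbfs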
It seems to me that DBFS adoption has sunk to the point of a single use case: staging for data loading on Exadata. DBFS is a generic Oracle Database 11g feature, and until Oracle (perhaps) plays the “discover Oracle HW or disable” trick (as in the Hybrid Columnar Compression case), folks should put it to use where it can solve problems.
I’m quite fond of DBFS really…just so long as it doesn’t start showing up on my radar billed as “The World’s First” or the “World’s Best” or “First for The Cloud” or “10x, 100x this or that.” DBFS is a tool, and a useful one at that, but I can’t predict how far it will evolve. As of my last experience with DBFS it was quite challenged by use cases that involve large(ish) numbers of files. Oracle may have improved DBFS in that regard, though. The following webcast of mine on the matter might prove of interest:
“Using DBFS on Exadata. This tutorial covers the installation and implementation best practices for using DBFS with Oracle Exadata.”
http://apex.oracle.com/pls/apex/f?p=44785:24:0
Hi,
You said:
“…,as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know based on their latest software upgrade, a FUSE-based Oracle-backed file system is a reality.”
Upgrade of what software? Exadata?
Thanks for the clarification
regds
/Peter
Yes, the latest Exadata distribution. Exadata has both storage and database grid software components.
Absolutely fantastic! Just get buy-in from the storage admins, the server admins, and those DW types that think inside the box and like slow by slow.
Philip,
I don’t understand the “slow by slow” part of your comment. As for buy-in, there should be none required since the storage admins don’t have to lift a finger when Exadata is involved and server admins aren’t involved with DBFS either. After installation, DBFS is fully administered by the DBA. That includes space provisioning, file system creation/deletion, mounting/unmounting, etc.
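For a sense of what that DBA-only administration looks like, here is a hedged sketch of the teardown side. The dbfs_drop_filesystem.sql script ships in $ORACLE_HOME/rdbms/admin; the schema and file system names follow the hypothetical examples earlier in this post.

$ fusermount -u /data      # unmount the FUSE presentation, no root or storage admin needed
$ sqlplus dbfs_user/dbfs_pass

SQL> -- drop the SecureFiles-backed file system named FS1
SQL> @$ORACLE_HOME/rdbms/admin/dbfs_drop_filesystem.sql FS1
SQL> exit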