Staging Data For ETL/ELT? Flat Files Appear Magically! No, Load Time Starts With Transfer Time.

In my recent post entitled Something to Ponder? What Sort of Powerful Offering Could a Filesystem in Userspace Be?, I threw out what may have seemed to be a totally hypothetical Teaser Post™. However, as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know based on their latest software upgrade, a FUSE-based Oracle-backed file system is a reality. It is called Oracle Database File System (DBFS). DBFS is one of the corner stones of data loading infrastructure in the HP Oracle Database Machine environment. For the time being it is a staging area for flat files to be accessed as external tables. The “back-end”, as it were, is not totally new. See, DBFS is built upon Oracle SecureFiles. FUSE is the presentation layer that makes for mountable file systems. Mixing FUSE with DBFS results in a distributed, coherent file system scalable due to Real Application Clusters. This is a file system that is completely managed by Database Administrators.

So, I’m sure some folks’ eyes are rolling back in their head wondering why we need YAFS (Yet Another File System). Well, as time progresses I think Oracle enthusiasts will come to see just how feature rich something like DBFS really is.

If it performs, is feature rich and incremental to existing technology, it sounds awfully good to me!

I’ll be discussing DBFS in Oracle Exadata Technical Deep Dive – Part IV session tomorrow.

Here is a quick snippet of DBFS in action. In the following box you’ll see a DBFS mount of type FUSE on /data and the listing of a file called all_card_trans.ul

$ mount | grep fuse
dbfs on /data type fuse (rw,nosuid,nodev,max_read=1048576,default_permissions,allow_other,user=oracle)
$ pwd
$ ls -l all_card_trans.ul
-rw-r--r-- 1 oracle dba 34034910300 Jun 15 15:30 all_card_trans.ul

In the next box you’ll see ssh jumping to 4 of the servers in an HP Oracle Database Machine to list the contents of the DBFS file system and md5sum output to validate that it is the same file.

$ for n in r1 r2 r3 r4
> do
> ssh $n md5sum `pwd`/all_card_trans.ul &
> done
[5] 3943
[6] 3945
[7] 3946
[8] 3947
$ 1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
[5]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[6]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[7]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[8]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul

In the next box you’ll see concurrent multi-node throughput. I’ll use one dd process on each of 4 servers in the HP Oracle Database Machine each sequentially reading the contents of the same DBFS-based file and achieving 876 MB/s aggregate throughput. And, no, there is no cache involved.

$ for n in r1 r2 r3 r4; do ssh $n time dd if=`pwd`/all_card_trans.ul of=/dev/null bs=1M &; done
[5] 13325
[6] 13326
[7] 13327
[8] 13328
$ 32458+1 records in
32458+1 records
34034910300 bytes (34 GB) copied, 154.117 seconds, 221 MB/s

real    2m34.127s
user    0m0.014s
sys     0m3.073s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.113 seconds, 219 MB/s

real    2m35.123s
user    0m0.020s
sys     0m3.127s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.813 seconds, 218 MB/s

real    2m35.821s
user    0m0.026s
sys     0m3.210s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.89 seconds, 218 MB/s

real    2m35.901s
user    0m0.017s
sys     0m3.039s

With Exadata in mind, the idea is to offer a comprehensive solution for data warehousing. All too often I see data loading claims that start with the flat files sort of magically appearing ready to be loaded. Oh no, we don’t think that way. The data is outside on a provider system somewhere and has to be staged in advance of ETL/ELT. Since DBFS exploits the insane bandwidth of Exadata, it is an extremely good data staging solution. The data has to be rapidly ingested into the staging area and then rapidly loaded. A bottleneck on either part of that equation will be your weakest link.

Just think, no external systems required for data staging. No additional storage connectivity, administration, tuning, etc.

And, yes, it can do more than a single dd process on each node! Much more.

Exciting stuff.

8 Responses to “Staging Data For ETL/ELT? Flat Files Appear Magically! No, Load Time Starts With Transfer Time.”

  1. 1 Ofir June 18, 2009 at 7:47 am

    wow! that is very exciting news… linux clustered file system on top of SecureFiles… that could be an extremely powerful tool. I can actually think of several projects I’ve seen that could have improved their architecture with such magic. We sure live in interesting times 🙂
    Too bad the web cast will be in nine in the evening here – I have previous engagements…

    • 2 kevinclosson June 18, 2009 at 2:45 pm


      Have you noticed I don’t call it a “linux clustered file system?” It is distributed, scalable and fully coherent in a clustered environment, but I’m not calling it a CFS.

      • 3 robin chatterjee June 23, 2012 at 8:02 pm

        Sorry to post on what is likely to be a somewhat dead blog entry but we recently were starting research into using dbfs for a shared application tier for EBS. We came across this blog entry a =nd while I appreciate there are a lot of limitations with regards to dbfs ( scripts etc) do you forsee dbfs morphing into a full fledged CFS anytime soon… or will that be up to ACFS ?
        Downside of ACFS seems to be that its specifically for the database tier only whereas DBFS can be mounted on any linux box that can make a client connection to the database.

        • 4 kevinclosson June 24, 2012 at 8:36 am

          I don’t know how “full fledged” or, if you will, how general-purpose DBFS might evolve into but I do think DBFS can prove very helpful in certain use cases.

          It seems to me that DBFS adoption has sunk to the point of a single use case: staging for data loading on Exadata. DBFS is a generic Oracle Database 11g feature and until Oracle (perhaps) plays the “discover Oracle HW or disable” trick (such as in the Hybrid Columnar Compression case), folks should put it to use where it can solve problems.

          I’m quite fond of DBFS really…just so long as it doesn’t start showing up on my radar billed “The World’s First” or the “World’s Best” or “First for The Cloud” or “10x, 100x this or that.” DBFS is a tool, and a useful one at that, but I can’t predict how far it will evolve. As of my last experience with DBFS it was quite challenged by use cases that involve large(ish) numbers of files. Oracle may have improved the DBFS in that regard though. The following of my webcasts on that matter might prove of interest:

          “Using DBFS on Exadata. This tutorial covers the installation and implementation best practices for using DBFS with Oracle Exadata.”

  2. 5 Peter June 18, 2009 at 10:34 am


    You said:
    “…,as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know based on their latest software upgrade, a FUSE-based Oracle-backed file system is a reality.”

    Upgrade of what software? Exadata?

    Thanks for the clarification


  3. 7 Philip Papadopoulos June 18, 2009 at 1:36 pm

    Absolutely fantastic! Just get buy-in from the storage admins, the server admins, and those DW types that think in-side the box and like slow by slow.

    • 8 kevinclosson June 18, 2009 at 2:43 pm


      I don’t understand the “slow by slow” part of your comment. As for buy-in, there should be none required since the storage admins don’t have to lift a finger when Exadata is involved and server admins aren’t involved with DBFS either. After installation, DBFS is fully administered by the DBA. That includes space provisioning, file system creation/deletion, mounting/unmounting, etc.

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 2,890 other followers

Oracle ACE Program Status

Click It

website metrics

Fond Memories


All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: