My post entitled File Systems For A Database? Choose One That Couples Direct I/O and Concurrent I/O. What’s This Have To Do With NFS? Harken Back 5.2 Years To Find Out has not been an incredibly popular post by way of page views (averages about 10 per day for the last six months), but it has generated some email from readers asking about EXT4.
I’ve been putting off the topic but it is fresh on my mind.
Today I put out a quick tweet about concurrent writes on Ext4 (https://twitter.com/kevinclosson/status/177111985790525440) that started a small tweet-thread by others looking for clarification. This blog entry aims to clarify my point about concurrent writes on EXT4 compared to XFS. As an aside, if you have not read the above referenced blog post, and you are interested in concurrent writes and how the topic pertains to several file systems including NFS, I recommend you give it a read.
The topic at hand, concurrent write handling on EXT4 versus XFS, is a narrow one, so this will be a brief blog post. Allow me to explain. The following really sums it up:
EXT4 does not support concurrent writes, XFS does.
So, in spite of the fact that the topic is brief, I’d like to expound upon the matter and offer some proof.
In the following you will see two proof cases—one EXT4 and the other XFS. The proof case is as follows:
- The previous file system is unmounted
- An XFS file system is created in my md(4) SW RAID LUN
- The XFS file system is mounted on /mnt/dsk
- A script called simple.sh is executed to prove the volume supports high-performance sequential writes by first initializing a test file through the direct I/O code path
- The simple.sh script then measures 196,608 64KB sequential writes to the test file. The file is opened without truncate so this is an operation that merely over-writes the file. The writes are performed with direct I/O.
- The simple.sh script then performs concurrent writes of the same file—again the writes are through the direct I/O code path and the file is not truncated. There are two dd(1) processes—one over-writes the first half of the file the other over-writes the second half of the file.
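The count= and seek= values handed to the two dd(1) writers fall out of simple arithmetic on the file size and the write size. The following is a minimal sketch of that arithmetic and is not part of simple.sh. The function name split_writers is hypothetical, and the sketch assumes the file divides evenly among the writers.

```shell
#!/bin/bash
# Hypothetical helper (not from simple.sh): print the dd count= and seek=
# values for each of N writers that split one file into equal,
# non-overlapping regions.
# Usage: split_writers FILE_BYTES BLOCK_BYTES WRITERS
# The body runs in a subshell, so its variables stay local to the call.
split_writers() (
    file_bytes=$1
    block_bytes=$2
    writers=$3
    total_blocks=$(( file_bytes / block_bytes ))
    per_writer=$(( total_blocks / writers ))
    i=0
    while [ "$i" -lt "$writers" ]; do
        # seek= is expressed in blocks of bs, just as dd expects
        echo "writer=$i count=$per_writer seek=$(( i * per_writer ))"
        i=$(( i + 1 ))
    done
)

# 12 GiB file, 64 KiB writes, two writers (the sizes used in this post)
split_writers $(( 12 * 1024 * 1024 * 1024 )) $(( 64 * 1024 )) 2
# writer=0 count=98304 seek=0
# writer=1 count=98304 seek=98304
```

With a single writer the same arithmetic gives the 196,608 writes of 64KB used in the serial test.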
I’ll paste the silly little simple.sh script at the bottom of this post.
The measure of goodness is, of course, whether or not the two-process case is able to push more I/O in aggregate than the single-writer case. You’ll see that with very large writes the LUN can sustain 3.7 GB/s with a single writer through the direct I/O code path on both XFS and EXT4 files. The concurrent versus single write test cases were conducted with 64KB writes. Again, with both file systems (XFS and EXT4) the single writer was able to push 1.4 GB/s. As the following shows, the XFS two-writer case scaled at 1.7x.
Now it’s time to move on to EXT4. Here you’ll see the same baseline of 3.7 GB/s when initializing the file and the familiar 1.4 GB/s for the single 64KB serial writer. That, however, is the extent of the similarities. The two-writer case on EXT4 sadly de-scales. The 2.4 GB/s seen in the XFS case falls to an aggregate of 1048 MB/s with two writers on EXT4.
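The scaling factors quoted here are simply aggregate two-writer throughput divided by single-writer throughput. A small sketch of that arithmetic follows. The helper name scaling is hypothetical, and the EXT4 line assumes dd's decimal units (1.4 GB/s taken as 1400 MB/s).

```shell
#!/bin/bash
# Hypothetical helper: ratio of aggregate throughput to single-writer
# throughput. Both arguments must be in the same unit.
# Usage: scaling SINGLE_WRITER AGGREGATE
scaling() {
    awk -v base="$1" -v agg="$2" 'BEGIN { printf "%.2f\n", agg / base }'
}

scaling 1.4 2.4      # XFS, both values in GB/s   -> 1.71
scaling 1400 1048    # EXT4, both values in MB/s  -> 0.75
```

A result above 1 means the second writer added throughput. A result below 1 means two writers moved less data per second than one writer did alone.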
The following is the simple.sh script:
#!/bin/bash

myfile=$1

echo "Creating test file $myfile using direct I/O"
dd if=/dev/zero of=$myfile bs=1024M count=12 oflag=direct
sync;sync;sync;echo 3 > /proc/sys/vm/drop_caches

echo "Single Direct I/O writer"
( dd if=/dev/zero of=$myfile bs=64K count=196608 conv=notrunc oflag=direct > thread1.out 2>&1 ) &
wait
cat thread1.out

echo "Two Direct I/O writer"
( dd if=/dev/zero of=$myfile bs=64K count=98304 conv=notrunc oflag=direct > thread1.out 2>&1 ) &
( dd if=/dev/zero of=$myfile bs=64K count=98304 seek=98304 conv=notrunc oflag=direct > thread2.out 2>&1 ) &
wait
cat thread1.out thread2.out
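The script leaves each writer's dd(1) summary in thread1.out and thread2.out, so the aggregate figure has to be worked out from those files. The following is one possible sketch, not necessarily how the numbers in this post were derived. It assumes the GNU dd summary line ("N bytes (...) copied, T s, R MB/s") in the C locale, it defines aggregate throughput as total bytes divided by the longest elapsed time, and the function name aggregate_mbps is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: combine the dd summary lines found in the given files.
# Aggregate throughput = total bytes written / longest elapsed time,
# printed in decimal MB/s.
# Usage: aggregate_mbps FILE...
aggregate_mbps() {
    awk '/bytes/ && /copied/ {
             bytes += $1                      # first field is the byte count
             for (i = 2; i <= NF; i++)        # elapsed time precedes the "s," field
                 if ($i == "s," && $(i - 1) + 0 > secs)
                     secs = $(i - 1) + 0
         }
         END { if (secs > 0) printf "%.0f\n", bytes / secs / 1000000 }' "$@"
}
```

After a run of simple.sh, it would be invoked as `aggregate_mbps thread1.out thread2.out`. Using the longest elapsed time is slightly pessimistic if one writer finishes well before the other.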
OK, so what about BTRFS? Is it a good competitor for XFS?
Best Regards.
GregG
Hello GregG,
I would put btrfs way ahead of ext4 for features and so forth. I fully admit I am biased towards XFS at this time. All those years of all those long-of-tooth SGI engineers’ time stack up to goodness.
I do not test btrfs, so mum’s the word from me, unfortunately. I have heard my friend Dave Chinner speak kind words about btrfs in the past…maybe he was dodging drop bears 🙂
BTRFS is (like WAFL and ZFS) copy-on-write. It is my understanding that this means every block of written data eventually ends up at a pseudo-random place on disk (i.e. fragmentation by design, which might work well for fileservers but less so for small-block-update OLTP databases). If you later do a sequential read (and I believe Oracle does a lot of short-sequential-“ish” I/O; better explanation needed 😉), it might cause a lot of excess physical disk seeks and therefore unnecessarily heavy disk utilization (more than you would have on an FS with minor fragmentation). Until everybody is running on 100% flash disk sometime in the future, I think this will not improve performance (actually, the opposite…)
Not to mention the negative effects that physically moving datafile logical block locations has on things like virtual provisioning, EMC FLASH cache, FAST-VP and the like…
Am I right or am I missing something?
Bart,
I don’t know…I don’t spend time scrutinizing BTRFS 😦
Nice article. Good validation points for our choice of XFS. Thank you SGI & Dave Chinner.
Dave Chinner is a ridiculously talented individual (and quite a good chuckle over beers I’ll add).
We all owe Dave and Red Hat a lot for their commitment. And, as you point out, to the heritage of the SGI code.
Kevin,
You got very strange results, because according to http://www.mysqlperformanceblog.com/2012/03/15/ext4-vs-xfs-on-ssd/ XFS is definitely slower than ext4 on concurrent writes.
My results are my results, their results are their results and Chinner pointed out a bug. I can’t go back to my tests and see if I can change my test in an attempt to hit that bug. The end result is the same: XFS is better. Bugs are bugs.
_currently_ XFS is only better for two-thread dd not for database
I can’t argue that. If I post up Oracle benchmark results that contradict your assertion, would that be OK?
http://martincarstenbach.wordpress.com/2014/10/31/interesting-observations-executing-slob2-with-ext4-and-xfs-on-ssd/
That XFS test reflects an XFS concurrent write regression which was fixed. See: http://oss.sgi.com/archives/xfs/2012-02/msg00307.html
can we mount both xfs and ext4 on the same RDBMS (11gR2) Linux?
I’m not sure what it means to mount XFS or Ext4 on an RDBMS. Can you rephrase the question perhaps?