BLOG UPDATE 21-NOV-2011: The comment thread for this post is extremely relevant.
I recently had an “exchange of ideas” with an individual. It was this individual’s assertion that modern systems exhibit memory latencies measured in microseconds.
Since I haven’t worked on a system with microsecond memory latency since late in the last millennium, I sort of let the conversation languish.
The topic of systems speeds and feeds was fresh on my mind from that conversation when I encountered something that motivated me to produce this installment in the Little Things Doth Crabby Make series.
This installment in the series has to do with disk scan throughput and file system fragmentation. But what does that have to do with modern systems’ memory latency? Well, I’ll try to explain.
Even though I haven’t had the displeasure of dealing with microsecond memory this century, I do recall that such ancient systems were routinely fed (and swamped) by just a few hundred megabytes per second of disk scan throughput.
I try to keep things like that in perspective when I’m fretting over the loss of 126MB/s, as I was the other day, especially since that 126MB/s amounts to a paltry 13% degradation on the system I was analyzing. Modern systems are a marvel!
But what does any of that have to do with XFS and fragmentation? Please allow me to explain. I had a bit of testing going on in which 13% (for 126MB/s) did make me crabby (it’s Little Things Doth Crabby Make, after all).
The synopsis of the test, and thus the central topic of this post, was:
- Create and initialize a 32GB file whilst the server is otherwise idle
- Flush the Linux page cache
- Use dd(1) to scan the file with 64KB reads — measure performance
- Use xfs_bmap(8) to report on file extent allocation and fragmentation
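The four steps above can be sketched as a small script. This is a minimal sketch rather than the exact harness from the test: the file location and size are placeholder variables (the post used a 32GB file on an XFS mount, /data1), and the page-cache flush is skipped when not running as root.

```shell
#!/bin/sh
# Sketch of the scan test. SIZE_MB and DIR are placeholders.
SIZE_MB=${SIZE_MB:-8}
DIR=${DIR:-/tmp}
F="$DIR/testfile"

# Step 1: create and initialize the file (one of the three methods)
dd if=/dev/zero of="$F" bs=1M count="$SIZE_MB" 2>/dev/null

# Step 2: flush the Linux page cache so the scan reads from disk
# (writing to drop_caches requires root)
sync
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
fi

# Step 3: scan the file with 64KB reads; dd reports the throughput
dd if="$F" of=/dev/null bs=64k 2>&1 | tail -1

# Step 4: report extent allocation (only meaningful on XFS)
if command -v xfs_bmap >/dev/null 2>&1; then
    xfs_bmap -v "$F"
fi

rm -f "$F"
```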
Step number 1 in the test varied the file creation/initialization method between the following three techniques/tools:
- xfs_mkfile(8)
- dd(1) with 1GB writes (yes, this works if you have sufficient memory)
- dd(1) with 1MB writes
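The parenthetical about memory in the second method is because dd(1) buffers each transfer at the block size, so bs=1024M needs roughly a gigabyte of buffer; a smaller bs with a proportionally larger count produces an identical file. A small-scale illustration (file names here are arbitrary):

```shell
# Two ways to write the same 1MiB of zeros: sixteen 64KB writes
# versus a single 1MB write. dd sizes its copy buffer from bs,
# which is why bs=1024M demands roughly 1GB of memory.
dd if=/dev/zero of=/tmp/small_bs.dat bs=64k count=16 2>/dev/null
dd if=/dev/zero of=/tmp/large_bs.dat bs=1M  count=1  2>/dev/null
cmp /tmp/small_bs.dat /tmp/large_bs.dat && echo "files are identical"
rm -f /tmp/small_bs.dat /tmp/large_bs.dat
```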
The following screen-scrape shows that the xfs_mkfile(8) case rendered a file that delivered scan performance significantly worse than the two dd(1) cases. The degradation was 13%:
# xfs_mkfile 32g testfile
# sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
# dd if=testfile of=/dev/null bs=64k
524288+0 records in
524288+0 records out
34359738368 bytes (34 GB) copied, 40.8091 seconds, 842 MB/s
# xfs_bmap -v testfile > frag.xfs_mkfile.out 2>&1
# rm -f testfile
# dd if=/dev/zero of=testfile bs=1024M count=32
32+0 records in
32+0 records out
34359738368 bytes (34 GB) copied, 22.1434 seconds, 1.6 GB/s
# sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
# dd if=testfile of=/dev/null bs=64k
524288+0 records in
524288+0 records out
34359738368 bytes (34 GB) copied, 35.5057 seconds, 968 MB/s
# xfs_bmap -v testfile > frag.ddLargeWrites.out 2>&1
# rm testfile
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              2.7T  373G  2.4T  14% /data1
# dd if=/dev/zero of=testfile bs=1M count=32678
32678+0 records in
32678+0 records out
34265366528 bytes (34 GB) copied, 21.6339 seconds, 1.6 GB/s
# sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
# dd if=testfile of=/dev/null bs=64k
522848+0 records in
522848+0 records out
34265366528 bytes (34 GB) copied, 35.3932 seconds, 968 MB/s
# xfs_bmap -v testfile > frag.ddSmallWrites.out 2>&1
I was surprised by the xfs_mkfile(8) case. Let’s take a look at the xfs_bmap(8) output.
First, the two maps from the dd(1) files:
# cat frag.ddSmallWrites.out
testfile:
 EXT: FILE-OFFSET            BLOCK-RANGE             AG AG-OFFSET               TOTAL
   0: [0..9961471]:          1245119816..1255081287   6 (166187576..176149047)  9961472
   1: [9961472..26705919]:   1342791800..1359536247   7 (84037520..100781967)   16744448
   2: [26705920..43450367]:  1480316192..1497060639   8 (41739872..58484319)    16744448
   3: [43450368..66924543]:  1509826928..1533301103   8 (71250608..94724783)    23474176
#
# cat frag.ddLargeWrites.out
testfile:
 EXT: FILE-OFFSET            BLOCK-RANGE             AG AG-OFFSET               TOTAL
   0: [0..9928703]:          1245119816..1255048519   6 (166187576..176116279)  9928704
   1: [9928704..26673151]:   1342791800..1359536247   7 (84037520..100781967)   16744448
   2: [26673152..43417599]:  1480316192..1497060639   8 (41739872..58484319)    16744448
   3: [43417600..67108863]:  1509826928..1533518191   8 (71250608..94941871)    23691264
The mapping of file offsets to extents is quite close in the two dd(1) cases. Moreover, XFS gave me 4 extents for my 32GB file. I like that… but…
So what about the xfs_mkfile(8) case? Well, not so good.
I’ll post a blog update when I figure out more about what’s going on. In the meantime, I’ll just paste the output, and that will be the end of this post for the time being:
# cat frag.xfs_mkfile.out
testfile:
 EXT: FILE-OFFSET            BLOCK-RANGE             AG AG-OFFSET               TOTAL
   0: [0..10239]:            719289592..719299831     4 (1432..11671)           10240
   1: [10240..14335]:        719300664..719304759     4 (12504..16599)          4096
   2: [14336..46591]:        719329072..719361327     4 (40912..73167)          32256
   3: [46592..78847]:        719361840..719394095     4 (73680..105935)         32256
   4: [78848..111103]:       719394608..719426863     4 (106448..138703)        32256
   5: [111104..143359]:      719427376..719459631     4 (139216..171471)        32256
   6: [143360..175615]:      719460144..719492399     4 (171984..204239)        32256
   7: [175616..207871]:      719492912..719525167     4 (204752..237007)        32256
   8: [207872..240127]:      719525680..719557935     4 (237520..269775)        32256
[...3,964 lines deleted...]
3972: [51041280..51073535]: 1115787376..1115819631    6 (36855136..36887391)    32256
3973: [51073536..51083775]: 1115842464..1115852703    6 (36910224..36920463)    10240
3974: [51083776..51116031]: 1115852912..1115885167    6 (36920672..36952927)    32256
3975: [51116032..54897663]: 1142259368..1146040999    6 (63327128..67108759)    3781632
3976: [54897664..55078911]: 1146077440..1146258687    6 (67145200..67326447)    181248
3977: [55078912..56094207]: 1195607400..1196622695    6 (116675160..117690455)  1015296
3978: [56094208..67108863]: 1245119816..1256134471    6 (166187576..177202231)  11014656
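A quick way to compare the three cases is to count the extent lines in the xfs_bmap(8) captures (the first two lines of each capture are the filename and the column header, so skip them). The file names match the captures above:

```shell
# Count extents reported by xfs_bmap -v for each saved capture.
for f in frag.xfs_mkfile.out frag.ddLargeWrites.out frag.ddSmallWrites.out
do
    # skip captures that aren't present
    [ -r "$f" ] || continue
    printf '%s: %s extents\n' "$f" "$(tail -n +3 "$f" | wc -l | tr -d ' ')"
done
```

Based on the output shown above, that would report 3,979 extents for the xfs_mkfile(8) file versus 4 for each dd(1) file. For what it’s worth, xfs_fsr(8) is the stock tool for defragmenting an XFS file after the fact.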