Little Things Doth Crabby Make – Part XVI (Addendum). Hey ls(1) And du(1) Are Supposed To Agree.

My last installment in the Little Things Doth Crabby Make series had a lot of readers stepping up to remind me that ls(1) and du(1) aren’t always supposed to report the same size-related information on files. Uh, I actually knew that!

The post wasn’t about sparse files or any other such remedial aspects of file sizes.

In the post I mentioned that I was taking some rather unseemly actions against my XFS file system.

One particular unseemly thing I did was the result of a bug in a small piece of my code. Imagine for a moment that the loff_t variable sz in the following snippet was stupidly uninitialized/unassigned and the program steps on this syscall(__NR_fallocate,,,,) landmine.

 if ((ret = syscall(__NR_fallocate, fd, 0, (loff_t)0, (loff_t)sz)) != 0)
     perror("syscall.fallocate");

Well, if whatever happens to be stored in the variable sz is a really large value you’ll have a.out (allocate_file in my case) spinning in kernel mode for the rest of your life (at least on a 2.6.18 kernel). However, I got tired of it shortly after I snapped the following top(1) information:

 
 top - 11:47:27 up 3 days, 17 min, 4 users, load average: 1.00, 1.00, 1.00
 Tasks: 481 total, 2 running, 479 sleeping, 0 stopped, 0 zombie
 Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Mem: 49451520k total, 4065088k used, 45386432k free, 121492k buffers
 Swap: 50339636k total, 1044k used, 50338592k free, 3609352k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 12682 root 25 0 3648 308 248 R 99.7 0.0 880:23.09 allocate_file
 3997 root 15 0 13008 1416 816 R 1.0 0.0 10:25.16 top
 10100 gpadmin 15 0 111m 17m 2032 S 1.0 0.0 9:13.49 collectl
 1 root 15 0 10352 692 580 S 0.0 0.0 0:13.40 init
 2 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/0
 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
 5 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/1
 6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
 8 root RT -5 0 0 0 S 0.0 0.0 0:00.21 migration/2
 9 root 34 19 0 0 0 S 0.0 0.0 0:00.08 ksoftirqd/2
 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
 11 root RT -5 0 0 0 S 0.0 0.0 0:04.91 migration/3
 12 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3
 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
 14 root RT -5 0 0 0 S 0.0 0.0 0:00.09 migration/4

It turned out my stupid error put the file system up to the task of allocating nearly 14TB to a file in a file system with about 200GB free. My mistake. However, the call should have failed instead of leaving me with a kernel-mode process that required a server reset to clear. But, alas, I was using a very old interface. Had the test system I was investigating been running a more recent kernel, I would have called fallocate(2) and the situation would most likely have been different, but the kernel was older than the 2.6.23 minimum requirement for the fallocate(2) call.
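For what it's worth, a request like that can be turned away in user space before the kernel ever sees it. What follows is only a minimal sketch, not the code from allocate_file: the helper name checked_preallocate is hypothetical, it assumes a kernel and glibc recent enough to offer the fallocate(2) wrapper, and the free-space test is a best-effort snapshot (other writers can consume space between the check and the call).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* fallocate(2) is a Linux-specific extension */
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64    /* keep off_t 64-bit on 32-bit builds too */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <sys/types.h>

/*
 * Hypothetical helper: refuse a preallocation request that is zero,
 * negative, or larger than the space fstatvfs(3) currently reports as
 * available to the caller. Returns 0 on success, -1 with errno set.
 */
int checked_preallocate(int fd, off_t sz)
{
    struct statvfs vfs;
    unsigned long long avail;

    /* An uninitialized sz often shows up as garbage; reject the obvious cases. */
    if (sz <= 0) {
        errno = EINVAL;
        return -1;
    }

    if (fstatvfs(fd, &vfs) != 0)
        return -1;              /* errno set by fstatvfs */

    /* f_bavail counts blocks of f_frsize bytes available to unprivileged users. */
    avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    if ((unsigned long long)sz > avail) {
        errno = ENOSPC;
        return -1;
    }

    /* mode 0: allocate the range [0, sz) and extend the file size if needed. */
    return fallocate(fd, 0, 0, sz);
}
```

With a guard of this sort, a 14TB request against roughly 200GB of free space would come back with ENOSPC instead of being handed to the file system. It does nothing about the underlying bug, of course; initializing sz is still the real fix.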

So what does this have to do with ls(1) and du(1)? Well, I had a lot of programs running that were thrashing the file system. I unearthed a race condition of some sort where my looping call to ls(1) managed to catch a glimpse of the file being populated by PID 12682 (see the top(1) output above). The ls(1) command reported zero bytes. The next line of the script executed microseconds (or less) later, at which point du(1) was of the opinion the file was 287GB. Both the initial and subsequent df(1) information was consistent. I haven’t studied the transactional nature of this old rendition of fallocate so I can’t speculate what was going on. The only thing executing on the system at the time was, indeed, several invocations of the allocate_file program. It turns out that none of them branched to that call with an uninitialized grenade, as it were.
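For readers who want to see the two numbers side by side, here is a small sketch of where each one comes from on Linux. It is an illustration only: the names size_view and get_size_view are made up for this example. ls -l prints the st_size field returned by stat(2), while du works from st_blocks, which Linux counts in 512-byte units.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical container for the two views of "how big is this file". */
struct size_view {
    long long apparent_bytes;   /* st_size: the length ls -l prints */
    long long allocated_bytes;  /* st_blocks * 512: the space du counts */
};

/*
 * Hypothetical helper: fill *out from a single stat(2) call so both
 * numbers describe the same instant. Returns 0 on success, -1 on error
 * (errno set by stat).
 */
int get_size_view(const char *path, struct size_view *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    out->apparent_bytes = (long long)st.st_size;
    /* On Linux, st_blocks is in 512-byte units regardless of st_blksize. */
    out->allocated_bytes = (long long)st.st_blocks * 512;
    return 0;
}
```

Taking both values from one stat(2) call removes the window between two separate commands in a script. A file with space reserved beyond its end-of-file is one ordinary case where the first number can be zero while the second is large; whether that is what happened here is not established.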

I was unable to reproduce the situation and lost interest after fixing that stupid bug in the allocate_file program.

If there is any moral to this story it would be that the level of unpredictability is unpredictable if a process unpredictably asks the kernel to do something it cannot possibly do, such as allocate terabytes to a file in gigabytes of free space. I would predict, however, that fallocate() on a 2.6.23 or later kernel would handle my goofy mistake differently.

I hate it when I can’t reproduce a problem.


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
