Busy Idle Processes. Huh? The AIX KPROC process called “wait”.

A recent thread on the oracle-l email list was about the AIX 5L KPROC process called “wait”. The email that started the thread reads:

We are reviewing processes on our P690 machine and get the following.

I’ve googled a little bit but can’t find anything of interest. Are these processes that I should be concerned with – should we kill them? A normal ps -ef | grep 45078 does not return the process, so I really can’t figure out what these are.

$ ps auxw | head -10

USER       PID %CPU %MEM  SZ RSS TTY STAT  STIME      TIME COMMAND
root     45078  9.3  0.0  48  36   - A    Oct 13 120026:37 wait
root     40980  9.0  0.0  48  36   - A    Oct 13 116428:47 wait
root     36882  8.9  0.0  48  36   - A    Oct 13 114010:26 wait
root     32784  8.8  0.0  48  40   - A    Oct 13 113205:56 wait
[…output truncated…]

Another participant in the thread followed up with:

you will find the answer in:

http://www-304.ibm.com/jct09002c/isv/tech/faq/individual.jsp?oid=1:89156

And yet another good member of the list added:

Also, the reason you don’t see it with “ps -ef” is that ps doesn’t show kernel processes by default – you have to specify the “-k” flag, e.g.:

/opt/oracle ->ps -efk|grep wait

root   8196     0   0 Nov 11  -   720:31 wait
root  53274     0   0 Nov 11  -  3628:35 wait
root  57372     0   0 Nov 11  -   554:40 wait
root  61470     0   0 Nov 11  -  1883:24 wait
[…output truncated…]

So What Do I Have To Add?
So why am I blogging about this if the mystery has been explained? Well, I think charging a kernel process with the time the processor spends in the idle loop is just strange. Microprocessors have only two states: running and idle. On a Unix system, the running state is attributed to either user or kernel mode. Attributing the idle state to anything is like charging nothing to something.

Yes, I suppose I’m nit-picking. But there is something about the running state that many people do not know, and it has to do with processor efficiency. Regardless of mode (user or kernel), processor monitoring tools can only report whether the processor was idle or not. That’s all. Tools such as vmstat and sar cannot report processor efficiency. Remember that a processor is not always getting work done efficiently. Not that there is anything you can do about it, but a processor running in either mode while accessing heavily contended memory gets very little work done per cycle. The term CPI (cycles per instruction) is used to represent this efficiency. Think of it this way: if a CPU accesses a memory location that is in cache, the instruction completes in a couple of CPU cycles. If the processor is accessing a word in a memory line that is being completely hammered by other processors (shared memory), that single instruction stalls the processor until the access completes. Such a workload is said to execute with a high CPI.
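To put rough numbers on the CPI idea, here is a minimal sketch of the textbook stall model, in which effective CPI is a base CPI plus the average memory stall cycles per instruction. The function names and every figure used with them are illustrative assumptions, not measurements from any real system.

```c
#include <assert.h>

/*
 * Effective cycles per instruction under a simple stall model:
 *   CPI = base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles
 * All parameters are hypothetical inputs chosen by the caller.
 */
double effective_cpi(double base_cpi, double mem_refs_per_instr,
                     double miss_rate, double miss_penalty_cycles)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles;
}

/* Instructions completed per second at a given clock rate and CPI. */
double instructions_per_second(double clock_hz, double cpi)
{
    assert(cpi > 0.0);
    return clock_hz / cpi;
}
```

With an assumed base CPI of 1.0, 0.5 memory references per instruction, and a 200-cycle penalty, a miss rate of 0 gives a CPI of 1.0, while a miss rate of 0.25 gives a CPI of 26.0. A monitoring tool that only distinguishes idle from not idle would show the processor as equally busy in both cases, even though the second gets a small fraction of the work done.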

There you have it, some trivial pursuit.

What Does This Have To Do With Oracle?
Well, I’ll give you an example. A process spinning on a latch is executing the test loop in cache. The loop executes at a very, very low CPI. So if you have a lot of processes routinely spinning on latches, you have a low CPI, but that doesn’t mean you are getting any throughput. Latch contention is just a tax, if you will. When the latch is released, the processors that are spinning get a cacheline invalidation. They immediately read the line again. Loading that line brings the CPI way up for a moment as the line is installed into cache, and on and on it goes. The “ownership” of the memory line holding the latch structure just ping-pongs around the box. Envision a bunch of one-armed people standing around passing a hot potato. Yep, that about covers it. No, not actually. Somewhere there has to be a copy of the potato and a race to get back to the original. Hmmm, I’ll have to work on that analogy, or take an interest in hierarchical locking. <smiley>
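The spin described above can be sketched as a generic test-and-test-and-set loop using C11 atomics. This is not Oracle’s latch code; the `toy_latch_t` type and the `toy_latch_*` functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A toy latch: one flag that lives in a single cache line. */
typedef struct { atomic_bool held; } toy_latch_t;

/* One atomic attempt; returns true if this caller obtained the latch. */
bool toy_latch_try_get(toy_latch_t *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin until the latch is obtained; returns the number of read-only spins. */
unsigned long toy_latch_get(toy_latch_t *l)
{
    unsigned long spins = 0;
    for (;;) {
        /* Test loop: plain loads that can be satisfied from the local
         * cached copy of the line while the latch stays held. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            spins++;
        /* The flag now reads as free, so race the other spinners for it
         * with an atomic exchange. */
        if (toy_latch_try_get(l))
            return spins;
    }
}

/* Release the latch; the store invalidates the spinners' cached copies. */
void toy_latch_release(toy_latch_t *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A latch declared as `toy_latch_t l = { false };` starts out free. The inner `while` loop is the cheap, low-CPI part; the exchange that follows is where the spinners contend for ownership of the line.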

Therein lies the reason that just a few contended memory lines with really popular Oracle latches (e.g., redo allocation, hot cache buffers chains latches) can account for reasonable percentages of the work that gets done on an Oracle system. On the other hand, systems with really balanced processor/memory capabilities (e.g., System p, Opteron on HyperTransport), and systems with very few processors, don’t have much trouble with this stuff. And, of course, Oracle is always working to eliminate singleton latches as well.

6 Responses to “Busy Idle Processes. Huh? The AIX KPROC process called “wait”.”


1 Rob Johnson January 12, 2007 at 4:53 pm

Recording idle time as process time seems strange, but I can think of two possible reasons for doing so:

(1) It allows you to take snapshots of idle time on a regular basis, by checking the difference in kproc wait time between successive measurements. Then you can subtract that idle time from clock time, and end up with work time.

The processes which do “real work” may come and go, so their accumulated processor times do not persist, and you can’t get work time in snapshots by looking at those processes.

(2) Since each processor has a separate wait entry in the process list, you can tell if the CPU load is concentrated on one or a few CPUs, or spread evenly. Another, rougher way to do this is to look at vmstat and see if busy time is roughly 100/(number of your CPUs).

I really enjoy this blog, and I’m lucky to understand about 1/2 of what you write here. The level of detail you provide has really enhanced my understanding of RAC. Thanks!

2 kevinclosson January 12, 2007 at 5:23 pm

Rob,

Good points…very good in fact. Thanks for reading.

3 Perry Lovill June 22, 2009 at 8:57 am

Even though a microprocessor may only be “busy” or “idle”, an O/S (kernel) scheduling routine should almost always be “busy”, especially on a multiprocessor system. So, accounting for the “idle” time of not executing user code under the name “wait” does not seem that strange to me, since the scheduler threads are constantly waking up to check if anything needs to be moved from the “wait” queue to the “runnable” queue, and from the “runnable” queue to the “running”/“executing” queue. So, “wait” waits for everything else to be ready, but it’s *always* ready, unless it’s NOT “wait”ing because its CPU is actually running a user or kernel thread (other than the scheduler itself).

4 Perry Lovill June 22, 2009 at 9:16 am

Sorry… I am *way* too tired. Replace “busy” with “running” in the above reply. My point being, even if a CPU is counted as being “idle”, it’s not really truly idle as long as the OS scheduler thread is running on it (and checking up on all of the other threads to see if any are now in need of a CPU timeslice).

5 kevinclosson June 22, 2009 at 5:49 pm

The idle process/thread does nothing, as its name implies (well, ok, most implementations just loop on a word in L1, but I call that nothing). A scheduler process/thread requires motivation to run. On a totally idle system (i.e., a system in the idle loop on all processors) that motivation is the hard clock callout table that gets executed every time a CPU gets interrupted by the hard clock. If there are no runnable processes, the scheduler has spent just a few clock cycles to see if there was in fact anything that needs to be done. So, you’re right that a CPU always has to be doing something. But the idle loop is just that. Let’s just say that it is busily doing nothing…until it gets popped with an interrupt.

1 wait process - UNIX for Advanced & Expert Users - The UNIX and Linux Forums Trackback on December 9, 2008 at 11:51 am


Kevin Closson's Blog: Platforms, Databases and Storage