Using Linux Perf(1) To Analyze Database Processes – Part I

Troubleshooting Runaway Processes
Everyone reads Tanel Poder’s material—for good reason.

I took particular interest in his recent post about investigating where an apparently runaway, cpu-bound, Oracle foreground process is spending its time.  That article can be found here.

I’ve been meaning to do some blogging about analyzing Oracle execution with perf(1). I think Tanel’s post is a good segue for me to do so. You might ask, however, why I would bother attempting to add value in a space Tanel has already blogged. Well, to that I would simply say that modern systems professionals need as many tools as they can get their hands on.

Tanel, please ping me if you object, for any reason, to my direct links to your scripts.

Monitoring Window
Tanel’s approach to monitoring his cpu-bound foreground process is based on the pstack(1) utility. Once he identified the spinning process he fired off a 100 pstack commands in a loop. For each iteration, the output of pstack was piped through a text processing script (os_explain).  From my reading of that script I estimate it probably takes about 10 milliseconds to process the data flowing into it. I’m estimating here so please bear with me. If pstack execution requires zero wall clock time I think these 100 snoops at the process stack likely occur in 1 second of wall clock time. According to Tanel, the SQL takes about 23 minutes to complete. If the foreground process is looping just a small amount of code then I see no problem monitoring a small portion of its overall execution time. Since you can see what Tanel is doing you know it would be simple to grab the process every so often throughout its life and monitor the approximate 1 second with the 100 iterations of pstack. The technique and tools Tanel shares here are extensible. That is good.

The Linux pstack utility stops the execution of a process in order to read its stack and produce output to standard out. Performance engineering techniques should always be weighed against the perturbation levied by the monitoring method.  I recall the olden days of CA Unicenter brutally punishing a system just to report performance metrics. Ouch.

Monitoring a cpu-bound process, even periodically, with pstack will perturb performance. There is such a thing as an acceptable amount of perturbation though. Once must decide for themselves what that acceptable level is.

I would like to offer an example of pstack perturbation. For this exercise I will use the silly core-stalling program I call “fat.” This program suffers a lot of processor stalls because it has very poor cache locality. The program can be found here along with its cousin, “skinny.” Aptly named, skinny fits in cache and does not stall.  The following shows that the program executes in 51.251 seconds on my Nehalem Xeon server when executed in isolation.  However, when I add 100 pokes with pstack the run time suffers 5% degradation.

$ cat

time ./fat &
p=`ps -f | awk ' $NF ~ /.fat/ { print $2 }'`

for i in {1..100}
	pstack $p
done  > /dev/null 2>&1
$ sh ./

real	0m53.836s
user	0m50.604s
sys	0m0.024s
$ time ./fat

real	0m51.251s
user	0m51.221s
sys	0m0.013s

I know 5% is not a lot but this is a cpu-bound process so that is a lot of cycles. Moreover, this is a process that is not sharing any data nor exercising any application concurrency mechanisms (e.g., spinlocks, semaphores).  I understand there is little danger in perturbing the performance of some runaway processes but if such a process is also sharing data there can be ancillary affect when troubleshooting a large-scale problem.  Before I move on I’d like to show how the execution time of “fat” changes when I gather 100 pstacks as soon as the process is invoked and then wait 10 seconds before gathering another 100 stack readings. As the following shows, the perturbation meter climbs up to 13%:

$ cat

time ./fat &
p=`ps -f | awk ' $NF ~ /.fat/ { print $2 }'`

for t in 1 2
	for i in {1..100}
		pstack $p
	done  > /dev/null 2>&1
sleep 10

$ sh ./

real	0m57.930s
user	0m51.480s
sys	0m0.050s

Performance Profiling With perf(1)
As I said, I’ve been trying to work out the time to blog about how important perf(1) should be to you in your Oracle monitoring—or any performance analysis for that matter. It requires a modern Linux distribution (e.g., RHEL 6, OEL 6) though. I can’t write a tutorial on perf(1) in this post. However, since I’m on the topic of perturbation via monitoring tools, I’ll offer an example of perf-record(1) using my little “fat” program:

$ time perf record ./fat
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 1.812 MB (~79187 samples) ]

real	0m52.299s
user	0m52.234s
sys	0m0.046s

$ perf report --stdio | grep fat
# cmdline : /usr/bin/perf record ./fat
    99.81%      fat  fat                [.] main
     0.14%      fat  [kernel.kallsyms]  [k] __do_softirq
     0.02%      fat       [.] __memset_sse2
     0.01%      fat  [kernel.kallsyms]  [k] clear_page_c
     0.01%      fat  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     0.00%      fat  [kernel.kallsyms]  [k] get_page_from_freelist
     0.00%      fat  [kernel.kallsyms]  [k] free_pages_prepare

So, we see that monitoring with perf(1) does levy a small tax—2%. However, I just monitored the entire execution of the program.

Monitor Oracle Foreground Process With perf-record(1)
Now I’ll run the program Tanel used in his example.  I’ll record 5 minutes of execution by identifying the process ID of the spinning shadow process and then using the –p option to perf-record(1). The way perf(1) works is to monitor the entire execution of the program you attach to—unless you give it a process to keep it busy (or use the -c option). Now don’t be confused. In the following example I’m telling perf-record(1) to monitor the shadow process. The usage of sleep 300 is just my way of telling it to finish in 5 minutes.

$ cat

sqlplus / as sysdba < /dev/null 2>&1 &
[1] 462
$ ps -f
oracle     462  2259  0 10:11 pts/0    00:00:00 sh ./
oracle     463   462  0 10:11 pts/0    00:00:00 sqlplus   as sysdba
oracle     465  2259  0 10:11 pts/0    00:00:00 ps -f
oracle    2259  2258  0 Feb14 pts/0    00:00:00 -bash
$ ps -ef | grep 463 | grep -v grep
oracle     463   462  0 10:11 pts/0    00:00:00 sqlplus   as sysdba
oracle     464   463 89 10:11 ?        00:00:14 oracleSLOB (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
$ sudo perf record -p 464 sleep 300
[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 8.755 MB (~382519 samples) ]

$ perf report
$ perf report --stdio
# ========
# captured on: Mon Feb 18 10:17:02 2013
# ========
# Events: 286K cpu-clock
# Overhead  Command      Shared Object                       Symbol
# ........  .......  .................  ...........................
    29.84%   oracle  oracle             [.] kglic0
    11.08%   oracle  oracle             [.] kgxExclusive
    11.02%   oracle  oracle             [.] kglGetHandleReference
     6.66%   oracle  oracle             [.] kglGetMutex
     6.47%   oracle  oracle             [.] kgxRelease
     4.91%   oracle  oracle             [.] kglGetSessionUOL
     4.48%   oracle  oracle             [.] kglic_cbk
     3.79%   oracle  oracle             [.] kgligl
     3.44%   oracle  oracle             [.] kglMutexHeld
     3.12%   oracle  oracle             [.] kgligp
     1.96%   oracle  oracle             [.] kqlpgCallback
     1.92%   oracle  oracle             [.] kglReleaseMutex
     1.88%   oracle  oracle             [.] kglGetBucketMutex
     1.80%   oracle  oracle             [.] kglReleaseBucketMutex
     1.47%   oracle  oracle             [.] kglIsMutexHeld
     1.18%   oracle  oracle             [.] kglMutexNotHeld
     1.04%   oracle  oracle             [.] kglReleaseHandleReference
     0.51%   oracle  oracle             [.] kghalf
     0.33%   oracle  oracle             [.] qerfxFetch
     0.32%   oracle  oracle             [.] lnxmin
     0.31%   oracle  oracle             [.] qerfxGCol
     0.26%   oracle  oracle             [.] qeruaRowProcedure
     0.24%   oracle  oracle             [.] kqlfnn

$ perf report --stdio | grep oracle | sed 's/\%//g' | awk '{ t=t+$1 } END { print t }'

If you study Tanel’s post and compare it to mine you’ll see differences in the cost accounting Tanel has associated with certain Oracle kernel routines. That’s no huge problem. I’d like to explain what’s happening.

When I used perf-record(1) to monitor the shadow process for 300 seconds it collected 382,519 samples. When using a pstack approach, on the other hand, just be aware that you are getting a glimpse of whatever happens to be on the stack when the program is stopped. You might be missing a lot of important events. Allow me to offer an illustration of this effect.

Do You see What I See?
Envision a sentry  guarding a wall and taking a photo every 1 second. Intruders jumping over the wall with great agility are less likely to be in view when he snaps his photo. On the other hand, an intruder taking a lot longer to cross the wall (carrying a large bag of “loot” for instance) suffers greater odds of showing up in one of his photos. The sentry might get 10 photos of slow intruders while missing hundreds of intruders who happen to be more nimble–thus getting over the wall more quickly.  For all we know, the slow intruder is carrying bags of coal while the more nimble intruders are packing a pockets full of diamonds. Which intruder matters more? It depends on how cold the sentry is I guess 🙂 It’s actually a wee-bit like that with software performance analysis. Catch me some time and I’ll explain why a CPI (cycles per instruction)  of 1 is usually bad thing. Ugh, I digress.

Tanel’s approach/kit works on a variety of operating systems and versions of Linux. It also does not require any elevated privilege. I only aim to discuss the differences.

I’ve lightly covered the topic of performance monitoring perturbation and have given a glimpse into what I envision will end up as a multiple-part series on perf(1).


8 Responses to “Using Linux Perf(1) To Analyze Database Processes – Part I”

  1. 1 goryszewskig February 18, 2013 at 12:36 pm

    Perfect timing Kevin, Am I right saying that perf would be useful in mutex issues tracing when things are going really fast ?

  2. 3 Tanel Poder (@TanelPoder) February 19, 2013 at 4:31 pm

    That’s why I said “poor-man’s stack profiler” in my post. Even though the core topic of my post was about drilling down into CPU usage, the important side-topic there was that you do not always need to install additional tools to troubleshoot something right when it happens. Often a text datasource combined with a couple of OS commands or AWK do the trick and you can adapt to changing demands – like just adding “sleep 1” between the stack samples in the loop.

    But perf itself is cool, even though oprofile can do the same things, usability matters – a single command which is able to show the top vs a sequence of commands to initiate collection and then parse & report is going to win. But I still don’t get perf’s hierarchical profiler output logic, I think stack profiles are much better readable if done like Windows ProcMon, Mac OS “Sample Process” or my oradebug-based OStackProf are doing it.

  3. 5 Anantha March 24, 2013 at 10:23 am

    @Kevin, thanks for these posts. Now we’re back to great technical articles that we’ve come to expect from your blog.

  1. 1 Peeking into Linux kernel-land using /proc filesystem for quick’n’dirty troubleshooting | Tanel Poder's blog: IT & Mobile for Geeks and Pros Trackback on February 21, 2013 at 7:47 am
  2. 2 Изучаем внутреннюю кухню ядра Linux с помощью /proc для быстрой диагностики и решения проблем « Домик Миа Trackback on February 23, 2014 at 9:01 am

Leave a Reply to Anantha Cancel reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.


I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 742 other subscribers
Oracle ACE Program Status

Click It

website metrics

Fond Memories


All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

%d bloggers like this: