The SLOB code that is generally available here differs significantly from what I often test with in labs. Recently I was contorting SLOB to hammer an EMC XtremIO All-Flash Array in a rather interesting way. Those of you in the ranks of the hundreds of SLOB experts out there will notice two things quickly in the following AWR snippet:
1) Physical single block reads are being timed by the Oracle wait interface at 601 microseconds (3604/5995141 == .000601) and this is, naturally for SLOB, the top wait event.
2) Disk file operations I/O is ranking as a top 5 timed event. This is not typical for SLOB.
The 601us latencies for XtremIO are certainly no surprise. After all, this particular EMC storage array is an All-Flash Array so there’s no opportunity for latency to suffer as is the case with alternatives such as flash-cache approaches. So what is this blog post about? It’s about Disk file operations I/O.
I needed to refresh my memory on what the Disk file operations I/O event was all about. So, I naturally went to consult the Statistics Description documentation. Unfortunately there was no mention of the wait even there so I dug further to find it documented in the Description of Wait Events section of the Oracle Database 11g documentation which states:
This event is used to wait for disk file operations (for example, open, close, seek, and resize). It is also used for miscellaneous I/O operations such as block dumps and password file accesses.
Egad. A wait is a blocking system call. Since open(2)/close(2) and seek(2) are non-blocking on normal files I suppose I could have suffered a resize operation–but wait, this tablespace doesn’t allow autoextend. I suppose I really shouldn’t care that much given the fact that the sum total of wait time was zero seconds. But I wanted to understand more so I sought information from the user community–a search that landed me happily at Kyle Hailey’s post on oaktableworld.com here. Kyle’s post had some scripts that looked promising for providing more information about these waits but unfortunately in my case the scripts returned no rows found.
So, at this point, I’ll have to say that the sole value of this blog post is to point out the fact that a) the Oracle documentation specifically covering statistics descriptions is not as complete as the Description of Wait Events section and b) the elusive Disk file operations I/O wait event remains, well, elusive and that this is now part I in a multi-part blog series until I learn more. I’ll set up some traces and see what’s going on. Perhaps Kyle will chime in.
if linux of a recent version you can try using latencytop or perf uprobe to understand those 170 odd waits reported by oracle via some i/o syscall.
If solaris you can use dtrace with a syscall provider.
However since it is reported in awr possibly some session wait v$ view or historic ash/dba view could help provided the data sampled is retained.
Due to sampling errors at oracle db and even at os kernel level it may be interesting to see first what was those 170 odd waits caused by a suspected syscall wait to understand if they did occur at all in first place than to measure their latency ..i may be wrong
I did 10046 tracing before I posted this…part II will have some words on the matter.
Thanks for reading and commenting.