Some time back I made a blog entry about network performance monitoring tools with a slant towards monitoring Oracle over NFS. The blog entry contains a very long list of all the various tools out there, none of which did what I wanted.
That was then, this is now. Mark Seger (the author of collectl) commented as follows on that blog entry:
Sorry to hear you haven’t found any tools you like, but perhaps you haven’t looked at collectl yet.
Indeed I have looked into collectl. In fact, not only have I looked into it, but I absolutely love it and have been using it extensively for months. I recommend you take a gander at the collectl website.
In its simplest form, I feel it captures very good, quick health-check-style information. The following is an example of a small Linux server performing a little over 200MB/s of disk and network throughput. As you can see, gathering this sort of performance data without collectl would require several stock Linux commands.
# collectl
waiting for 1 second sample...
#<--------CPU--------><-----------Disks-----------><-----------Network----------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes netKBi pkt-in  netKBo pkt-out
  28  23 28325  57874 202880    562      0      0   5692   8414  227323   28862
  29  25 28782  59234 222560    573      0      0   5616   8338  226129   28701
  28  22 28333  57916 235776    634   2048     34   5517   8252  223717   28419
  27  22 28874  58156 209848    597      1      1   5477   8162  222559   28290
  28  23 28214  58068 220328    569      0      0   5620   8245  221651   28165
  29  21 27871  57898 220224    582      0      0   5534   8213  225510   28606
  28  24 27923  59021 223184    581      0      0   5536   8244  224676   28531
  65  47 29300  57973 216152    580    316     16   5891   8364  226310   28725
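To check the "a little over 200MB/s" figure against the sample, here is a minimal Python sketch that parses the data rows and averages the KBRead and netKBo columns. The helper names `parse_rows` and `mean_mb_per_s` are made up for this illustration, and the sketch assumes that each row is a one-second sample and that a KB is 1024 bytes. It is not part of collectl.

```python
# Column names copied from the collectl header line in the sample above.
COLUMNS = ["cpu", "sys", "inter", "ctxsw", "KBRead", "Reads",
           "KBWrit", "Writes", "netKBi", "pkt-in", "netKBo", "pkt-out"]

# The eight data rows from the sample above.
SAMPLE = """\
28 23 28325 57874 202880 562 0 0 5692 8414 227323 28862
29 25 28782 59234 222560 573 0 0 5616 8338 226129 28701
28 22 28333 57916 235776 634 2048 34 5517 8252 223717 28419
27 22 28874 58156 209848 597 1 1 5477 8162 222559 28290
28 23 28214 58068 220328 569 0 0 5620 8245 221651 28165
29 21 27871 57898 220224 582 0 0 5534 8213 225510 28606
28 24 27923 59021 223184 581 0 0 5536 8244 224676 28531
65 47 29300 57973 216152 580 316 16 5891 8364 226310 28725
"""


def parse_rows(text):
    """Turn whitespace-separated data rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip anything that is not a full row of integers.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def mean_mb_per_s(rows, column):
    """Average a KB-per-sample column and convert it to MB/s.

    Assumes one-second samples and 1 MB = 1024 KB.
    """
    return sum(row[column] for row in rows) / len(rows) / 1024


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("disk read MB/s:", round(mean_mb_per_s(rows, "KBRead"), 1))
    print("net out MB/s:  ", round(mean_mb_per_s(rows, "netKBo"), 1))
```

Both averages come out a bit above 200, which matches the description of the server.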
Kudos, Mark. Great tool!
Does collectl only work on Linux?
Nmon (and Nmon analyser) take some beating too.
http://www-941.haw.ibm.com/collaboration/wiki/display/WikiPtype/nmon
Hi Phil,
I investigated nmon, thanks for bringing it up. For me it is a bit too much. I think for a general purpose server or a file and print server it would be helpful, but for a database server it seems like overkill. Just my opinion.
Hi Amir,
Collectl is Linux-only, to the best of my knowledge.
I just wanted to let people know that I’ve just released a new version of collectl that monitors process I/O stats on kernels that have them built in – I’m not sure when they first appeared, but I’ve been developing against 2.6.23. If you want to see what this looks like without going to the effort of actually installing collectl, I have some examples posted here – http://collectl.sourceforge.net/ProcessIOStats.html
enjoy…
-mark
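For readers curious what per-process I/O counters look like on such kernels, here is a minimal Python sketch. It assumes the kernel exposes them in /proc/<pid>/io as "name: value" lines (rchar, wchar, read_bytes, write_bytes and so on). The function names are made up for this illustration, and it makes no claim about how collectl itself gathers or presents the data.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of a /proc/<pid>/io file into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only well-formed 'name: integer' lines.
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def read_proc_io(pid="self"):
    """Read the I/O counters for a process (default: the current one)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


if __name__ == "__main__":
    try:
        for name, value in read_proc_io().items():
            print(f"{name:>24} {value}")
    except OSError as exc:
        # The file is missing on non-Linux systems and on kernels
        # built without per-process I/O accounting.
        print("per-process I/O stats not available:", exc)
```

The counters are cumulative, so a monitoring tool would sample them twice and report the difference per interval.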
Hi Kevin,
There is a tool available called SWAT from Sun which gives you in-depth information on Storage and NFS performance – IOPS, Queue Depth, Throughput, size of I/Os, Read/Write, Response times etc. It is an ideal tool for Storage Performance analysis with very low impact on source systems.
It is Java based and can be run continuously in the background with real time trace abilities. It is not well known to the public and requires you to connect with a Sun Rep to get it, however it is well worth it (it is free).
While I do know that it can run on Solaris and Windows, I am not sure if it runs on Linux.
Thanks
Krishna Manoharan
I guess my comment on any tool that only looks at a subset of performance counters is that it will give you an incomplete picture regardless of how much extra detail it provides. If you’re doing nfs testing and only looking at nfs data, how are you to know if your problems lie outside nfs itself? For example, I was recently doing some nfs testing and found my CPU was getting hammered. Further investigation showed all the interrupts were going to CPU 0, which in fact inspired me to add interrupts by cpu to collectl.
I suppose if there is another tool that provides a level of detail beyond what collectl can provide, and yes I realize such tools do exist 9-), perhaps the answer is to run both and use collectl to show what’s happening on the rest of the system during a test run, assuming of course that the other tool provides timestamped history.
-mark
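The "all the interrupts were going to CPU 0" situation can be spotted by hand on Linux. Here is a minimal Python sketch that totals the per-CPU columns of /proc/interrupts. The function name `interrupts_per_cpu` is made up for this illustration, and this is not how collectl reports interrupts, only a rough way to see the same imbalance.

```python
def interrupts_per_cpu(text):
    """Sum the per-CPU interrupt counts found in /proc/interrupts text.

    The first line names the CPUs (CPU0, CPU1, ...); each later line is
    'IRQ: count count ... description'. Lines that do not carry one
    count per CPU (such as ERR or MIS) are skipped.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _, sep, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        if not sep or len(fields) < len(cpus):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        for cpu, count in zip(cpus, fields):
            totals[cpu] += int(count)
    return totals


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            totals = interrupts_per_cpu(f.read())
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
    else:
        grand_total = sum(totals.values()) or 1
        for cpu, count in totals.items():
            print(f"{cpu}: {count} ({100 * count / grand_total:.1f}%)")
```

These counts are cumulative since boot, so comparing two readings taken a few seconds apart during a test run gives a better picture than a single snapshot.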
I like collectl so much I’m about to start censorship for any negative feedback!