Please note, the SLOB Resources page is always the sole, official location for obtaining SLOB software and documentation: SLOB Resources Page.
SLOB has become the primary tool kit for testing a platform’s suitability for Oracle Database, and the following post makes that case rather strongly with a long list of industry vendors’ use cases: Industry Vendors’ SLOB Use Cases
Background
We’ve all been there. You’re facing the need to assess Oracle random physical I/O capability on a given platform in preparation for OLTP/ERP style workloads. Perhaps the storage team has assured you of ample bandwidth for both high-throughput and high I/O operations per second (IOPS). But you want to be sure and measure for yourself so off you go looking for the right test kit.
There is no shortage of transactional benchmarks such as Hammerora, Dominic Giles’ Swingbench, and cost options such as Benchmark Factory. These are all good kits. I’ve used them all more than once over the years. The problem with these kits is that they do not fit the need posed in the previous paragraph. These kits are transactional, so the question becomes: do you want to prove Oracle scales those applications on your hardware, or do you want to test database I/O characteristics? You want to test database I/O! So now what?
What About Orion?
The Orion tool has long been a standard for testing I/O at Oracle block sizes via the same I/O libraries linked into the Oracle server. Orion is a helpful tool, but it can lead to a false sense of security. Allow me to explain. Orion uses no measurable processor cycles to do its work. It simply shovels I/O requests into the kernel and the kernel (driver) “clobbers” the same I/O buffers in memory with the I/O (read) requests again and again. Additionally, Orion does not involve an Oracle Database instance at all! Finally, Orion does not care about the contents of I/O buffers (no load/store operations to/from the I/O buffers before or after physical I/O) and therein lies the weakness of Orion for testing database I/O. It’s not database I/O! Neither is CALIBRATE_IO for that matter. More on that later…
At one end of the spectrum we have fully transactional, application-like test kits (e.g., Swingbench); at the other, low-level I/O generators like Orion. What’s really needed is something right in the middle, and I propose that something is SLOB.
What’s In A Name?
SLOB is not a database benchmark. SLOB is an Oracle I/O workload generation tool kit. I need to point out that by force of habit many SLOB users refer to SLOB with terms like benchmark and workload interchangeably. SLOB aims to fill the gap between Orion and CALIBRATE_IO (neither of which generates legitimate database I/O, as explained in part here) and full-function transactional benchmarks (such as Swingbench). Transactional workloads are intended to test the transactional capability of a database.
I assert that by the time customers license Oracle Database they are quite certain Oracle Database is a very robust and capable ACID-compliant transactional engine, and unless you are testing your own transactions it makes little sense to test any transactions at all. That is just my opinion and part of the motivation behind my desire to create SLOB: a non-transactional database I/O workload generator.
SLOB possesses the following characteristics:
- SLOB supports testing Oracle logical read (SGA buffer gets) scaling
- SLOB supports testing physical random single-block reads (db file sequential read)
- SLOB supports testing random single block writes (DBWR flushing capacity)
- SLOB supports testing extreme REDO logging I/O
- SLOB consists of simple PL/SQL
- SLOB is entirely free of all application contention
Yes, SLOB is free of application contention yet it is an SGA-intensive workload kit. You might ask why this is important. If you want to test your I/O subsystem with genuine Oracle SGA-buffered physical I/O it is best to not combine that with application contention.
SLOB is also great for logical read scalability testing which is very important, for one simple reason: It is difficult to scale physical I/O if the platform can’t scale logical I/O. Oracle SGA physical I/O is prefaced by a cache miss and, quite honestly, not all platforms can scale cache misses. Additionally, cache misses cross paths with cache hits. So, it is helpful to use SLOB to test your platform’s ability to scale Oracle Database logical I/O.
What’s In The Kit?
There are no benchmark results included, because SLOB is not a benchmark as such. The kit does, however, include:
- README files and documentation. After extracting the SLOB tar archive you can find the documentation under the “doc” directory in PDF form.
- A simple database creation kit. SLOB requires very little by way of database resources. I think the best approach to testing SLOB is to use the simple database creation kit under ~/misc/create_database_kit. The directory contains a README to help you on your way. I generally recommend folks use the simple database creation kit to create a small database because it uses Oracle Managed Files so you simply point it to the ASM diskgroup or file system you want to test. The entire database will need no more than 10 gigabytes.
- An IPC semaphore based trigger kit. I don’t really need to point out much about this simple IPC trigger kit other than to draw your attention to the fact that the kit does require permissions to create a semaphore set with a single semaphore. The README-FIRST file details what you need to do to have a functional trigger.
- The workload scripts. The setup script is aptly named setup.sh and to run the workload you will use runit.sh. These scripts are covered in README-FIRST.
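Taken together, a first session with the kit might look like the sketch below. The script names come from the kit itself; the argument conventions (tablespace then schema count for setup.sh, writers then readers for runit.sh), the tablespace name (IOPS, as used by the bundled creation kit), and the session count are assumptions, so check README-FIRST for the authoritative usage:

```shell
# Hypothetical first SLOB run; argument order and counts are
# illustrative assumptions, not documentation.
TABLESPACE=IOPS   # tablespace created by the bundled creation kit
SESSIONS=32       # illustrative schema/session count

SETUP_CMD="./setup.sh $TABLESPACE $SESSIONS"
RUN_CMD="./runit.sh 0 $SESSIONS"   # 0 writers, 32 readers
echo "$SETUP_CMD"
echo "$RUN_CMD"

# Execute only when the kit is actually present in this directory.
if [ -x ./setup.sh ] && [ -x ./runit.sh ]; then
  $SETUP_CMD && $RUN_CMD
fi
```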
Models
The size of the SGA buffer pool is the single knob to twist to determine which workload profile you’ll generate. For instance, if you wish to have nothing but random single block reads you simply run with the smallest db_cache_size your system will allow you to configure (see README-FIRST for more on this matter). On the other hand, the opposite is what’s needed for logical I/O testing. That is, simply set db_cache_size to about 4GB, perform a warm-up run, and from that point on there will be no physical I/O. Drive up the number of connected pseudo users and you’ll observe logical I/O scale up, bounded only by how scalable your platform is. The other models involve writes. If you want to drive a tremendous amount of REDO writes you will again configure a large db_cache_size and execute runit.sh with only write sessions. From that point you can reduce the size of db_cache_size while maintaining the write sessions, which will drive DBWR into a frenzy.
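As a quick reference, the models described above map to settings along these lines. The sizes are the illustrative figures from the text, not tuned recommendations, and the runit.sh argument order (writers first, then readers) is an assumption to be verified against README-FIRST:

```
# Physical random single-block reads: make nearly every read miss the cache
db_cache_size = 32M        # smallest your system will allow
# ./runit.sh 0 <readers>   # readers only

# Logical I/O (SGA buffer gets): cache the working set, then warm up
db_cache_size = 4G
# ./runit.sh 0 <readers>   # no physical I/O after the warm-up run

# REDO / DBWR stress: writers only
db_cache_size = 4G         # large cache favors streaming REDO writes
# ./runit.sh <writers> 0
# then shrink db_cache_size with the same writers to drive DBWR hard
```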
Who Uses SLOB?
SLOB is extremely popular and heavily used. SLOB testing is featured in many industry vendor published articles, books, blogs and so forth. Google searching for SLOB-related content offers a rich variety of information. Additionally, I maintain a page of notable SLOB use cases in the industry at the following web page:
SLOB Use Cases in the Industry
What You Should Expect From SLOB
I/O, lots of it! Oh, and the absolute minimal amount of CPU overhead possible considering SLOB generates legitimate SQL-driven, SGA-buffered physical I/O!
Where Is The Kit?
The only official place to obtain SLOB is at the following web page:
SLOB Resources Page (SLOB distribution and helpful information)
Thx Kevin 🙂
Kevin,
I want to add a think time, so I added a variable to slob.conf and slob.sql. It seems I am breaking the wait mechanism, because I get 0 minutes of benchmark with this statement in the slob.sql loop.
exec dbms_lock.sleep(v_think);
Thanks,
Allen
Send me your slob.sql in its entirety via email and I can have a look… I do have a version of SLOB coming out soon that has sleep time and a lot of other stuff…
Thanks Kevin for all your work on this kit. I used this kit to verify that the performance on my new Intel 8 socket servers was significantly different than expected for LIOs (I suspected this from some of my own testing). Published benchmarks do little to give you insight into how Oracle will behave on hardware. This kit gives you a lot more information on how an Oracle workload acts.
Bryan,
You’re welcome. At 8 sockets you are using SLOB to appreciate the NUMA effect on your instances. SLOB is indeed good for that too.
Kevin, once again thanks for the opportunity to use SLOB. It’s been invaluable given our permanent effort in rationalising, consolidating and virtualizing our db servers.
Orion is too “raw” to cover all bases and most other benchmarks out there measure only one type of environment and are waaay too complex and lengthy to quickly setup and get results.
With SLOB I strongly like the simplicity of the single “tweak button”: db cache size. And the results are measured with AWR, which is something we can all relate to.
The results we got with virtualized Aix (vio) went a long way in validating and explaining the advantages and shortcomings we had empirically observed with this architecture. In particular, SLOB was instrumental in proving and confirming some of the observations I had passed on to our IT architects, namely the fact that our db server partitions are inadequately provisioned in terms of I/O pipes.
We can do North of 200K liops/core, but nowhere near an order of magnitude less with physical I/O. This is now in the process of being addressed in the next hardware refresh cycle. Just like in so many other engineering issues, once we can pin where the problem is, we can fix it.
I’m now looking forward to re-testing with 11g and AIX 7.1, mostly to provide myself and management with a degree of confidence that whatever may break after our upcoming upgrade is not caused by an underlying architectural or setup defect.
The most striking thing about SLOB is the ease and simplicity of setup, with repeatable and consistent results that are easy to control/modify/tweak.
In fact, I’d like to propose you rename it Simple Little Old Benchmark!
Given my engineering background, I like simplicity: it works!
My pleasure, Noons. And thanks back to you for being one of my two AIX beta testers. I’m so glad to hear it is useful in your environment! By the way, I’m going to start quoting this one:
“Just like in so many other engineering issues, once we can pin where the problem is, we can fix it.”
Now *that* is simplicity and truth rolled into one!
As per the golden rule of any performance optimisation: find the bottleneck, fix the bottleneck, root cause…
Looking at doing some AIX testing also,
G
Leighton Nelson has tweeted of his recent testing with SLOB on AIX.
Hi Kevin,
thanks a lot for SLOB; I am going to use it extensively. As you say, ORION isn’t the answer. I am currently involved in (sadly not leading!) an evaluation of Flash memory in PCIe devices, or shared via FC for a standard hardware setup. We were missing a tool that creates real Oracle workload, instead of iometer and the likes. Now I’m also able to BAARF and show them log file parallel writes > 10 ms!
Hi Martin,
So you are using the “REDO” model of SLOB… that’s a very large db_cache_size and a writer mix (no readers) to dirty up the cache and force large streaming writes to the redo logs? If so, you can tweak that even further by going with very large redo logs so DBWR doesn’t get triggered to flush any more than he needs to… this addition to the REDO model enables maximum theoretical REDO writing.
Hi Kevin,
I’ve been testing with SLOB for a couple of days now on our p7 series servers and I must commend you on its simplicity of setup. This tool provides a great way to verify the capability of our infrastructure which we never really had any insight into before. Our sys admins have also embraced it with open arms – something that DBAs and sysadmins never do 🙂
Anyway, maybe you can follow up with a webinar or do a demo on using SLOB and how to interpret the results?
Thanks for a great tool!
Hello Leighton,
If you post results here I can comment. Upload your AWRs to the cloud and point at them. We can get a helpful thread going that way I think.
Oh please do it! I’m interested in AIX p7 results, I don’t have ready access to (recent) IBM kit.
Here are some AWR reports from SLOB that I collected recently.
PIO
http://dl.dropbox.com/u/25153503/Oracle/SLOB/results/pioawr/awr_r32w0.txt
http://dl.dropbox.com/u/25153503/Oracle/SLOB/results/pioawr/awr.txt
LIO
http://dl.dropbox.com/u/25153503/Oracle/SLOB/results/lioawr/awr_r32w0.txt
So based on those results I’m looking at around 20k IOPS/core for physical reads (cpu_count=2). At the same time I was able to drive more bandwidth by spreading my IO across more devices.
I’m not too sure what the LIO results mean. It topped out at around 380k/core with 24 readers and 2 CPUs. SGA was set to 4GB. I suppose increasing the SGA will provide even more bandwidth.
Hi Leighton,
Since your physical reads are falling in the 2-4ms range I recommend you keep all else the same and then try runs at 64, 96 and 128 users. That will drive up the IOPS and if the storage is scalable the service times will remain flat.
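Scripted, that ramp might look like this sketch; it assumes the stock runit.sh interface (writers first, then readers) and that each run leaves its AWR report in awr.txt — the copy-out file names are hypothetical:

```shell
# Ramp the reader count; if the storage scales, IOPS should climb
# while 'db file sequential read' service times stay flat.
for readers in 64 96 128; do
  echo "run: ./runit.sh 0 $readers"
  if [ -x ./runit.sh ]; then
    ./runit.sh 0 "$readers"
    cp awr.txt "awr_r${readers}w0.txt"   # keep each report for comparison
  fi
done
```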
I think it’ll make a very interesting read or presentation to compare results of different kit and settings, to point out strengths/weaknesses, bottlenecks and what is responsible for the bottlenecks.
We’ll need people to contribute SLOB AWRs. Maybe a common Cloud site?
Frits, Kevin,
I could host these on my company’s website. I have plenty of space available.
Martin
Per Kevin’s comment –
“Bottom-bin WSM-EP pushing SLOB to over 18K IOPS/core. You should post this on my SLOB blog post.”
Dual Quad Core X5600s.
Logical reads: 154,012.1 542,874.3
Block changes: 28.0 98.5
Physical reads: 144,248.5 508,459.0
Memory Statistics
~~~~~~~~~~~~~~~~~ Begin End
Host Mem (MB): 60,506.9 60,506.9
SGA use (MB): 668.0 668.0
PGA use (MB): 275.6 207.0
% Host Mem used for SGA+PGA: 1.56 1.45
I will tweak the setup a bit and see what else I might able to get out of it –
Thanks Matt. BTW, you should post the output of /proc/cpuinfo. Most folks would be as surprised as I was to see just how bottom-bin these CPUs are considering the SLOB PIO they are able to drive.
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5607 @ 2.27GHz
stepping : 2
cpu MHz : 1200.000
cache size : 8192 KB
Thanks, Matt. That goes a long way to dispel the assertion that massive IOPS over FC requires too much CPU… these are totally bottom-bin Xeons pushing over 18,000 IOPS/core via fibre channel.
Fact trumps faith…long live SLOB!
Hi Kevin,
another thing I’d like to mention. I have run the LIO test on a T4-3 and get 3,652,597.7 LIOs per second with 0 writes/512 readers. I was wondering if there were any recommendations for driving a platform to the maximum possible? I’m really interested in the LIO test to see which platform scales well.
This machine is 2s16c128t with 528G RAM. I used an 80G SGA (ASMM-which leaves about 72G for the buffer cache). Can share AWR if needed.
Martin
Have you used processor groups to pin it all into a single socket to get your baseline?
Why don’t we take it up on the Oaktable email channel. We can enlist Glenn Fawcett if the AWR has no clues.
I will say that for LIO you should not be over-driving the threads to that level. The LIO model of SLOB has no waits, so if you are running 4x processes to processor threads you are creating your own (scheduling) waits. This workload can keep cores busy; there should be no reason to lean on the threads. What does it look like if you use processor groups to lock the SGA into the memory of socket0 and run all the foregrounds pinned to socket0 at, say, 16 SLOB readers? Then, perhaps, experiment with 32 SLOB readers in that recipe.
Hi Kevin,
do you have any pointers to further documentation on setting up the T4 by locking memory etc as mentioned above?
Mark
@Mark : Sorry, no, I don’t play with T4 stuff. If init.ora parameters would help I recommend seeing the parameters Oracle uses on “SuperCluster” for TPC-C. Just search for p_run.ora in the following FDR:
Click to access Oracle_SPARC_SuperCluster_with_T3-4s_TPC-C_FDR_120210.pdf
more efficient code on the way ?
here is an update with your latest SLOB kit (2-08-2012)
Logical reads: 164,920.4 656,365.0
Block changes: 14.6 58.2
Physical reads: 166,625.0 663,148.8
>more efficient code
Not sure what you’re asking. This is the latest drop though… so you see about 15% boost from this latest SLOB kit then?
tar: tape blocksize error
Is this user error on my behalf? Trying to extract for solaris x86_64.
Please post both the command you are using and its screen output. Also the output of “which tar”.
Hi Kevin,
Thanks for the tool kit.
Just a quick question – should we be setting filesystemio_options=directio to bypass file system cache?
Hi Stojan,
I’d recommend filesystemio_options=setall. Enjoy!
Thanks.
Interesting that I get these numbers with setall:
Physical reads: 30,623.2 440,323.0
and these numbers with async:
Physical reads: 132,452.7 4,743,396.0
running with 4 readers on Red Hat 5.6 connected to a NetApp
Not really a mystery Stojan. With filesystemio_options=asynch (no typo) you have files open in buffered mode. So your I/O is simply being satisfied in the operating system page cache. Depending on what the Netapp looks like and how it is connected (FCP, NFS, iSCSI), and how many spindles there are I’d say 30.6K RIOPS from 4 readers isn’t all that bad. I recommend reverting to setall (because we don’t really run with page-cache buffered Oracle datafiles) and ramp up the readers. Questions:
1. How many CPUs of what type do you have?
2. What is db_cache_size set to (do SHOW PARAMETER)
Try setting readers to 2x the number of processor cores and share with us your findings… a more complete snippet of the AWR would be useful too.
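On Linux the 2x-cores starting point can be derived mechanically; this sketch assumes /proc/cpuinfo exposes physical id / core id pairs, and falls back to nproc (logical CPUs) otherwise:

```shell
# Suggest a reader count of 2x physical cores.
# Count unique (physical id, core id) pairs to get real cores.
cores=$(awk -F: '/^physical id/ {p=$2}
                 /^core id/     {seen[p ":" $2]=1}
                 END {n=0; for (k in seen) n++; print n}' /proc/cpuinfo 2>/dev/null)
if [ -z "$cores" ] || [ "$cores" -eq 0 ]; then
  cores=$(nproc)   # fall back to logical CPU count
fi
readers=$((cores * 2))
echo "try: ./runit.sh 0 $readers"
```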
Ok, thanks, that makes it clear. Netapp is connected via FC with 40 spindles.
cpu :
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
stepping : 10
cpu MHz : 3000.108
cache size : 6144 KB
db_cache_size big integer 32M
AWR report of test with 16 readers can be viewed at:
https://docs.google.com/open?id=0B8L5rsECrEJBakNUWTd0Z2VRNXFYdVdOT2xfY1p1UQ
Hello Stojan,
That google docs reference takes me to a dead end. Am I doing something wrong?
Sorry about that.
The document should be open to the public (tested from a colleague’s PC). I think you need to click File then Download to view it; it hasn’t been converted to Google Docs format because that messes up the formatting.
Hmm…I still get “No preview available”.
Here’s a link to a version with preview available:
https://docs.google.com/document/d/1d6tXISCWN5VCSwFMUFBn-q_qaWW4Xf1EBxmxyIvNtRY/edit
Hi Kevin
posting back from twitter on an AWR report you’ve referenced.
The report shows ‘db file parallel read’ in the Top 5 and that contradicts your description “SLOB supports testing physical random single-block reads (db file sequential read)”. Technically speaking, parallel reads are close to random reads except they take into account index structure and utilize the async code path. Tanel described the logic behind parallel reads here. So just to be safe you might want to try turning this off via a hidden parameter: _db_file_noncontig_mblock_read_count=1 (defaults to 11 on 11.2.0.3).
IOPS reported in the ‘physical read total IO requests’ reflect real random reads as ‘physical read total multi block’ is zero, so looks like Oracle counts them correctly (didn’t know how it is accounted before reading docs).
Hi Timur,
Yes, I need to change that wording. I’ve known all along that both paths were pushed by SLOB (db file parallel read and db file sequential read). The most important thing is that the workload is all single block random as I intended to point out after your tweet, but since you point it out here I’d say all is well.
I personally see no value in setting that particular hidden parameter. In my mind as long as the I/O is all single block random it’s OK by me.
Hi Kevin and Timur,
I’m testing SLOB and I set event 10753 to disable prefetching
and the workload pattern has changed. More readers are required to achieve the same IOPS and every run cycle is much longer.
regards,
Marcin
Can you share an AWR report with and without your event? I prefer that approach over the hidden parameter. I’d like to see the AWRs to understand what it does to the workload. Thanks for giving SLOB a try, Marcin!
I just tried this event in my setup:
[oracle@tcag1 SLOB]$ grep ‘Physical reads:’ *awr*even*
awr.txt-with-10753-event: Physical reads: 407,166.4 1,640,695.6
awr.txt-without-10753-event: Physical reads: 393,802.3 1,637,358.4
This is a 2s12c24t server that can handle upwards of 400,000 8K random read IOPS from an XFS file system (filesystemio_options=setall, so O_DIRECT) residing in an md(4) SW RAID volume.
The difference between the default behavior and the behavior with the 10753 setting is only ~3%. What sort of IOPS difference is it in your testing?
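For reference, that difference can be computed straight from the two grep lines above (figures reproduced, not new measurements):

```shell
# Percent change between with-event and without-event read rates,
# using the per-second figures from the two awr.txt grep lines.
pct=$(awk 'BEGIN { printf "%.1f", (407166.4 - 393802.3) / 393802.3 * 100 }')
echo "difference: ${pct}%"
```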
Hi Kevin,
I found a difference for small number of readers – buffer cache set to 40 MB, 16 readers
Default settings
testdb-ora$ grep “Physical reads” 40/awr_16r_1c.txt
Physical reads: 16,116.9 16,574.7
testdb-ora$ grep “Elapsed:” 40/awr_16r_1c.txt
Elapsed: 21.01 (mins)
Event 10753 set
testdb-ora$ grep “Physical reads” 40_noprefetch/awr_16r_1c.txt
Physical reads: 6,120.1 6,430.1
testdb-ora$ grep “Elapsed:” 40_noprefetch/awr_16r_1c.txt
Elapsed: 55.14 (mins)
The strange thing is that the time difference disappears at 64 readers
Default settings
testdb-ora$ grep “Physical reads” 40/awr_64r_1c.txt
Physical reads: 14,854.8 15,224.0
testdb-ora$ grep “Elapsed:” 40/awr_64r_1c.txt
Elapsed: 40.81 (mins)
Event 10753 set
testdb-ora$ grep “Physical reads” 40_noprefetch/awr_64r_1c.txt
Physical reads: 13,277.6 13,643.8
testdb-ora$ grep “Elapsed:” 40_noprefetch/awr_64r_1c.txt
Elapsed: 39.78 (mins)
My system is running RedHat 5.3 2s6c12t on ext3 filesystem,
Right now I’m not sure how to explain this behavior and I will rerun all tests with more monitoring in place. (iostat / sar)
regards,
Marcin
I think you mean 2s12c24t.
At 16 readers there is not much concurrent I/O outstanding. Disabling non-contiguous multi-block reads just means that this low-concurrency workload is now low-concurrency *and* fully blocking. I think what your test proves is that your storage is bottlenecked in the 14K IOPS range and that 64 readers is a sufficient concurrent workload to drive that storage to the limit.
By the way, be mindful that Ext3 does not support concurrent writes (even in O_DIRECT) so when you switch to a writer test you should see a massive penalty…unless, that is, your tablespace consists of lots and lots of really small files.
Keep us posted.
I sure wish we’d build a repository of AWRs that accompany these threads 😦
An interesting thing for you to experiment with might be to boot the instance with a small (40MB) recycle pool because the tables are created with this association. Running in this manner will allow the index blocks to remain in the SGA buffer pool. As it stands now you are suffering PIO for both index and table. Please give it a try and tell us what you find out.
Thanks for participating, Marcin.
Hi Kevin,
I will try recycle pool later and will let you know.
Now I was trying to stress my host with LIO and I have one question related to runit.sh. You capture the first AWR snapshot before you set up all the sessions, and then ./trigger starts all executions, finishing with the next AWR snapshot. But between the sqlplus loop and ./trigger there is a sleep 10 (I think to allow Oracle to start all sessions), and these 10 seconds are also included in the AWR report. It is not a problem for PIO, since that ran on my host for 10 minutes and an extra 10 seconds didn’t skew the IO/s calculation much. But when I ran the LIO test it started confusing me. I added one line of code displaying logical reads and sysdate before ./trigger and after the wait command. Could you please help me understand the following numbers:
SEC NAME VALUE
—– —————————————————————- ———-
35887 session logical reads 93202884
Tm 4
SEC NAME VALUE
—– —————————————————————- ———-
35891 session logical reads 124096780
So the running time was 4 sec (35891 – 35887) and the same value was reported in Tm. The number of logical reads is 30,893,896 (124096780 – 93202884). Now for the AWR report numbers:
Statistic Total per Second per Trans
session logical reads 30,899,876 2,221,414.5 401,297.1
So the total value is very close to my calculation but LIO per sec is different.
AWR is reporting 2,221,414, which means that 30,899,876 LIOs were done in 13.91 sec (30,899,876 / 2,221,414.5), which is almost the same as you can find in the header of the AWR report
Begin Snap: 326 06-Mar-12 10:08:17 36 .7
End Snap: 327 06-Mar-12 10:08:31 36 .7
But according to the output from v$sysstat, a similar number of LIOs was done in 4 sec, which is around 7,723,474 LIO/sec.
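To make the arithmetic explicit (values are from the v$sysstat samples above):

```shell
# Delta of 'session logical reads' between the two samples,
# divided by the elapsed seconds between them.
lio_delta=$((124096780 - 93202884))   # 30,893,896 logical reads
elapsed=$((35891 - 35887))            # 4 seconds
per_sec=$((lio_delta / elapsed))      # ~7.7M LIO/s vs AWR's ~2.2M
echo "delta=$lio_delta elapsed=${elapsed}s per_sec=$per_sec"
```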
When I moved the temporary AWR snapshot from line 16 to line 34 (all sessions still waiting for the trigger to run) the AWR report looks closer to my calculations (30,893,894 LIOs in 4 sec):
Begin Snap: 328 06-Mar-12 10:20:32 58 1.3
End Snap: 329 06-Mar-12 10:20:36 34 .7
Statistic Total per Second per Trans
session logical reads 30,898,215 6,940,299.9 686,627.0
regards,
Marcin
Marcin,
Good find. You are right. The “before” awr snap should not be taken until after the sessions are connected and the 10 seconds of “dust settling.” I’ll post up a fix.
Hi Kevin,
Thanks for sharing SLOB. At the end of SLOB/misc/create_database_kit/cr_db.sql, shouldn’t the alter tablespace statement actually reference the IOPS tablespace?
SQL> — End of pupbld.sql
SQL>
SQL> create BIGFILE tablespace IOPS datafile size 1G NOLOGGING ONLINE
2 PERMANENT EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO ;
Tablespace created.
SQL>
SQL> alter tablespace SLOB autoextend on next 200m maxsize unlimited;
alter tablespace SLOB autoextend on next 200m maxsize unlimited
*
ERROR at line 1:
ORA-00959: tablespace ‘SLOB’ does not exist
Yes…that part of the kit presumes that naming convention. I’ll look into that.
Here are my results. I will post in 2 different notes because I tested 2 different pieces of equipment
First one is an AMD 6276 chip. The server is a 2 socket, 16 core server with 128g of memory.
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ————— ————— ———- ———-
DB Time(s): 37.9 46.6 0.00 5.96
DB CPU(s): 30.3 37.2 0.00 4.76
Redo size: 15,791.3 19,408.2
Logical reads: 10,119,426.4 12,437,215.0
The AWR report can be found here.
http://dl.dropbox.com/u/23998484/awr_6276_40.txt
These numbers are really impressive for this hardware. Even better than most of the Intel chips that preceded it.
Here are my results.
Second one is an Intel 2870. The server is a 2 socket, 10 core hyperthreading server with 128g of memory.
./runit.sh 0 40
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ————— ————— ———- ———-
DB Time(s): 37.2 43.4 0.00 5.29
DB CPU(s): 36.4 42.4 0.00 5.17
Redo size: 16,151.6 18,810.7
Logical reads: 12,927,371.4 15,055,624.9
The AWR report can be found here.
http://dl.dropbox.com/u/23998484/awr_2870_40.txt
These numbers bested the numbers in my exadata, and bested the AMD numbers.
They are certainly impressive. This new Intel E7 4S chip rocks!!
Bryan, I’m confused. You say “The server is 2 socket” but later say “E7 4S chip rocks”
Can you confirm please the following:
1. Exactly how many E7 sockets
2. The SLOB runit.sh args in both cases.
Thanks!
Sorry, both servers are 2 socket.
The AMD tests were 32 processes (0 32), the Intel tests were 40 (0 40)
I actually got a little higher on intel with 43 (0 43), 13.2 million LIO’s
Thanks for the clarification, Bryan. Really cool numbers! I’m not surprised adding on just a few more than core-count for Intel with SMT helps. Would be interested to see where the SMT threads degrade the LIOPs rate. I certainly wouldn’t expect 80 SLOB sessions (in the LIO model) to produce more LIOPS than 40 but I can’t test that. There is likely a break point. Probably at about 56 sessions (70%…queueing theory).
I’ll try to get to it this week, along with SLB numbers. Are you at all interested in any PIO numbers ? the 2 were comparable with the EMC DMX3 I have hooked up.. That’s the best my storage folks could give me for this little test.
I also have a new storage array coming in next week with a lot of SSD to benchmark. Life is good 🙂 I will try to hook it up my little server farm, but the servers are due back to the vendor in a couple of weeks.
I’d love any PIO numbers… all good!
Let me know if you want a SERIOUS Tier 0 disk array to try out. We leave Flash based SSD’s in the dust…literally. I’m just starting my Oracle based benchmark testing, but everything I’ve done up to now has convinced me that the marketing claims around the world records Kove holds are real….we shall see shortly!
I am going to use this to test an XPD L2 Kove disk array. The connection is Fiber Channel, so I’m not going to see 5M IOPS like I could with Infiniband, but I should put just about anything else to shame with this configuration!
If I can get this working, I’ll move up to Infiniband. Question – can you use RAC with SLOB? I just don’t think I’m going to generate enough I/O with a single instance.
I have 48 cores and 500 GB of RAM in the server. 8 FC connections into the switch.
Thanks,
Kim
@Kim Pearson : You can use SLOB with RAC. You’ll have to modify runit.sh so that sqlplus connects with a TNS connect string. You’ll have to work out how many users to connect to each instance and change the connect string at each count. Not too difficult. I’d cook it into the kit myself but I don’t really touch RAC that much any more. If you get a model working feel free to paste in your runit.sh
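A minimal sketch of such a modification is round-robin assignment of each SLOB user to a per-instance TNS alias; the alias names (RAC1/RAC2) and session count here are hypothetical, and the commented sqlplus line only indicates where a modified runit.sh would connect:

```shell
# Assign SLOB users round-robin across RAC instance TNS aliases.
SESSIONS=8
set -- RAC1 RAC2        # hypothetical instance-specific TNS aliases
n=$#
assignments=""
i=0
while [ "$i" -lt "$SESSIONS" ]; do
  idx=$(( i % n + 1 ))
  eval alias=\${$idx}   # pick the instance for this session
  assignments="$assignments user$((i + 1))@$alias"
  # In a modified runit.sh, each background sqlplus session would
  # connect here, e.g.:
  #   sqlplus user$((i+1))/user$((i+1))@$alias @reader.sql &
  i=$((i + 1))
done
echo "$assignments"
```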
Will do and thanks so much!
Thank you for this tool and its method.
I tried to make myself comfortable with SLOB for LIO testing. But I’m somehow confused by my first findings.
According to https://kevinclosson.wordpress.com/2012/05/13/quick-reference-readme-file-for-slob-the-silly-little-oracle-benchmark/
it’s rare to run with more readers than CPUs.
My test node has 24/12/2 (CPU/Core/Socket) X5680 @ 3.33GHz, but the graph (X:readers, Y:TM) scales up to about 120:
https://docs.google.com/spreadsheet/oimg?key=0Aqc1gsEhfcN1dFlLVldBSHhhUHZzc0gyMVJRd0tQN1E&oid=1&zx=r22eyqhu58ln
Do I have some errors in my setup?
Martin
Do you have AWRs? Do they show any serious latch contention? I can envision that piling up more reader.sql sessions may keep more LIOPS going if there is some sort of chaotic contention on just a few hash chain latches; sessions not hashing to those buckets can continue to push LIOPS. Just a hypothesis, because I have no more to go on than that.
In general a state of health is 1 reader.sql session per core or at most 2 per core (SMT); piling up beyond that likely indicates a problem. Look at what reader.sql does and the schema and try to finger any reason these processes would not run full bore. Running full bore with more sessions than cores means waits for CPU.
I have tons of AWRs: http://dl.dropbox.com/u/25789908/LIO_test1_bx.tar.bz2
there I tested with readers 1 2 4 8 16 24 32 40 48 64 72 80 88 96 104 112 120 128 – for each reader I had 3 runs: first as a kind of ‘cache warmup’, the other 2 to see if they are similar.
For that purpose I used a script meta_runit.sh attached. meta_test1_bx.txt is the direct output of this run (with a slightly modified runit.sh).
The highest wait (besides DB CPU) is “library cache: mutex X” in many of the AWRs. I will search for a way to get more library cache hash buckets. Too much parsing?
@martinberx : Uh, that one is ugly..that is a “wait” that burns CPU!
@martinberx: what is your hardware that is giving you MutexMadness(tm) headaches?
I have HP DL 380 G7 for sandbox, test and prod.
In this case: Host CPU (CPUs: 24 Cores: 12 Sockets: 2) – Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
hi kevin,
To test multiblock I/O (storage throughput), can I simply drop all the indexes?
I will be testing the machine for data warehouse use, so the scenario is:
20% read / 80% write (batch, ETL)
100% read (reporting)
Most of the workload is joins and scans over huge amounts of data.
Hi Kevin,
I’m trying to download SLOB from http://oaktable.net/articles/slob-silly-little-oracle-benchmark. But it seems there is only an update.
Could you (or anyone else following this blog) please help me to find it?
/Sergey
Hi Sergey, This is the entire kit. It is really small (on purpose)… enjoy. Don’t miss this additional info by the way:
https://kevinclosson.wordpress.com/2012/07/01/putting-slob-the-silly-little-oracle-benchmark-to-use-for-knowledge-sake/
hi Kevin,
I had to run zcat on Linux and then transfer the tar to Solaris; that fixed an issue I had.
Here are my results.
./runit.sh 0 8
                          Per Second
Logical reads:              1,109.86
Physical reads:             1,067.67

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read           1,373,047       9,111      7   99.6 User I/O

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
physical read IO requests                 1,372,362        1,067.7     171,545.3
physical read bytes                  11,242,389,504    8,746,382.0   1.40530E+09
physical read total IO requests           1,372,830        1,068.0     171,603.8
physical read total bytes            11,250,089,984    8,752,372.8   1.40626E+09
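A quick sanity check on the statistics above: at the standard 8 KB block size, the reported IOPS should account for essentially all of the reported read bytes, confirming these are single-block reads. A minimal sketch of the arithmetic (figures copied from the AWR lines above; the 8192-byte block size is assumed):

```python
# Arithmetic check on the quoted AWR figures (8 KB db_block_size assumed).
iops = 1_067.7          # "physical read IO requests" per second
block_size = 8192       # bytes
mb_per_sec = iops * block_size / 1e6
print(f"{mb_per_sec:.2f} MB/s")  # ~8.75 MB/s, close to the reported 8,746,382 bytes/s
```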
Looks nice for a beginning.
I instrumented it with loadprof.sql as Karl Arao advises. I noticed that the physical reads per second / logical reads per second pair floated between 500 and 1,400 during the test. Is that expected? Or should I dig into my SAN setup for an answer? I'm not a storage admin, but I'm pretty sure that my LUN doesn't own its physical disks exclusively.
OK, a general question. I'm trying to work out whether SLOB is a good tool for me. Right now we use UFS on our Solaris boxes. It is simple and pretty fast (and as old as the dinosaurs), but it isn't flexible, which hurts now that we are playing virtualization/consolidation games. I'm thinking about ASM or ZFS. There is one more factor adding complexity: we use EMC VNX with all these super-smart FAST Cache and virtual pool things. Is SLOB good enough to be a proof-of-concept tool? Or should we spend time (which is money) creating test machines and reproducing the production workload on them?
Thank you!
Sergey
Hi Sergey,
Only you can decide whether SLOB is a useful tool. If you want to drive storage with an Oracle Instance I/O workload then, yes, I personally think it is about the best out there, because it is free of any transactional overhead whilst performing true Instance I/O.
As for the little snippet of data you've posted, it looks to me like one of two things is true: either a) your random reads are touching spinning disk with an end-to-end wait time of 6.6 ms, or b) the physical I/O is much faster than that and scheduling time is being added to the end-to-end wait time. How many processor cores are dealing with this 8-session SLOB test?
Can you send the entire AWR? I always like seeing these.
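The 6.6 ms figure Kevin cites falls straight out of the "db file sequential read" line in the AWR snippet above; a minimal sketch of that arithmetic:

```python
# Average single-block read latency from the quoted wait-event line:
# 1,373,047 waits totalling 9,111 seconds.
waits = 1_373_047
total_wait_s = 9_111
avg_ms = total_wait_s / waits * 1000
print(f"{avg_ms:.1f} ms")  # ~6.6 ms end-to-end
```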
Hi Kevin,
awr – https://dl.dropbox.com/u/13990516/awr.txt.
there are two cores of Ultra SPARC VII+ allocated for the zone.
Sure, it is only me to make a decision 🙂 Thank you!
Could not run it because we do not have AWR licensed.
It would be nice if SLOB used Statspack. Is there any reason SLOB uses AWR rather than Statspack?
I’ve added runsp.sh and folder statspack/ with set of scripts that use Statspack. Testing now. Let me know if you would like me to send you these files.
Hi Kevin/Mark
following Mark H's update on Statspack (see his post from 2012), is there any update on a SLOB Statspack version (runsp.sh)? I don't have a Diagnostics Pack licence for AWR either.
Many thanks!
Stuart
No update. Sorry.
Hi Mark, any chance you could send me the scripts you've made to run with SLOB? obtechora@gmail.com
Thanks !
Hi Kevin,
This is neto from Brazil
How are you?
I did a small modification on reader.sql
SELECT /*+ FULL(cf1) */ COUNT(c2) INTO x FROM cf1 WHERE custid > v_r - 256 AND custid < v_r;
When I run SLOB, pretty much we are doing a parallel full table scan.
See below:
Load Profile              Per Second   Per Transaction    Per Exec   Per Call
~~~~~~~~~~~~         ---------------   ---------------  ---------- ----------
DB Time(s):                    253.4           1,362.3       40.39   2,140.78
DB CPU(s):                       5.1              27.5        0.81      43.16
Redo size:                  18,186.2          97,785.5
Logical reads:             329,478.3       1,771,574.8
Block changes:                  36.1             194.1
Physical reads:            329,390.4       1,771,102.5

                                                           Avg %Total
                                                          wait   Call
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
direct path read                  1,399,875      14,918      5   99.6 User I/O
DB CPU                                              302          2.0
control file sequential read            112           1     10     .0 System I/O
db file sequential read                  44           1     18     .0 User I/O
db file scattered read                   29           0     15     .0 User I/O
My question is:
Why do I have the same number of physical reads and logical reads?
Thank you very much my friend
All the best
neto
Neto my friend, thanks for stopping by and posting. I’ll take a look at this.
@Neto: Because you are scanning via the conventional path (SGA buffered)
Please explain 🙂
blocks that pass through the SGA are prefaced by a logical read
I'm sorry to say, but that's not correct, Kevin 🙂 As you can see, Neto's reads were direct path reads, so they bypassed the buffer cache.
However, every physical read is always counted as a consistent get (LIO), which is a little bit confusing. If you want to find out the exact number of LIOs contributed by direct reads, you should check the "consistent gets direct" statistic.
@Pavol I said “blocks that pass through the SGA are prefaced by a logical read” so you countered with “…direct path reads, so they bypassed the buffer cache” What is your point?
Kevin,
Well, to be more precise, you stated "Because you are scanning via the conventional path (SGA Buffered)", which is not correct, since Neto had obviously been performing direct reads (99.6% of wait time was direct path reads). Of course I'm not going to argue with "blocks that pass through the SGA…"
@Pavol ah, ok. I think with that reply you’ve clarified for other readers…unless you think I need to edit something? I don’t want to just dismiss your point. Let me know.
It was my fault. So would you mind editing my original post and moving it under yours from "January 23, 2013 at 10:08 am"? Or maybe I can do it myself with a new reply and you can delete the other (messy) stuff.
Well, it doesn't seem to be correct. As you can see, Neto's reads were direct path reads, so they bypassed the buffer cache.
Every physical read is always counted as a consistent get (LIO), which is a little bit confusing. If you want to find out the exact number of LIOs contributed by direct reads, you should check the "consistent gets direct" statistic. That means the logical read number in the AWR summary will always be greater than physical reads.
yep…I’ll edit my reply… you are right… I should know… I spent many many years looking at kcbget() 🙂
Hi Kevin
I came across the following locking issue in my PIO test on version 11.2.0.3.
With a connection count of 99 (./runit 0 99), everything is fine:
select address, count(*) from V$SQL_SHARED_CURSOR where sql_id = '68vu5q46nu22s' group by address;

ADDRESS          CHILD COUNT
---------------- -----------
0000000199EA0D90          99
Top wait event: cell single block physical read
When the connection count exceeds 100 (./runit 0 128)
Top wait event: library cache lock
select address, count(*) from V$SQL_SHARED_CURSOR where sql_id = '68vu5q46nu22s' group by address;

ADDRESS          CHILD COUNT
---------------- -----------
00000001987EE1B0         100
000000019CA92FC0         100
0000000196ADFC68         100
000000019A96B8F0         100
000000019ADBE938         100
000000019BF95FD0         100
000000019D8870C0          42
........
The reason is the new 11.2.0.3 parameter _cursor_obsolete_threshold (default 100) described in MOS note ID 296377.1.
You must set this parameter to a value greater than the number of concurrent connections.
Thanks, Thomas.
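For readers who hit the same wall, the workaround Thomas describes would look something like the following sketch. Note this is a hidden (underscore) parameter, so it should only be changed under Oracle Support's guidance; the value 256 here is just an illustrative number above the planned SLOB session count.

```sql
-- Hypothetical example: raise the cursor-obsoletion threshold above the
-- planned session count, then restart the instance for it to take effect.
ALTER SYSTEM SET "_cursor_obsolete_threshold"=256 SCOPE=SPFILE;
```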
Hi Kevin ,
I’m looking for @reader @writer sql scripts . Possible to post the syntax please .
Thanks
Shanker
Deprecated. Please see the top of the post and follow the links to SLOB 2.
Hi Kevin,
Thanks for a great and useful IO testing tool.
I tested it on a single-instance DB on 3PAR storage and I got unbelievably good numbers.
Here are some snippets from one AWR report with 80 readers and 0 writers.
According to this AWR:
IOPS = 586,244.5
Throughput = 4.80288472E+09 bytes/s ≈ 4.5 GB/s
Cache Sizes                  Begin        End
~~~~~~~~~~~             ---------- ----------
Buffer Cache:               5,056M     5,056M   Std Block Size:        8K
Shared Pool Size:           5,024M     5,024M   Log Buffer:       16,528K
......

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
.....
parse count (total)                           1,720            5.7         156.4
parse time cpu                                   42            0.1           3.8
parse time elapsed                               55            0.2           5.0
physical read IO requests               177,143,749      586,244.5  16,103,977.2
physical read bytes               1,451,262,926,848 4.80285050E+09  1.3193299E+11
physical read total IO requests         177,144,393      586,246.7  16,104,035.7
physical read total bytes         1,451,273,266,176 4.80288472E+09  1.3193393E+11
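Pascal's GB/s conversion checks out, and the same figures also imply the reads are 8 KB single-block I/O. A minimal sketch of the arithmetic (figures copied from the statistics above):

```python
# Convert "physical read total bytes" per second into GB/s (binary GB)
# and derive the average read size from the quoted IOPS figure.
bytes_per_sec = 4.80288472e9   # physical read total bytes, per second
iops = 586_244.5               # physical read IO requests, per second
gb_per_sec = bytes_per_sec / 2**30
avg_read_bytes = bytes_per_sec / iops
print(f"{gb_per_sec:.2f} GB/s, avg read ~{avg_read_bytes:.0f} bytes")
# ~4.47 GB/s (the "4.5 GB/s" quoted), average read ~8 KB
```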
Regards,
Pascal
@Pascal
Thanks for sharing your SLOB2 experiences. I presume this is either a 4-node RAC of 2S servers or an 8S ProLiant? Am I guessing wrong? Please post the slob.conf so readers can learn the relationship between SLOB sessions * slob.conf->SCALE. This is the Active Data Set.
I also have some interesting numbers from an IBM Power 730 (POWER7+ 4.2 GHz, 8 cores; LPAR with 8 cores) with a Flash System 820 over 4x 8 Gbit FC.
1) disabled prefetching (just to make the test consistent across platforms)
2) AWR Physical reads per second
Physical reads: 282,327.2
3) latency
storage level (at fibres): 150us
iostat db server: 300us
db level: 400us
That is over 35,000 IOPS per CPU core.
It seems an Exadata X3-2 quarter rack with 3 storage cells won't catch 280k IOPS with such great latencies.
slob.conf:
UPDATE_PCT=0
RUN_TIME=600
WORK_LOOP=0
SCALE=10000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0
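The per-core claim above is simple division; a minimal sketch with the figures copied from the comment:

```python
# IOPS per core for the 8-core POWER7+ LPAR result quoted above.
physical_reads_per_sec = 282_327.2
cores = 8
iops_per_core = physical_reads_per_sec / cores
print(f"{iops_per_core:,.0f} IOPS/core")  # ~35,291, i.e. "over 35,000"
```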