The multiple buffer pools feature of Oracle has always seemed underappreciated to me. Back in the late 90s we found the feature invaluable when running audited TPC-C benchmarks on Sequent gear. With that workload it was critical to be able to size blocks according to the row size and modify rate of certain relations. I think there may be a perception that this feature is a benchmark special, when in fact it can be very helpful.
Half Full or Half Empty
Some folks don’t think about their SGA block buffer pool much beyond the fundamental aspect of having blocks there to satisfy logical reads. Since it is managed with a least recently used (LRU) algorithm, I think folks generally leave it at that. The problem is that, viewed pessimistically, LRU is not always sufficient. That is, evicting the least recently used blocks doesn’t help if the blocks shouldn’t really be in the cache in the first place. Some blocks simply should not remain in the cache, and LRU is not aggressive enough to “protect the cache”, as I put it. As an aside, this sort of dovetails with the thread I have going about saving the overhead of caching certain filesystem data (e.g., compressed archived redo log files) by implementing direct I/O.
The Art of Multiple Buffer Pools Tuning
I think it is an art. The general approach is to figure out what is consuming cache and whether or not you want that particular object to be consuming the cache. The following is a quote from a whitepaper I wrote entitled Buffer Cache Performance Analysis and Multiple Buffer Pools in Oracle9i:
Any table or index sustaining a large number of reads should also have a commensurate presence in the cache. If not, it is safe to surmise that the blocks are being read at random and therefore are not being shared.
Although that paper is old, I think the principles apply quite well even today. Note, I have not done this testing with Oracle10g yet so the queries in the paper may not be correct.
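For a quick look at what is consuming the cache right now, counting buffers per object in V$BH is a reasonable starting point. What follows is only a minimal sketch, not one of the queries from the paper; it assumes you have access to V$BH and DBA_OBJECTS:
-- Minimal sketch: count how many buffers each object currently occupies
-- in the buffer cache (free buffers excluded).
select o.owner,
       o.object_name,
       o.object_type,
       count(*) buffers
from v$bh bh, dba_objects o
where bh.objd = o.data_object_id
and bh.status <> 'free'
group by o.owner, o.object_name, o.object_type
order by buffers desc;
Objects that sustain heavy physical reads but never build a commensurate presence in this listing are the ones worth a closer look.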
A Current Topic
In a recent thread on the oracle-l list, fellow OakTable Network member Mark Bobak wrote:
I went through an exercise a while ago where I tried to determine which segments were candidates for a KEEP or RECYCLE pool.
The query I came up with is:
select vss.owner,
       vss.object_name,
       vss.statistic_name,
       vss.value,
       ds.bytes segsize,
       ds.buffer_pool
from v$segment_statistics vss, dba_segments ds
where vss.statistic_name = 'physical reads'
and vss.value > 5000000 -- You may need to play with this threshold value for your environment
and ds.segment_type = vss.object_type
and ds.segment_name = vss.object_name
and ds.owner = vss.owner
and ds.buffer_pool = 'DEFAULT'
order by value
/
Now, this helped me identify the segments that had the largest levels of physical IO, and the size of the segment. The idea is, “show me which segments are in the DEFAULT pool and are doing the largest numbers of physical reads, and show me how big those segments are.”
Mark continued with:
Next, ask yourself these questions:
– How much memory do I have to dedicate to KEEP and/or RECYCLE pools?
– Of the segments returned from the query above, which ones are small, and easy to entirely cache? These are candidates for KEEP pool.
– Which are entirely too large to consider caching? These are candidates for RECYCLE pool.
I hope my old Oracle9i paper and Mark’s oracle-l post are helpful.
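As a footnote, once the candidates are chosen the mechanics are simple: carve the pools out of the SGA and point the segments at them with a storage-clause change. The following is only a minimal sketch; the object names and pool sizes are purely illustrative, and the ALTER SYSTEM commands assume there is SGA headroom for the new pools.
-- Illustrative only: allocate the pools (assumes SGA headroom is available)...
alter system set db_keep_cache_size = 256M;
alter system set db_recycle_cache_size = 128M;

-- ...and direct segments to them (hypothetical object names)
alter table app.small_hot_lookup storage (buffer_pool keep);
alter index app.small_hot_lookup_pk storage (buffer_pool keep);
alter table app.large_random_read_history storage (buffer_pool recycle);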
Hey Kevin,
Shortly after I posted that script, someone pointed out that it doesn’t account for ‘physical reads direct’, which are in the ‘physical reads’ statistic, and in some environments (DW, for example), can have a significant impact. Being that I come from an OLTP-centric view of the world, I hadn’t considered that. So, I hacked the query a bit to consider ‘physical reads direct’, and here’s the new version:
column owner format a20
column statistic_name format a15
select * from(
select vss1.owner,
vss1.object_name,
vss1.statistic_name,
vss1.value-vss2.value value,
ds.bytes/1048576 "SEGSIZE (MB)",
ds.buffer_pool
from v$segment_statistics vss1,
v$segment_statistics vss2,
dba_segments ds
where vss1.statistic_name = 'physical reads'
and vss2.statistic_name = 'physical reads direct'
and ds.segment_type = vss1.object_type
and ds.segment_name = vss1.object_name
and ds.owner=vss1.owner
and ds.segment_type = vss2.object_type
and ds.segment_name = vss2.object_name
and ds.owner=vss2.owner
and ds.buffer_pool = 'DEFAULT'
order by (vss1.value-vss2.value) desc
)
where rownum <= 10 -- assumed cutoff; adjust for your environment
/
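For a quick sense of how much of the ‘physical reads’ figure is direct-path I/O (reads that never touch the buffer cache and therefore should not drive KEEP/RECYCLE decisions), the two instance-wide statistics can be compared. This is just a minimal sketch against v$sysstat:
-- Minimal sketch: instance-wide totals; 'physical reads direct' is a
-- subset of 'physical reads' and bypasses the buffer cache entirely.
select name, value
from v$sysstat
where name in ('physical reads', 'physical reads direct');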
just thinking aloud here:
I wonder if this classification into what is desirable for short term caching (RECYCLE as opposed to KEEP or normal) and what should be in the cache forever shouldn’t rather be driven by the *type* of block rather than its frequency of use?
for example: ideally, I’d love to have the root node of any frequently referenced index living forever in the cache while the actual index leaf blocks could be falling off the deep end.
would there be a way of implementing this other than faith in the cache management algorithms?
that faith is what got us here in the first place…
I’m not convinced there is enough flexibility in the cache management algorithms to allow correct management to take place. for example, a data block should never pre-empt an index root block in the cache other than in extreme conditions: 0 use + time-to-live exceeded, that sort of thing.
this area is ripe for some serious research, me theenks. good to see you folks getting stuck into it.
Noons,
As usual, very nice follow up. I know it is moot now, but back when we were working on NUMA optimizations for Oracle in the Sequent port, we were investigating memory-locality and replication techniques for very hard hit buffers. It became clear to us that very hard hit memory lines would be better off replicated into each local memory. The sorts of ideas you are mulling over here were actually in our minds then. We were looking at using hints from the Oracle kernel combined with reference counting in our NUMA engine on each building block. But I digress…
I think it all comes down to the fact that relatively speaking, memory is cheap. So effective buffer pool utilization is just not an important battle. For instance, when Oracle 6.0.27 was deployed on the typical SMP of that time frame, the ratio of main memory to CPU clock speed was on the order of 256KB/MHz. I’ve got Proliant DL585s around here that are configured with 2048MB/MHz.
Here is a new version with some refinements (accounting for partitioned objects, plus a performance improvement):
with segstat as
(
select object_type,
object_name,
owner,
subobject_name,
max(case statistic_name when 'physical reads' then value end) pr,
max(case statistic_name when 'physical reads direct' then value end) prd
from v$segment_statistics vss
where statistic_name in ('physical reads', 'physical reads direct')
group by object_type, object_name, owner, subobject_name
)
select st.owner,
st.object_name,
st.subobject_name,
(st.pr-st.prd) pr_prd_diff,
ds.bytes/power(1024,2) mb
from dba_segments ds, segstat st
where ds.owner=st.owner
and ds.segment_name=st.object_name
and ds.segment_type=st.object_type
and nvl(ds.partition_name, ds.segment_name)=nvl(st.subobject_name, st.object_name)
and ds.buffer_pool='DEFAULT'
order by st.pr-st.prd desc;
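The NVL on subobject_name is what makes the join work in both cases: partitioned segments line up with dba_segments on the partition name, while non-partitioned segments fall back to matching on the segment name itself.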
Thank you for the SQL syntax for isolating potential objects for the keep and recycle pools. I suppose the art of this exercise is to determine how large to make each pool?
What threshold do you use to decide whether an object goes in the keep or recycle pool?
any advice???
One of the 10g reports, addmrpt.sql, will give some general advice on this subject.
Any ideas?
thanks in advance
Carrie
Hi Carrie,
The paper I reference in this post has quite a bit of information about what to direct to the various pools. Did you get the paper?
Hi Kevin,
Well, I can’t restrict myself from throwing queries your way. Please take that as a sign of my respect for you.
On the subject of Oracle multiple buffer pools: I too agree about the advantages of multiple buffer pools.
In my environment we have DB_FILE_MULTIBLOCK_READ_COUNT (128) * Oracle block size (8K) = 1MB, which is the maximum I/O chunk that can be handled by the VxFS software. The filesystem block size is also 8K so as not to issue more than one I/O request per Oracle block.
What makes me wonder is this: we are using buffered I/O on a VxFS filesystem on HP-UX with Oracle 10g. So whatever the size of the Oracle blocks, the OS kernel or disk drivers are supposed to build optimally sized batches of I/O requests before issuing them to the storage, aren’t they?
Lodh,
I can’t speak intelligently about this recipe because I can’t imagine why anyone would use buffered I/O in that configuration. Why not ODM or even Quick I/O?