Applies to:
Oracle Server - Enterprise Edition - Version 11.2.0.3 and laterInformation in this document applies to any platform.
Symptoms
- AWR Report shows contention for 'row cache objects' as the top wait:
Top 5 Timed Foreground Events Event Waits Time(s) Avg wait (ms) DB time Wait Class latch: row cache objects 58,291,873 130,700 2 63.18 Concurrency DB CPU 60,563 29.27 db file sequential read 2,994,299 14,461 5 6.99 User I/O library cache lock 3,179 1,662 523 0.80 Concurrency gc cr grant 2-way 1,233,503 981 1 0.47 Cluster
- After the database upgrade to 11.2.0.3, there is dramatic increase in the waits for the 'row cache objects' latch. Looking at the 'Latch Miss Sources' section in the AWR Report, the top location calling 'row cache objects' is 'kqrpre: find obj':
Latch Name Where NoWait Misses Sleeps Waiter Sleeps ... row cache objects kqrpre: find obj 0 10,332,326 8,906,124 row cache objects kqreqd: reget 0 7,888,165 8,503,625 row cache objects kqreqd 0 7,677,074 8,489,832 row cache objects kqrso 0 4,127 3,622 row cache objects kqrpre: init complete 0 143 88
The 'row cache objects' latch is called most frequently from 'kqrpre: find obj'. This module tries to find details of an object being parsed in the row cache. During parse, the row cache is searched. The process searches through a linked list protected by the 'row cache objects' latch to find the object in the cache that it needs. When other processes are also parsing and looking through the row cache this may cause contention; especially if the parse activity is excessive or unnecessary. - From the system state trace, a common stack can bee seen for the related processes:
()+492<-kslges()+2152<-kqreqd()+256<-kkoiqb()+9436<-kkooqb()+684<-kkoqbc()+2256<-apakkoqb()+140
<-apaqbdDescendents()+460<-apadrv()+2020<-opitca()+2112 <-kksFullTypeCheck()+72<-rpiswu2()
Most of the time in the stack appears to be spent in the function KKOIQB. This function makes row cache calls in order to find index names so that indexes for a table can be sorted by alphabetical name order. In the case of a cost tie between different plans the optimizer needs a consistent method of choosing one index over another. The index names is sorted alphabetically to achieve this; in other words, it picks the first index alphabetically in the event of a choice between two plans using indexes with equal costs.
Changes
The database was upgraded from 10.2.0.4 to 11.2.0.3 .
Cause
The cause is due to architectural changes between 10.2 and 11.2 in how Oracle does optimization. There are lot more query transformation in 11.2.0.3 than in 10.2.0.4 for queries with views or sub-queries. This results in more passes through optimization, which means more row cache latch calls.
(Furthermore, in this particular case the parameter optimizer_mode = first_rows_100 was being used increasing the number of optimization passes still further, since In 11.2, with first_rows optimization, the optimizer goes through more permutations at hard parse time to get the most optimal execution plan possible. In addition, this particular environment is doing 350 hard parses per second because the application did not use bind variables.)
Solution
There are actually several solutions/workarounds. The list is in order of preference from Support's perspective.
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
3) Do not use first_rows_x optimization and use all_rows instead
4) Set optimizer_feature_enabled = 10.2.0.4
5) Set event 10089 level 1 to disable index sorting.
2) Enable the cursor_sharing = force
3) Do not use first_rows_x optimization and use all_rows instead
4) Set optimizer_feature_enabled = 10.2.0.4
5) Set event 10089 level 1 to disable index sorting.
Here are some example metrics on the batch job in question before and after some of the changes:
10g
55 threads No parameters set, optimizer_mode=FIRST_ROWS_100, <---Throughput is 11K. 'row cache objects' latch contention observed.
11g
60 threads No parameters set, optimizer_mode=FIRST_ROWS_100, <---Throughput is 8400. 'row cache objects' latch contention observed.
60 threads optimizer_mode=ALL_ROWS. <---Throughput is 13.5K. 10% less 'row cache objects' latch contention than the two examples above.
60 threads with event 10089 set. <---Throughput is 18304. Not any major contention.
65 threads with event 10089 set. <---Throughput is 21000. Not any major contention.
Niciun comentariu:
Trimiteți un comentariu