You should apply 11.2.0.2.5 (PSU5 for 11GR2 ) in order to fix bug 9829397
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.3 - Release: 11.2 to 11.2IBM AIX on POWER Systems (64-bit)
Symptoms
Long log file sync times on AIX 6.1 (+ later) and Oracle 11.2Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
db file sequential read 6,456,003 43,555 7 42.00 User I/O
log file sync 5,192 21,585 4157 20.81 Commit
read by other session 1,436,853 13,718 10 13.23 User I/O
log buffer space 4,749 8,790 1851 8.48 Configuration
DB CPU 5,518 5.32
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
db file sequential read 6,456,003 43,555 7 42.00 User I/O
log file sync 5,192 21,585 4157 20.81 Commit
read by other session 1,436,853 13,718 10 13.23 User I/O
log buffer space 4,749 8,790 1851 8.48 Configuration
DB CPU 5,518 5.32
Performing a TRUSS of LGWR e.g. truss -dD -o lgwrtruss.log -p <ospid>
you can see that the time reported next to lseek may be very high, and warning messages are being written to the LGWR trace. This is the elapsed time since the previous call aio_nwait64.
0.0009: aio_nwait64(0x0000000000001000, 0x0000000000000012, 0x0FFFFFFFFFFEC0A0, 0x800000000000D032, 0x3FFC000000000003, 0x000000000000011C, 0x0000000000000000, 0x0000000000000000) = 0x0000000000000012
2.4272: lseek(20, 0, 1) = 1689988 <====== Time between lseek and previous call is high
kwrite(20, "\n * * * 2 0 1 1 - 0 3".., 29) = 29
kwrite(21, " 3 ? + O M 9 ~ r T\n", 10) = 10
0.0002: lseek(20, 0, 1) = 1690017
kwrite(20, " W a r n i n g : l o g".., 52) = 52 <====== Warning message written to LGWR trace
kwrite(21, " 3 ? V S 2 ~ 0 q\n", 9) = 9
kwrite(20, "\n", 1) = 1
0.0002: thread_post(12321045) = 0
2.4272: lseek(20, 0, 1) = 1689988 <====== Time between lseek and previous call is high
kwrite(20, "\n * * * 2 0 1 1 - 0 3".., 29) = 29
kwrite(21, " 3 ? + O M 9 ~ r T\n", 10) = 10
0.0002: lseek(20, 0, 1) = 1690017
kwrite(20, " W a r n i n g : l o g".., 52) = 52 <====== Warning message written to LGWR trace
kwrite(21, " 3 ? V S 2 ~ 0 q\n", 9) = 9
kwrite(20, "\n", 1) = 1
0.0002: thread_post(12321045) = 0
The above shows LGWR taking a long time and then writing the below message to the LGWR trace file.
Warning: log write elapsed time XXXXms, size XKB
If we look just at the times since the previous call, by looking at lseek times , they jump when load becomes slightly higher. Write times were good though.
0.0003: lseek(20, 0, 1) = 1686342
0.0002: lseek(20, 0, 1) = 1686564
0.0002: lseek(20, 0, 1) = 1686757
0.0002: lseek(22, 0, 0) = 0
0.0002: lseek(482, 0, 0) = 0
0.0002: lseek(483, 0, 0) = 0
0.0026: lseek(20, 0, 1) = 1687283
2.1588: lseek(20, 0, 1) = 1687285 <== elapsed time since previous call (aio_nwait64) jumps, which we see
2.1983: lseek(20, 0, 1) = 1687367 represented next to the lseek call.
2.1029: lseek(20, 0, 1) = 1687449
2.1846: lseek(20, 0, 1) = 1687531
0.0002: lseek(20, 0, 1) = 1686564
0.0002: lseek(20, 0, 1) = 1686757
0.0002: lseek(22, 0, 0) = 0
0.0002: lseek(482, 0, 0) = 0
0.0002: lseek(483, 0, 0) = 0
0.0026: lseek(20, 0, 1) = 1687283
2.1588: lseek(20, 0, 1) = 1687285 <== elapsed time since previous call (aio_nwait64) jumps, which we see
2.1983: lseek(20, 0, 1) = 1687367 represented next to the lseek call.
2.1029: lseek(20, 0, 1) = 1687449
2.1846: lseek(20, 0, 1) = 1687531
Changes
Database Upgraded to 11.2 or a AIX version, TL or SP is appliedCause
The operating system, or file system is not correctly configured.Solution
It is important to ensure that all AIX system parameters are correctly configured and that the file system block size aligns with the block size of the online redo logs.There have been multiple occasions where an upgrade of Oracle to 11g, or the application of a later Technology Level (TL) and Service Pack (SP) has seem a sudden increase in log file sync wait times with the symptom being high delta between aio_nwait64 and lseek calls seen during truss and high log file sync times.
Even though the only change may be an Oracle upgrade or a OS patch, multiple issues with the above symptoms have been addressed by either correctly configuring the OS or filesystem containing online redo.
Some previous changes that have resolved long LFS issue in Oracle 11.2 on AIX include :-
- Disabling Disk I/O Pacing for the filesystem with Redo Logs. (Enabled by default in AIX 6.1)
- Correctly aligning file system and redo log block size to avoid demoted I/O's
- Correctly setting vmo and network parameters
Please review the following IBM whitepaper "Oracle Architecture and Tuning on AIX" and ensure that minimum recommended settings are met.
Disabling or delaying the warning message from being written to the LGWR trace file also may help improve LFS times. In 10.2 set event 10468 level 4. In 11.2, you can vary the message warning threshold using
"_long_log_write_warning_threshold" which sets the threshold for long log write warning messages in ms.
If the everything is correctly configured then applying the fix for Bug 12412983 - AIX: "asynch descriptor resize" may also help. This contains both the fix of bug 12412983 and bug 9829397.
There is also a merge patch 12986882: MERGE ON TOP OF 11.2.0.2.0 FOR BUGS 9829397 12412983.
Note:
The following recommendation may help in alleviating long 'log file sync' waits
However we highly advise you to contact IBM to verify any changes to configuration on the OS level
Please also ensure all recommended OS patches have been installed
Niciun comentariu:
Trimiteți un comentariu