Applies to:
Oracle Server - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.
Description
On 11.2.0.3 (prior to the 11.2.0.3.4 PSU), one of the cluster nodes may intermittently experience a CRS restart (without a node reboot), with ocssd.log pointing to "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1". As a result, the ASM and database instances on the affected node are also restarted. The cause is a race condition between threads that check voting disk availability. The issue was reported and fixed in unpublished bug 13869978.
Occurrence
It affects only clusters with a single voting disk/file configured, running Grid Infrastructure 11.2.0.3 without the 11.2.0.3.4 PSU applied.
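The condition above can be expressed as a small check. The following is a minimal Python sketch, not an Oracle tool: the function names and the inputs (a dotted Grid Infrastructure version string such as `11.2.0.3.2`, and the number of configured voting files) are assumptions introduced for illustration only.

```python
def parse_version(version):
    """Split a dotted version string such as '11.2.0.3.2' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(gi_version, voting_file_count):
    """Return True when the configuration matches the condition described above:
    Grid Infrastructure 11.2.0.3 below PSU 4 with exactly one voting file.

    A four-part version ('11.2.0.3') is treated as the base release (PSU 0).
    """
    parts = parse_version(gi_version)
    if parts[:4] != (11, 2, 0, 3):
        return False
    psu = parts[4] if len(parts) > 4 else 0
    return psu < 4 and voting_file_count == 1


if __name__ == "__main__":
    # Hypothetical inputs: (version string, number of voting files).
    for version, count in [("11.2.0.3.2", 1), ("11.2.0.3.4", 1), ("11.2.0.3.0", 3)]:
        print(version, count, is_affected(version, count))
```

The PSU level and the voting file count have to be collected separately; the sketch only encodes the comparison.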
Symptoms
<grid-home>/log/<node>/cssd/ocssd.log shows the following:
2012-05-28 07:45:32.823: [ CSSD][1075423552](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-05-28 07:45:32.835: [ CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [ CSSD][1075423552]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-05-28 07:45:32.835: [ CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [ CSSD][1075423552](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-05-28 07:45:32.849: [ CSSD][1075423552]
----- Call Stack Trace -----
2012-05-28 07:45:32.857: [ CSSD][1075423552]calling call entry argument values in hex
2012-05-28 07:45:32.858: [ CSSD][1075423552]location type point (? means dubious value)
2012-05-28 07:45:32.859: [ CSSD][1075423552]-------------------- -------- -------------------- ----------------------------
2012-05-28 07:45:32.881: [ CSSD][1075423552]clssscExit()+740 call kgdsdst() 000000000 ? 000000000 ?
2012-05-28 07:45:32.884: [ CSSD][1075423552]clssnmvDiskCheck()+ call clssscExit() 2AAAAC477780 ? 000000002 ?
2012-05-28 07:45:32.887: [ CSSD][1075423552]clssnmvDiskPingMoni call clssnmvDiskCheck() 2AAAAC477780 ? 2AAAAC0A3C40 ?
2012-05-28 07:45:32.888: [ CSSD][1075423552]torThread()+423 04019A0B8 ? 000000000 ?
2012-05-28 07:45:32.890: [ CSSD][1075423552]clssscthrdmain()+25 call clssnmvDiskPingMoni 2AAAAC477780 ? 2AAAAC0A3C40 ?
In some cases, the following may also appear in ocssd.log:
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmFindVFByVDIN: Requested guid 0b11163b-77614f16-bf6dea8e-e0b9a98b, vdisk guid 0b11163b-77614f16-bf6dea8e-e0b9a98b (0000000007D8E248) - len 16, vfile (0000000007D8B980), link (0000000007D8B980)
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmFindVFByVDIN: Voting file not found - queue(0000000007CF8AC0), prev (0000000007D8B980), next (0000000007D8B980)
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmvDiskCheck: No voting file found for guid 0b11163b-77614f16-bf6dea8e-e0b9a98b
Usually, if there is a genuine voting disk I/O issue, messages like the following appear in ocssd.log before CSSD aborts the node:
2012-05-22 14:13:21.939: [ CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 75% maximum time, 27000 ms, will be considered unusable in 6640 ms
..
2012-05-22 14:13:26.408: [ CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 90% maximum time, 27000 ms, will be considered unusable in 2170 ms
OR
If access to the voting disk is down rather than slow, an OS error will be printed.
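The distinction drawn above (an abort preceded by I/O timeout warnings versus an abort with no such warnings) can be checked mechanically. The sketch below is a rough illustration in Python, assuming the message texts look exactly like the excerpts in this note; the function name and return labels are hypothetical. It does not detect the OS error case, because this note does not show that message, and it does not bound how long before the abort a warning may have appeared.

```python
import re

# Patterns taken from the log excerpts quoted in this note.
ABORT_RE = re.compile(
    r"clssnmvDiskCheck: Aborting, (\d+) of (\d+) configured voting disks available"
)
SLOW_IO_RE = re.compile(
    r"clssnmvDiskCheck: \(.+?\) No I/O completed after \d+% maximum time"
)


def classify_ocssd_log(lines):
    """Classify a sequence of ocssd.log lines.

    Returns one of:
      'no-abort'                 - no clssnmvDiskCheck abort message found
      'abort-after-io-warning'   - abort preceded by 'No I/O completed' warnings
      'abort-without-io-warning' - abort with no preceding I/O warning
    """
    saw_io_warning = False
    for line in lines:
        if SLOW_IO_RE.search(line):
            saw_io_warning = True
        elif ABORT_RE.search(line):
            if saw_io_warning:
                return "abort-after-io-warning"
            return "abort-without-io-warning"
    return "no-abort"


if __name__ == "__main__":
    import sys

    # Usage: python classify.py <grid-home>/log/<node>/cssd/ocssd.log
    with open(sys.argv[1], errors="replace") as handle:
        print(classify_ocssd_log(handle))
```

An 'abort-without-io-warning' result is only consistent with the symptom described here; it is not proof that bug 13869978 is the cause.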
Workaround
Use 3 or more voting disks/files instead of 1.
If the voting disk is on ASM, move the voting disk to a normal or high redundancy diskgroup. Please refer to note 428681.1 OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) for instructions to move voting disks.
As a best practice, configure multiple voting disks.
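To see how many voting files are currently configured, the listing printed by `crsctl query css votedisk` can be counted. The Python sketch below parses such a listing that has been saved as text. The sample listing is an assumption about the typical 11.2 layout, built from the disk path and GUID seen in the log excerpts of this note; the disk group name `DATA` and the function names are hypothetical.

```python
import re

# Matches a numbered voting file row such as:
#  " 1. ONLINE   <file universal id> (<path>) [<diskgroup>]"
ROW_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+(\S+)\s+\((.+?)\)")


def parse_votedisk_listing(text):
    """Return a list of (state, path) tuples, one per voting file row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append((match.group(2), match.group(4)))
    return rows


def needs_more_voting_files(text, minimum=3):
    """True when fewer than `minimum` voting files are listed.

    The default of 3 follows the workaround above.
    """
    return len(parse_votedisk_listing(text)) < minimum


# Illustrative listing only (assumed layout, hypothetical disk group name).
SAMPLE = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0b11163b77614f16bf6dea8ee0b9a98b (ORCL:DATA01) [DATA]
Located 1 voting disk(s).
"""

if __name__ == "__main__":
    print(parse_votedisk_listing(SAMPLE))
    print(needs_more_voting_files(SAMPLE))
```

If the real output differs from the assumed layout, the row pattern needs adjusting before the count can be trusted.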
Patches
The bug 13869978 fix has been included in 11.2.0.3 Grid Infrastructure PSU 4 and above. Please apply 11.2.0.3.4 GI PSU (patch 14275572).
Alternatively, interim patch 13869978 has been provided on top of the 11.2.0.3.2 and 11.2.0.3.3 PSUs for various platforms. Please check My Oracle Support "Patches & Updates" for availability.
History
Database - RAC/Scalability Community
To discuss this topic further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Database - RAC/Scalability Community