2011-05-04

To start a pre-11gR2 database in an 11gR2 Grid Infrastructure environment, the node(s) must be pinned.

Document Title: Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment (Doc ID 948456.1)

Purpose

This note lists most of the known issues in an environment that combines 11gR2 Grid Infrastructure (GI for short) with pre-11gR2 databases.

Even though a workaround is available in some cases, it is recommended to apply patches whenever possible. Refer to note 1064804.1 for instructions on patching in a mixed environment.

For CRS PSU/bundle patch information, refer to note 405820.1 for 10.2 and note 810663.1 for 11.1.

Pre 11.2 Database Issues in 11gR2 Grid Infrastructure Environment

1. Error creating or starting pre-11.2 database:

If it happens while creating a database, DBCA fails with ORA-29702 and the trace shows:

ORA-01501: CREATE DATABASE failed
ORA-00200: control file could not be created
ORA-00202: control file: '+DG_DATA/racdb/control01.ctl'
ORA-17502: ksfdcre:4 Failed to create file +DG_DATA/racdb/control01.ctl
ORA-15001: diskgroup "DG_DATA" does not exist or is not mounted
ORA-15077: could not locate ASM instance serving a required diskgroup


If it happens while starting an existing database, sqlplus startup fails:

ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/prod/spfileprod.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/prod/spfileprod.ora
ORA-15077: could not locate ASM instance serving a required diskgroup
ORA-29701: unable to connect to Cluster Manager


Database alert.log shows the following message:

ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
clsssinit ret = 21
interconnect information is not available from OCR
  WARNING: No cluster interconnect has been specified. Depending on
           the communication driver configured Oracle cluster traffic
           may be directed to the public interface of this machine.
           Oracle recommends that RAC clustered databases be configured
           with a private interconnect for enhanced security and
           performance

Solution:
To start a pre-11gR2 database in an 11gR2 Grid Infrastructure environment, the node(s) must be pinned. To pin the node(s), execute as root:
$GRID_HOME/bin/crsctl pin css -n <racnode1> <racnode2> <racnode3>
To find out whether the node(s) are pinned:
$GRID_HOME/bin/olsnodes -t -n
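
The check above can be scripted. The following is a minimal sketch, assuming that "olsnodes -t -n" prints one line per node in the form "<node> <number> <Pinned|Unpinned>"; the function name unpinned_nodes is hypothetical and only illustrates the idea.

```shell
# Print the names of nodes reported as Unpinned.
# Reads "olsnodes -t -n" style output on stdin; the assumed line layout is
#   <node name> <node number> <Pinned|Unpinned>
unpinned_nodes() {
    awk '$NF == "Unpinned" { print $1 }'
}

# Typical use on a cluster node (GRID_HOME must be set):
#   "$GRID_HOME/bin/olsnodes" -t -n | unpinned_nodes
```

Any node name printed by this filter would still need to be pinned with "crsctl pin css" as root before a pre-11gR2 database is started on it.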

2. If datafiles are located in ASM, DBCA fails to create the database with the error:
"DBCA could not startup the ASM instance configured on this node. To proceed with database creation using ASM you need the ASM instance to be up and running. Do you want to recreate the ASM instance on this node?"

DBCA trace (10g in $RDBMS_HOME/cfgtoollogs/dbca and 11g in $ORACLE_BASE/cfgtoollogs/dbca) shows the following exception:

oracle.sysman.assistants.util.CommonUtils.getListenerProperties(CommonUtils.java:421)
oracle.sysman.assistants.util.asm.ASMAttributes.getConnection(ASMAttributes.java:150)
oracle.sysman.assistants.util.asm.ASMInstanceRAC.validateLocalASMConnection(ASMInstanceRAC.java:811)
oracle.sysman.assistants.util.asm.ASMInstanceRAC.validateASM(ASMInstanceRAC.java:595)
oracle.sysman.assistants.util.asm.ASMInstanceRAC.validateASM(ASMInstanceRAC.java:522)
oracle.sysman.assistants.util.asm.ASMInstanceRAC.validateASM(ASMInstanceRAC.java:515)
oracle.sysman.assistants.dbca.ui.StorageOptionsPage.validate(StorageOptionsPage.java:496)
oracle.sysman.assistants.util.wizard.WizardPageExt.wizardValidatePage(WizardPageExt.java:206)
....
java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:151)
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:145)
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:137)
java.awt.EventDispatchThread.run(EventDispatchThread.java:100)
[AWT-EventQueue-0] [10:4:5:781] [StorageOptionsPage.validate:611]  ASM present but not startable, querying user..

Solution:

Due to unpublished bug 8288940 (fixed in 10.2.0.5), DBCA fails if database files are located in ASM. Patch 8288940 is platform independent, is available for 10.2.0.3, 10.2.0.4, 11.1.0.6 and 11.1.0.7 as a .jar file, and needs to be applied to the database home. For other RDBMS versions for which no patch exists, refer to the workaround section of bug 8520511 in the Database Readme <Oracle Database Readme 11g Release 2 (11.2) -> Bug 8520511>.

3. SRVCTL fails to start instance if OCR is located in an ASM diskgroup or with different permission/ownership. 
The racgimon log (located in $RDBMS_HOME/log/$HOST/racg/imon_${DBNAME}.log) shows the following message:
2009-10-17 11:20:22.093: [  OCROSD][7866809875]utopen:6':failed in stat OCR file/disk +DATA, errno=2, os err string=No such file or directory
2009-10-17 11:20:22.093: [  OCROSD][7866809875]utopen:7:failed to open OCR file/disk +DATA , errno=2, os err string=No such file or directory
2009-10-17 11:20:22.093: [  OCRRAW][7866809875]proprinit: Could not open raw device
2009-10-17 11:20:22.093: [ default][7866809875]a_init:7!: Backend init unsuccessful : [26]
..
2009-10-17 11:20:22.094: [ CSSCLNT][7866809875]clsssinit: Unable to access OCR device in OCR init.PROC-26: Error while accessing the physical storage OperatingSystem error [No such file or directory] [2]
2009-10-17 11:20:22.094: [    RACG][7866809875] [23974][7866809875][ora.default]: racgimon exiting clsz init failed

Solution:

Due to unpublished bug 8262786, if OCR is located in ASM or has different permission/ownership, srvctl fails to start a database of an earlier version. The fix for unpublished bug 8262786 is included in 10.2.0.4 CRS PSU4, 10.2.0.5, 11.1.0.7 CRS PSU4, and Windows 10.2.0.4 Patch 36, and needs to be applied to the database home. The workaround is to use sqlplus instead of srvctl to start the database.

@ 8312004 closed as dup
4. Database fails to start after restart of GI.
$GRID_HOME/log/$HOST/agent/crsd/application_<dbuser>/application_<dbuser>.log shows:
2009-11-05 14:31:19.922: [    AGFW][1342593344] Agent received the message: RESOURCE_START[ora.db10.db102.inst 1 1] ID 4098:632
..
2009-11-05 14:31:19.924: [    AGFW][1275476288] Executing command: start for resource: ora.db10.db102.inst 1 1
2009-11-05 14:31:19.924: [ora.db10.db102.inst][1275476288] [start] START action called.
2009-11-05 14:31:19.924: [ora.db10.db102.inst][1275476288] [start] Executing action script: /home/app/oracle/product/10.2/db/bin/racgwrap[start]
..
2009-11-05 14:31:22.781: [ora.db10.db102.inst][1275476288] [start] Enter user-name: Connected to an idle instance.
2009-11-05 14:31:22.781: [ora.db10.db102.inst][1275476288] [start]
2009-11-05 14:31:22.782: [ora.db10.db102.inst][1275476288] [start] SQL> ORA-01565: error in identifying file '+DATA/db10/spfiledb10.ora'
2009-11-05 14:31:22.782: [ora.db10.db102.inst][1275476288] [start] ORA-17503: ksfdopn:2 Failed to open file +DATA/db10/spfiledb10.ora
2009-11-05 14:31:22.782: [ora.db10.db102.inst][1275476288] [start] ORA-15077: could not locate ASM instance serving a required diskgroup
2009-11-05 14:31:22.782: [ora.db10.db102.inst][1275476288] [start]
2009-11-05 14:31:22.782: [ora.db10.db102.inst][1275476288] [start] ORA-01078: failure in processing system parameters

After GI restart, status of diskgroup:

$GRID_HOME/bin/crsctl stat res -t
..
ora.DATA.dg
              OFFLINE OFFLINE racnode1
              OFFLINE OFFLINE racnode2
..

Solution:

Due to unpublished bug 8448079, the ASM init.ora parameter asm_diskgroups is nullified while GI is being stopped, so some diskgroups remain OFFLINE after GI is restarted, which causes pre-11.2 databases to fail to start. The fix for unpublished bug 8448079 is included in 11.2.0.2; patch 8448079 exists for certain platforms and needs to be applied to the GI home. The workaround is to add a dependency on the diskgroup to each instance resource:

$GRID_HOME/bin/crsctl modify res ora.db10.db102.inst -attr "REQUIRED_RESOURCES='ora.racnode2.ASM2.asm,ora.DATA.dg'"
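
When several instances need the same dependency, the command can be generated rather than typed. This is only a sketch: the helper name print_dep_cmd is hypothetical, and it prints the command for review instead of running it. The resource names passed in must be taken from the actual "crsctl stat res" output of the cluster.

```shell
# Print (do not run) the crsctl command that adds the ASM and diskgroup
# dependencies to one pre-11.2 instance resource.
# Args: <instance resource> <ASM resource> <diskgroup resource>
print_dep_cmd() {
    # $GRID_HOME is printed literally so the line can be pasted into a shell
    # where that variable is set.
    printf "%s modify res %s -attr \"REQUIRED_RESOURCES='%s,%s'\"\n" \
        '$GRID_HOME/bin/crsctl' "$1" "$2" "$3"
}

# Example:
#   print_dep_cmd ora.db10.db102.inst ora.racnode2.ASM2.asm ora.DATA.dg
```

Printing first and executing after review keeps a typo in a resource name from being written into OCR.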



5. SRVCTL fails to start service

For example:

$RDBMS_HOME/bin/srvctl start service -d b1 -s sb1

$GRID_HOME/bin/crsctl stat res

..
NAME=ora.b1.sb1.cs
TYPE=application
TARGET=ONLINE
STATE=ONLINE on eyrac1f

NAME=ora.b1.sb1.b11.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.b1.sb1.b12.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.eons
TYPE=ora.eons.type
TARGET=ONLINE           , ONLINE
STATE=ONLINE on eyrac1f, ONLINE on eyrac2f

Solution:

Due to unpublished bug 8373758, pre-11.2 srvctl fails to start a service if the service is followed by a resource that is new in 11gR2 in the "crsctl stat res" output. In the above example, ora.eons is a new resource in 11.2 and pre-11.2 srvctl cannot parse its status properly. The fix for unpublished bug 8373758 is included in 10.2.0.4 CRS PSU4, 10.2.0.5 and 11.1.0.7 CRS PSU2, and needs to be applied to the database home. The workaround is to:

    A. Create a dummy pre-11.2 resource entry that sorts alphabetically after the service entry. In the example above, creating a dummy resource ora.b2.db should work around the problem:
        $RDBMS_HOME/bin/srvctl add database -d b2 -o $RDBMS_HOME
    B. Try to start all services for the database: $RDBMS_HOME/bin/srvctl start service -d b1
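
To see whether a cluster is exposed to this ordering problem, the "crsctl stat res" output can be scanned for a service member resource that is immediately followed by a resource of a non-"application" type. This is a rough sketch under assumptions: treating every TYPE other than "application" as "new in 11gR2" is a heuristic inferred from the example above, and the function name services_before_new_resource is hypothetical.

```shell
# Report pre-11.2 service resources (*.srv, TYPE=application) that are
# immediately followed by a resource whose TYPE is not "application".
# Reads NAME=/TYPE= blocks as printed by "crsctl stat res" on stdin.
services_before_new_resource() {
    awk -F= '
        $1 == "NAME" { name = $2 }
        $1 == "TYPE" {
            if (prev_name ~ /\.srv$/ && prev_type == "application" && $2 != "application")
                print prev_name " is followed by " name
            prev_name = name
            prev_type = $2
        }
    '
}

# Typical use:
#   "$GRID_HOME/bin/crsctl" stat res | services_before_new_resource
```

For the example output shown in this section, the filter would flag ora.b1.sb1.b12.srv as being followed by ora.eons.
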
ORA-15025: could not open disk '/dev/rdsk/disk1'
ORA-27041: unable to open file
SVR4 Error: 13: Permission denied

And execution of setasmgidwrap fails with:

$GRID_HOME/bin/setasmgidwrap o=/home/oracle/10.2/bin/oracle
KFSG-00312: not an Oracle binary: '/home/oracle/10.2/bin/oracle'


Solution:


Due to bug 9575578, setasmgidwrap fails with a pre-11.2 oracle binary. The fix for bug 9575578 is included in 11.2.0.2; patch 9575578 exists for certain platforms and needs to be applied to the GI home.


7. After removal of the pre-11.2 CRS home, the following error is reported while trying to start or stop a database, or stop the cluster:
CRS-5809: Failed to execute 'ACTION_SCRIPT' value of '/ocw/crs10/bin/racgwrap' for 'ora.db10.db'. Error information 'cmd /ocw/crs10/bin/racgwrap not found'
CRS-2680: Clean of 'ora.db10.db' on 'node1' failed

If GI is being stopped, the following will be reported:

CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'node1' has failed
CRS-2675: Stop of 'ora.crsd' on 'node1' failed
CRS-4000: Command Stop failed, or completed with errors.


Solution:

Due to bug 9257105, even when the upgrade finishes successfully, the OCR configuration for a pre-11.2 database still points to the pre-11.2 CRS home. The fix for bug 9257105 is included in 11.2.0.1.2 and 11.2.0.2; unfortunately the fix itself has a regression, which is being worked on in unpublished bug 9678856. The workaround is to:

    A. As pre-11.2 database owner, execute the following command for each pre-11.2 database:
crsctl modify res ora.<dbname>.db -attr "ACTION_SCRIPT=$GRID_HOME/bin/racgwrap"

For example

crsctl modify res ora.db10.db -attr "ACTION_SCRIPT=/ocw/grid/bin/racgwrap"
    Or
    B. As pre-11.2 database owner, recreate the database resource in OCR by following note 1069369.1
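
Before choosing either workaround, it can help to list which resources still reference a script that no longer exists. The sketch below assumes the resource profiles are read from "crsctl stat res -p", which prints NAME= and ACTION_SCRIPT= lines per resource, and that it runs on the node where the paths should exist; the function name stale_action_scripts is hypothetical.

```shell
# Print "<resource> <path>" for every resource whose ACTION_SCRIPT points
# to a file that does not exist on this node.
# Reads "crsctl stat res -p" style KEY=VALUE lines on stdin.
stale_action_scripts() {
    name=
    while IFS='=' read -r key value; do
        case $key in
            NAME) name=$value ;;
            ACTION_SCRIPT)
                # Resources without an action script have an empty value.
                if [ -n "$value" ] && [ ! -e "$value" ]; then
                    printf '%s %s\n' "$name" "$value"
                fi
                ;;
        esac
    done
}

# Typical use:
#   "$GRID_HOME/bin/crsctl" stat res -p | stale_action_scripts
```

Each resource listed this way is a candidate for the "crsctl modify res ... ACTION_SCRIPT" workaround shown in step A.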


8. Singleton service does not failover or uniform service does not stop after local node VIP resource failed or stopped:

$DBHOME/bin/srvctl config service -d racstr
rac_u PREF: racstr1 racstr2 AVAIL:
rac_s PREF: racstr1 AVAIL: racstr2

$DBHOME/bin/srvctl status service -d racstr
Service rac_u is running on instance(s) racstr1, racstr2
Service rac_s is running on instance(s) racstr1

$GRID_HOME/bin/crsctl status res ora.strdt01.vip
NAME=ora.strdt01.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=ONLINE on strdt01

Disable the public network on the node where instance racstr1 is running; the VIP fails over to another node:

$GRID_HOME/bin/crsctl status res ora.strdt01.vip
NAME=ora.racha602.vip
TYPE=ora.cluster_vip_net1.type
TARGET=ONLINE
STATE=INTERMEDIATE on racha603  <== Vip failover to other node

$DBHOME/bin/srvctl status service -d racstr
Service rac_u is running on instance(s) racstr1, racstr2  <== Service still running on racstr1
Service rac_s is running on instance(s) racstr1           <== Service did not failover to racstr2

Solution:
Due to unpublished bug 9039498, a pre-11.2 database service will not fail over or stop if the local public network is down. The fix for bug 9039498 is included in 11.2.0.2 and 12.1, and applies to the GI home.



9. DBCA/srvctl fails to add instance/database with the following error:

PRKO-2010 : Error in adding instance to node: node1
PRKR-1008 : adding of instance dba21 on node node1 to cluster database dba2 failed.
CRS-2518: Invalid directory path '/home/oracle/product/11.1/db/bin/racgwrap'
CRS-0241:  Invalid directory path

Solution:

Due to bug 9767810, if the pre-11.2 database software is not installed on all nodes of the cluster, srvctl fails to add the instance/database to OCR. Bug 9767810 is fixed in 11.2.0.1.2 and the fix applies to the GI home. The workaround is to copy pre-11.2db/bin/racgwrap to all nodes in the cluster, and make sure it is accessible by the pre-11.2 database owner.

10. Pre-11.2.0.2 database fails if any private network fails.

This happens because a pre-11.2.0.2 instance is not aware of the Redundant Interconnect feature (multiple active cluster_interconnect entries in "oifcfg getif"). See note 1210883.1 for more about HAIP.

Solution:
As the HAIP feature cannot be disabled, for an environment running GI 11.2.0.2 with a pre-11.2.0.2 database/ASM, it is recommended to use an OS-level bonding solution for the private network, as in earlier clusterware versions.
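
Whether a cluster has more than one active private network registered can be checked by counting the cluster_interconnect entries. This is a small sketch, assuming "oifcfg getif" prints one interface per line with the interface type in the last column; the function name interconnect_count is hypothetical.

```shell
# Count interfaces registered as cluster_interconnect.
# Reads "oifcfg getif" style output on stdin, for example:
#   eth1  10.1.0.0  global  cluster_interconnect
interconnect_count() {
    awk '$NF == "cluster_interconnect" { n++ } END { print n + 0 }'
}

# Typical use:
#   "$GRID_HOME/bin/oifcfg" getif | interconnect_count
```

A count greater than one suggests that multiple private networks are defined, which is the configuration a pre-11.2.0.2 instance is not aware of.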

11. If OCR is located on ASM, a pre-10.2.0.5 database cannot get the cluster interconnect even though it is configured in OCR:
$ oifcfg getif
eth3  120.0.0.0  global  public
eth1  10.1.0.0  global  cluster_interconnect

Instance alert.log:

ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
clsssinit ret = 21
interconnect information is not available from OCR
 WARNING: No cluster interconnect has been specified. Depending on
          the communication driver configured Oracle cluster traffic
          may be directed to the public interface of this machine.
          Oracle recommends that RAC clustered databases be configured
          with a private interconnect for enhanced security and
          performance.


Solution:

For an 11gR2 GI + pre-10.2.0.5 database environment, placing OCR on ASM is not recommended. Another workaround is to set the init.ora parameter cluster_interconnects and ignore the warning.
@ 9865139 closed as duplicate of 5389506 which is only fixed in 10.2.0.5 and above


12. After a clusterware upgrade, ocrdump shows that key "SYSTEM.ORA_CRS_HOME" is not updated to the new clusterware home.
If the previous clusterware home is renamed or removed, pre-11gR2 SRVM clients (srvctl, DBCA, DBUA etc.) fail.
$ srvctl <command> database -d racdb

PRKA-2019 : Error executing command "/ocw/b201/bin/crs_stat". File is missing.

$ ocrdump -stdout -keyname SYSTEM.ORA_CRS_HOME

[SYSTEM.ORA_CRS_HOME]
ORATEXT : /ocw/b201

In this example, /ocw/b201 is the pre-upgrade clusterware home; the key was not updated to the new home.

Solution:
Due to bug 10231584, OCR key SYSTEM.ORA_CRS_HOME is not updated during upgrade. The workaround is to execute the following as root on any node:

# ${11.2.0.2GI_HOME}/bin/clscfg -upgrade -lastnode -g <asmadmin>

Note: <asmadmin> is typically the oinstall group, but it should be asmadmin in a job role separation environment.
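
The ocrdump check shown earlier in this item can be reduced to a one-line filter, both before and after running clscfg. This is a sketch only: it assumes the value is printed on a line beginning with "ORATEXT : " as in the example output, and the function name ocr_crs_home is hypothetical.

```shell
# Extract the clusterware home recorded in OCR.
# Reads "ocrdump -stdout -keyname SYSTEM.ORA_CRS_HOME" output on stdin.
ocr_crs_home() {
    sed -n 's/^ORATEXT : //p'
}

# Typical use, comparing against the current GI home:
#   recorded=$(ocrdump -stdout -keyname SYSTEM.ORA_CRS_HOME | ocr_crs_home)
#   [ "$recorded" = "$GRID_HOME" ] || echo "OCR still points to $recorded"
```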



13. Service does not stop/failover after stopping corresponding instance

$DB_HOME/log/$HOST/racg/imon_<DB_NAME>.log

2011-03-14 23:22:52.228: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: CLSR-0521: Event ora.ha.racdb.racdb1.inst.down is rejected by EVM daemon
2011-03-14 23:22:52.228: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: clsrcepevm: clsrcepostevt status = 17

2011-03-14 23:22:52.228: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: clsrcep:evm post return 1
2011-03-14 23:22:54.458: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: CLSR-0521: Event sys.ora.clu.crs.app.trigger is rejected by EVM daemon

2011-03-14 23:23:06.495: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: clsrcexecut: env _USR_ORA_PFILE=/ocw/grid/racg/tmp/ora.racdb.racdb1.inst.ora

2011-03-14 23:23:06.495: [    RACG][1108842816] [14693][1108842816][ora.SD302.SD3021.inst]: clsrcexecut: cmd = /database/db205/bin/racgeut -e _USR_ORA_DEBUG=0 -e ORACLE_SID=racdb1 540 /database/db205/bin/racgimon stop ora.racdb.racdb1.inst


$GRID_HOME/log/$HOST/evmd/evmd.log

2011-02-16 06:13:55.965: [  EVMAPP][4163668704] EVMD Started
..
2011-02-16 06:13:55.980: [    EVMD][4163668704] Could not open /ocw/grid/evm/admin/conf/evmdaemon.conf Reconfiguration aborted.

Solution:
Due to bug 12340700, the following files have the wrong permissions:

ls -l $GRID_HOME/evm/admin/conf
-rw------- 1 root root 3032 Feb 19 14:42 evm.auth
-rw------- 1 root root 2318 Feb 19 14:42 evmdaemon.conf
-rw------- 1 root root 4871 Feb 19 14:42 evmlogger.conf
Unpublished bug 12340700 is fixed in 11.2.0.3 and the fix applies to the GI home. The workaround is to fix the permissions manually with the "chmod" command and restart GI. The expected ownership and permissions are:

ls -l $GRID_HOME/evm/admin/conf
-rw-r--r-- 1 root root 3032 Feb 19 14:42 evm.auth
-rw-r--r-- 1 root root 2318 Feb 19 14:42 evmdaemon.conf
-rw-r--r-- 1 root root 4871 Feb 19 14:42 evmlogger.conf
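
The permission check can be scripted so that it is easy to repeat on every node. This is a minimal sketch, assuming the expected mode is exactly 644 as in the listing above; the function name evm_conf_bad_perms is hypothetical, and it only reports files without changing them.

```shell
# Print the EVM configuration files whose mode is not exactly 644.
# Arg: the directory to check, e.g. "$GRID_HOME/evm/admin/conf"
evm_conf_bad_perms() {
    for f in evm.auth evmdaemon.conf evmlogger.conf; do
        if [ -f "$1/$f" ]; then
            # "-perm 644" matches only an exact mode, so negating it lists
            # files with any other permission bits.
            find "$1/$f" ! -perm 644 -print
        fi
    done
}

# Typical use:
#   evm_conf_bad_perms "$GRID_HOME/evm/admin/conf"
```

Any file it lists would then be corrected as root with chmod 644, followed by a GI restart as described above.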



References

BUG:9257105 - CRSCTL STOP CRS REPORTS CRS-5809
BUG:9575578 - KFSG-00312: NOT AN ORACLE BINARY, USING SETASMGIDWRAP
NOTE:1064804.1 - Apply Grid Infrastructure/CRS Patch in Mixed environment
NOTE:1069369.1 - How to Delete or Add Resource to OCR
NOTE:1210883.1 - 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
NOTE:405820.1 - 10.2.0.X CRS Bundle Patch Information
NOTE:810663.1 - 11.1.0.X CRS Bundle Patch Information
