2012-03-06

CRS daemon depends on private interconnect

How to Modify Private Network Interface in 11.2 Grid Infrastructure [ID 1073502.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.

Goal

The purpose of this document is to demonstrate how to change the private network interface configuration stored in the OCR. This may be required if the name of the interface for the private network (cluster interconnect) needs to be changed at the OS level. For example, the private network is configured on a single network interface eth0 and you want to replace it with a bond interface bond0, with eth0 becoming part of bond0. The document also includes the commands for adding and deleting a private network interface.

Solution

As of 11.2 Grid Infrastructure, the CRS daemon (crsd.bin) now has a dependency on the private network configuration stored in the gpnp profile and OCR.  If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore care needs to be taken when making modifications to the configuration of the private network. It is important to perform the changes in the correct order.


Note: If only the private network IP address is going to change, while the subnet and network interface remain the same (for example, changing the private IP from 192.168.0.1 to 192.168.0.10), simply shut down the GI stack, make the IP modification for the private network at OS level (/etc/hosts, network configuration, etc.), then restart the GI stack to complete the task.

The following procedure applies when the subnet or the network interface name also needs to change.


Please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk
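
The backup step can be wrapped in a small guard so that an existing backup is never overwritten by a second run. This is only a sketch: backup_profile is a hypothetical helper name, not an Oracle tool, and it takes the peer profile directory as its argument.

```shell
# Hypothetical helper: copy profile.xml to profile.xml.bk inside the
# directory given as $1, preserving mode and timestamps (cp -p).
# Returns 1 if profile.xml is missing, 2 if a backup already exists.
backup_profile() {
    bp_src="$1/profile.xml"
    bp_dst="$1/profile.xml.bk"
    if [ ! -f "$bp_src" ]; then
        echo "missing: $bp_src" >&2
        return 1
    fi
    if [ -e "$bp_dst" ]; then
        echo "backup already exists: $bp_dst" >&2
        return 2
    fi
    cp -p "$bp_src" "$bp_dst"
}
```

It would be called on each node, as grid user, with the peer profile directory shown above as its argument.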


To modify the private network (cluster_interconnect):

1. Ensure CRS is running on ALL cluster nodes in the cluster

2. As grid user, add new interface:
 
Find the interface which needs to be removed. For example:
$ oifcfg getif

eth1 100.17.10.0 global public
eth0 192.168.0.0 global cluster_interconnect
Here the eth0 interface will be replaced by bond0 interface.

Add new interface bond0:
$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
This can be done with the -global option even if the interface is not available yet. Do not do it with the -node option while the interface is unavailable, because that will lead to node eviction.

If the interface is available on the server, the subnet address can be identified with the command:
$ oifcfg iflist

It lists the network interface and its subnet address. This command can be run even if CRS is not up and running. Please note, subnet address might not be in the format of x.y.z.0. For example, it can be:
$ oifcfg iflist
lan1 18.1.2.0
lan2 10.2.3.64        << this is the private network subnet address associated with private network IP: 10.2.3.86
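
A subnet address such as 10.2.3.64 is what you get when each octet of the IP address is ANDed with the netmask. The sketch below shows that arithmetic. The netmask is not given in the example above, so 255.255.255.192 is an assumption used purely for illustration, and subnet_of is a hypothetical helper name.

```shell
# Hypothetical helper: print the subnet (network) address for
# $1 = IPv4 address and $2 = dotted netmask by ANDing octet by octet.
subnet_of() {
    so_old_ifs=$IFS
    IFS=.
    # Split both arguments on dots: $1..$4 = address, $5..$8 = netmask.
    set -- $1 $2
    IFS=$so_old_ifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

# Example call (the netmask is an assumed value):
#   subnet_of 10.2.3.86 255.255.255.192
```

On a live system, oifcfg iflist remains the authoritative source for the value to pass to oifcfg setif.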

If the scenario is just to add a 2nd private network, for example a new interface eth3 with subnet address 192.168.1.96, then issue:
$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect

Verify the change:
$ oifcfg getif
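
The verification can also be scripted rather than read by eye. The sketch below assumes the oifcfg getif output format shown earlier (interface, subnet, scope, type, separated by whitespace). has_interconnect is a hypothetical helper name.

```shell
# Hypothetical helper: read `oifcfg getif` output on stdin and succeed
# only if interface $1 with subnet $2 is listed as cluster_interconnect.
has_interconnect() {
    awk -v ifname="$1" -v subnet="$2" '
        $1 == ifname && $2 == subnet && $NF == "cluster_interconnect" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Intended use on a cluster node, as grid user:
#   oifcfg getif | has_interconnect bond0 192.168.0.0 && echo "bond0 registered"
```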


3. Shut down CRS on all nodes and disable CRS, as root user:
# crsctl stop crs
# crsctl disable crs

4. Make the network configuration change at OS level as required, and ensure the new interface is available on all nodes after the change.
$ ifconfig -a
$ ping <private hostname>

5. Enable CRS and restart CRS on all nodes as root user:
# crsctl enable crs
# crsctl start crs

6. Remove the old interface:
$ oifcfg delif -global eth0

Note #1. This step is not required when only adding a 2nd interface.
     #2. If the new interface was added without removing the old interface (e.g. the old interface was still available when CRS restarted), then after step 6 CRS needs to be stopped and started again to ensure the old interface is no longer in use.
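
Because the order of operations matters, it can help to print the full command sequence for review before running anything. The sketch below only prints the commands from steps 2 to 6 and executes nothing. print_interconnect_plan is a hypothetical helper name.

```shell
# Hypothetical helper: print (do NOT execute) the commands of steps 2-6
# for replacing interface $1 with interface $2 on subnet $3.
print_interconnect_plan() {
    pp_old=$1
    pp_new=$2
    pp_subnet=$3
    cat <<EOF
# step 2: as grid user, with CRS running on all nodes
oifcfg setif -global $pp_new/$pp_subnet:cluster_interconnect
oifcfg getif
# step 3: as root, on all nodes
crsctl stop crs
crsctl disable crs
# step 4: change the OS network configuration, then check every node
ifconfig -a
# step 5: as root, on all nodes
crsctl enable crs
crsctl start crs
# step 6: as grid user (skip when only adding a 2nd interface)
oifcfg delif -global $pp_old
EOF
}

# Example: print_interconnect_plan eth0 bond0 192.168.0.0
```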

Things to note:

1. If the underlying network configuration has been changed but oifcfg has not been run to make the same change, then upon CRS restart the CRSD will not be able to start.

The crsd.log will show:
2010-01-30 09:22:47.234: [ default][2926461424] CRS Daemon Starting
..
2010-01-30 09:22:47.273: [ GPnP][2926461424]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=7153, tl=3, f=0
2010-01-30 09:22:47.282: [ OCRAPI][2926461424]clsu_get_private_ip_addresses: no ip addresses found.
2010-01-30 09:22:47.282: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2010-01-30 09:22:47.283: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][2926461424]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2010-01-30 09:22:47.285: [ OCRAPI][2926461424]a_init:13!: Clusterware init unsuccessful : [44]
2010-01-30 09:22:47.285: [ CRSOCR][2926461424] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-01-30 09:22:47.285: [ CRSD][2926461424][PANIC] CRSD exiting: Could not init OCR, code: 44
2010-01-30 09:22:47.285: [ CRSD][2926461424] Done.
The above errors indicate a mismatch between the OS setting (oifcfg iflist) and the gpnp profile setting (profile.xml).

Workaround: restore the OS network configuration to its original state and start CRS. Then follow the steps above to make the changes again.
Please consult Oracle Support Services if CRS still cannot start after the OS network configuration has been restored.
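
To check a crsd.log quickly for this failure, the distinctive messages from the excerpt above can be searched for. The sketch below reads log text on standard input. interconnect_mismatch_lines is a hypothetical helper name, and the patterns are taken only from the sample log shown here.

```shell
# Hypothetical helper: read crsd.log text on stdin and print the lines
# that match the private network mismatch shown in the excerpt above.
# Exit status is 0 if at least one line matched, 1 otherwise.
interconnect_mismatch_lines() {
    grep -E 'clsu_get_private_ip_addresses: no ip addresses found|PROC-44:|Could not init OCR, code: 44'
}

# Intended use (replace the placeholder with the node's crsd.log path):
#   interconnect_mismatch_lines < <path to crsd.log>
```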


2. If any node in the cluster is down, the oifcfg command will fail with the error:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: start CRS on the node where it is not running. Ensure CRS is up on all cluster nodes.

3. If a user other than the Grid Infrastructure owner issues the above command, it will fail with the same error:
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: log in as the Grid Infrastructure owner to run the command.

4. From 11.2.0.2 onwards, an attempt to delete the last private interface (cluster_interconnect) without adding a new one first fails with the following error:

PRIF-31: Failed to delete the specified network interface because it is the last private interface

Workaround: Add new private interface first before deleting the old private interface.

5. If CRS is down on the node, the following error is expected:
$ oifcfg getif
PRIF-10: failed to initialize the cluster registry

Workaround: Start the CRS on the node
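
The PRIF errors listed above can be mapped to their workarounds in a small lookup, for example when wrapping oifcfg in a script. This is a sketch only: prif_hint is a hypothetical helper name, and the hints simply restate the workarounds from this note.

```shell
# Hypothetical helper: given an oifcfg error message in $1, print the
# matching workaround from this note. Returns 1 for unknown messages.
prif_hint() {
    case "$1" in
        *PRIF-26*) echo "Start CRS on all cluster nodes and run the command as the Grid Infrastructure owner" ;;
        *PRIF-31*) echo "Add the new private interface before deleting the old one" ;;
        *PRIF-10*) echo "Start CRS on this node" ;;
        *) echo "No hint for this message"; return 1 ;;
    esac
}

# Example: prif_hint "PRIF-10: failed to initialize the cluster registry"
```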
