11.2.0.2 Grid Infrastructure upgrade/install on >1 node cluster failing with "gipchaLowerProcessNode: no valid interfaces found to node" in crsd.log (Doc ID 1280234.1)
Symptoms
During an 11.2.0.2 grid infrastructure upgrade or install on a >1 node cluster, rootcrs.pl fails and the following is found in crsd.log:
...
2010-11-29 10:52:38.603: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2614824036 ms, node 111ea99b0 { host 'racdb1', haName '1e0b-174e-37bc-a515', srcLuid 2612fa8e-3db4fcb7, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [55 : 55], createTime 2614768983, flags 0x4 }
2010-11-29 10:52:42.299: [ CRSMAIN][515] Policy Engine is not initialized yet!
2010-11-29 10:52:43.554: [ OCRMAS][3342]proath_connect_master:1: could not yet connect to master retval1 = 203, retval2 = 203
2010-11-29 10:52:43.554: [ OCRMAS][3342]th_master:110': Could not yet connect to new master [1]
2010-11-29 10:52:43.605: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2614829038 ms, node 111ea99b0 { host 'racdb1', haName '1e0b-174e-37bc-a515', srcLuid 2612fa8e-3db4fcb7, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [60 : 60], createTime 2614768983, flags 0x4 }
2010-11-29 10:52:43.754: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2010-11-29 10:52:43.955: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
...
2010-11-29 11:13:49.817: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2010-11-29 11:13:50.018: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
...
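In 11.2, crsd.log typically lives under the Grid home at $GRID_HOME/log/<hostname>/crsd/crsd.log. As a minimal sketch for confirming the symptom (assuming the log directory uses the short host name):
# grep "no valid interfaces found to node" $GRID_HOME/log/$(hostname -s)/crsd/crsd.log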
Changes
Upgrade or install of 11.2.0.2 grid infrastructure on a >1 node cluster.
Cause
Two causes have been found for this symptom: one is AIX-specific and the other is Unix-generic.
1) AIX-specific cause
udp_sendspace is at its default of 9216 bytes, which is smaller than the 10240 bytes that CRS uses. Check the current value with:
# no -o udp_sendspace
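To verify the value on every node of the cluster at once, a minimal sketch (node names aixnode1 and aixnode2 are placeholders; assumes passwordless ssh as root):
for node in aixnode1 aixnode2; do
  echo "$node: $(ssh "$node" /usr/sbin/no -o udp_sendspace)"
done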
2) UNIX-generic cause
Netmask mismatch between the nodes. The private interface must have the same netmask on all nodes; a mismatch between netmasks on different nodes can cause this symptom.
Solution
The two causes have separate solutions.
1) Solution for AIX-specific cause
Increase udp_sendspace to >= 10240:
# no -o udp_sendspace=65536
Note that the 11gR2 documentation recommends setting udp_sendspace to 65536:
| Network tuning parameter | Recommended value |
| --- | --- |
| ipqmaxlen | 512 |
| rfc1323 | 1 |
| sb_max | 4194304 |
| tcp_recvspace | 65536 |
| tcp_sendspace | 65536 |
| udp_recvspace | 655360 |
| udp_sendspace | 65536 |
See Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for IBM AIX on POWER Systems (64-Bit), section 2.11.7 "Configuring Network Tuning Parameters" (http://download.oracle.com/docs/cd/E11882_01/install.112/e17210/preaix.htm#CWAIX219) for more details.
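As a sketch of applying the full set of recommended values from the table above (on AIX, ipqmaxlen is a reboot-time parameter, so it is set with -r and takes effect after a restart, while -p makes the others persistent; verify the flags against the no documentation for your AIX level):
# run as root
/usr/sbin/no -r -o ipqmaxlen=512
/usr/sbin/no -p -o rfc1323=1
/usr/sbin/no -p -o sb_max=4194304
/usr/sbin/no -p -o tcp_recvspace=65536
/usr/sbin/no -p -o tcp_sendspace=65536
/usr/sbin/no -p -o udp_recvspace=655360
/usr/sbin/no -p -o udp_sendspace=65536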
If the problem happens during rootupgrade.sh (usually on the 2nd node), do the following:
1) Increase udp_sendspace to 65536:
# no -o udp_sendspace=65536
2) Stop CRS on both nodes (a wait-loop sketch follows after this list):
# crsctl stop crs -f
# ps -ef | grep d.bin     (to ensure no leftover CRS processes remain)
3) Restart CRS on node 1:
# crsctl start crs
4) On node 2, rerun rootupgrade.sh:
# rootupgrade.sh
It should complete on node 2 this time.
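To make the check in step 2) mechanical, a minimal wait-loop sketch (run as root on each node; the bracketed grep pattern keeps grep from matching its own process):
while ps -ef | grep '[d]\.bin' > /dev/null; do
  echo "CRS daemons still running, waiting..."
  sleep 5
done
echo "No CRS daemons left; safe to proceed."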
2) Solution for Unix-generic cause
Check that the netmask matches on the private interface on all nodes:
[grid@mynode1 ~]$ ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:19:B9:1E:6D:97
inet addr:192.168.1.110 Bcast:192.168.1.255 Mask:255.255.255.0
...
[grid@mynode2 ~]$ ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:19:B9:1E:6D:97
inet addr:192.168.1.111 Bcast:192.168.1.255 Mask:255.255.255.0
...
In case of a mismatch, the system administrator must correct the netmask on the private interface(s) where it is wrong.
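To compare the netmasks side by side, a minimal sketch (mynode1 and mynode2 are the example hosts above; assumes passwordless ssh and Linux-style ifconfig output with a Mask: field):
for node in mynode1 mynode2; do
  echo "$node: $(ssh "$node" /sbin/ifconfig eth1 | grep 'Mask:')"
done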