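VIP fails to start
Description: our environment is a two-node RAC, and node 1 went down because of a hardware failure.
I wanted to start node 1's VIP on node 2 so that running on a single node would be transparent to the user applications.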
[Oracle@UNID02 ~]$ crs_start ora.unid01.vip
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.unid01.vip'.
[oracle@UNID02 ~]$
However, the start fails with CRS-1006: No more members to consider.
The VIP log (under $CRS_HOME/log//racg) shows a network-interface error:
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: checkIf: interface eth0 is down
Invalid parameters, or failed to bring up VIP (host=UNID02) ==============================>
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start unid01
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check unid01
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: end for resource = ora.unid01.vip, action = start, status = 1, time = 6.280s
2013-12-10 01:17:41.966: [ COMMCRS][1472985408]clsc_receive: (0x2aaaac1428c0) error 2
2013-12-10 09:50:23.702: [ CRSRES][1538058560] startRunnable: setting CLI values
2013-12-10 09:50:23.705: [ CRSRES][1538058560] Attempting to start `ora.unid01.vip` on member `UNID02`
2013-12-10 09:50:30.012: [ CRSAPP][1538058560] StartResource error for ora.unid01.vip error code = 1
2013-12-10 09:50:33.198: [ CRSRES][1538058560] Start of `ora.unid01.vip` on member `UNID02` failed.
2013-12-10 09:50:33.204: [ CRSRES][1538058560] CRS-1006: No more members to consider
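The decisive line in the racg log is the "checkIf: interface ... is down" message. As a minimal sketch, the interface names can be pulled out of a log read from standard input. The function name down_interfaces is made up for illustration, and the message format is assumed to match the log lines shown above.

```shell
# Print each interface that racgvip's checkIf reported as down.
# Reads log text on stdin; assumes the message format seen in the log above.
down_interfaces() {
    sed -n 's/.*checkIf: interface \([^ ]*\) is down.*/\1/p' | sort -u
}
```

Feed it the VIP log file, for example: down_interfaces < /path/to/vip.log (the path is a placeholder).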
srvctl shows that UNID02-vip is bound to interface eth2, while unid01-vip is bound to eth0.
[oracle@UNID02 ~]$ srvctl config nodeapps -n UNID02 -a -g -s -l
VIP exists.: /UNID02-vip/10.0.15.176/255.255.255.0/eth2
GSD exists.
ONS daemon exists.
Listener exists.
[oracle@UNID02 ~]$ srvctl config nodeapps -n unid01 -a -g -s -l
VIP exists.: /unid01-vip/10.0.15.175/255.255.255.0/eth0
GSD exists.
ONS daemon exists.
Listener exists.
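To compare the two nodes quickly, the interface field can be cut out of the "VIP exists.:" line. This is only a sketch: vip_interface is a hypothetical helper name, and it assumes the /name/ip/netmask/interface layout printed above.

```shell
# Print the interface field of a "VIP exists.: /name/ip/netmask/interface" line.
# Reads srvctl output on stdin; lines that do not match are ignored.
vip_interface() {
    sed -n 's|^VIP exists\.: */[^/]*/[^/]*/[^/]*/\([^/ ]*\).*|\1|p'
}
```

Usage: srvctl config nodeapps -n unid01 -a | vip_interface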
ifconfig shows that eth0 is not up:
[oracle@UNID02 ~]$
[root@UNID02 bin]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:53 errors:0 dropped:0 overruns:0 frame:0
TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8246 (8.0 KiB) TX bytes:6848 (6.6 KiB)
Interrupt:122 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.172 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1493987596 (1.3 GiB) TX bytes:1004608379 (958.0 MiB)
Interrupt:130 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.176 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:417953992 (398.5 MiB) TX bytes:417953992 (398.5 MiB)
[root@UNID02 bin]#
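Without arguments, ifconfig lists only active interfaces, so an interface missing from the listing is not up. A small sketch of that check follows. The helper name iface_listed is an assumption, and it expects the older ifconfig layout shown above, where each stanza starts with the bare interface name.

```shell
# Succeed if the named interface has a stanza in ifconfig-style output on stdin.
# $1 = interface name, e.g. eth0 or eth2:1
iface_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

Usage: ifconfig | iface_listed eth0 || echo "eth0 is not active"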
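I asked the system engineer, who explained that this machine originally used eth0 for its public IP; eth0 later failed and the public network was moved to eth2. That explains the error.
There are two ways to fix this:
1. Change unid01-vip to use eth2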
[root@UNID02 ~]$ srvctl modify nodeapps -n unid01 -A 10.0.15.175/255.255.255.0/eth2
Start it again; this time it succeeds.
[oracle@UNID02 ~]$ crs_start ora.unid01.vip
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` succeeded.
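The -A value used in method 1 can be derived from the existing configuration rather than typed by hand. The following is a sketch only: vip_address_arg is a hypothetical helper, and it assumes the "VIP exists.:" line format shown earlier and a single replacement interface name.

```shell
# Build the ip/netmask/interface string for "srvctl modify nodeapps -A"
# from a "VIP exists.: /name/ip/netmask/interface" line on stdin,
# replacing the interface with $1.
vip_address_arg() {
    awk -F/ -v ifname="$1" '/^VIP exists/ { print $3 "/" $4 "/" ifname }'
}
```

Usage: srvctl config nodeapps -n unid01 -a | vip_address_arg eth2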
2. Because crs_start calls the racgvip script to start the VIP, set the environment variables by hand and run sh racgvip start ora.unid01.vip directly
[root@UNID02 ~]# export _USR_ORA_VIP=10.0.15.175
[root@UNID02 ~]# export _USR_ORA_NETMASK=255.255.255.0
[root@UNID02 ~]# export _USR_ORA_IF=eth2
[root@UNID02 ~]# export _CAA_NAME=ora.unid01.vip
[root@UNID02 bin]# sh -x racgvip start ora.unid01.vip
+ IFCONFIG=/sbin/ifconfig
+ GREP=/bin/grep
+ SED=/bin/sed
+ RM=/bin/rm
+ MV=/bin/mv
+ UNIQ=/usr/bin/uniq
+ PING=/bin/ping
+ WC=/usr/bin/wc
+ NETSTAT=/bin/netstat
+ AWK=/bin/awk
+ WHOAMI=/usr/bin/whoami
+ CAT=/bin/cat
+ UNAME=/bin/uname
+ SLEEP=/bin/sleep
+ SORT=/bin/sort
+ EXPR=/usr/bin/expr
+ DATE=/bin/date
+ RENICE=/usr/bin/renice
+ MIITOOL=/sbin/mii-tool
+ ARPING=/sbin/arping
+ IPCMD='/sbin/ip -f inet'
+ LANG=C
+ LC_ALL=C
+ export LANG LC_ALL
+ FAIL_WHEN_ALL_LINK_DOWN=1
+ FAIL_WHEN_DEFAULTGW_NOT_FOUND=1
+ DEFAULTGW=
+ /usr/bin/renice -20 -p 15145
++ /bin/hostname
+ HOSTNAME=UNID02
+ PING_TIMEOUT='-w 3 -c 1'
+ PING_COUNT=10
+ LOCKED=0
+ CRS_STAT=/bin/crs_stat
+ CHECK_TIMES=2
+ SUCCESS=0
+ ERROR=1
+ DEFAULT_TIMEOUT=60
+ IP=10.0.15.175
+ MASK=255.255.255.0
+ IF=eth2
+ OP=start
++ /usr/bin/whoami
+ USER=root
++ uname
+ [[ Linux != Linux ]]
+ listif_result=
+ '[' root '!=' root -a start '!=' list ']'
+ '[' -n 10.0.15.175 -a -n 255.255.255.0 ']'
++ IFS=.
++ set 10 0 15 175 255 255 255 0
++ echo 10.0.15.255
+ BROADCAST=10.0.15.255
+ logx 'Broadcast = 10.0.15.255'
+ '[' -n '' ']'
+ '[' start = list ']'
++ echo ora.unid01.vip
++ /bin/sed '-es/^ora.//;s/.vip$//'
+ VIP_NAME=unid01
+ NAME=ora.unid01.vip
+ '[' -z ora.unid01.vip ']'
+ IF_USING=
+ '[' -n 10.0.15.175 ']'
+ logx Checking interface existance
+ '[' -n '' ']'
+ logx 'Calling getifbyip'
+ '[' -n '' ']'
++ getifbyip 10.0.15.175
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx 'getifbyip: started for 10.0.15.175'
++ '[' -n '' ']'
+++ /sbin/ip -f inet -o addr
+++ /bin/grep 'inet 10.0.15.175/'
+++ /bin/awk '{ print $NF }'
++ gf_retif=
++ logx 'getifbyip: returning IP '
++ '[' -n '' ']'
++ '[' -z '' ']'
+ LI=
+ logx Completed getifbyip
+ '[' -n '' ']'
+ logx 'Calling getifbyip -a'
+ '[' -n '' ']'
++ getifbyip 10.0.15.175 -a
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx 'getifbyip: started for 10.0.15.175'
++ '[' -n '' ']'
+++ /sbin/ip -f inet -o addr
+++ /bin/grep 'inet 10.0.15.175/'
+++ /bin/awk '{ print $NF }'
++ gf_retif=
++ logx 'getifbyip: returning IP '
++ '[' -n '' ']'
++ '[' -z -a ']'
++ '[' -n '' ']'
+ LI_A=
+ logx Completed getifbyip
+ '[' -n '' ']'
+ '[' '' '!=' '' ']'
+ echo ''
+ /bin/grep -q :
+ '[' 1 -ne 0 ']'
+ '[' start = stop ']'
+ ping_vip 10.0.15.175
+ logx 'ping_vip 10.0.15.175 started'
+ '[' -n '' ']'
+ '[' -n 10.0.15.175 ']'
+ _count=1
+ '[' 1 -le 10 ']'
+ /bin/ping 10.0.15.175 -w 3 -c 1
+ '[' 1 -ne 0 ']'
+ logx 'ping_vip: 10.0.15.175 is not pingable, _count = 1'
+ '[' -n '' ']'
+ return 1
+ '[' 1 -eq 0 ']'
+ logx 'Completed with initial interface test'
+ '[' -n '' ']'
+ case $OP in
+ '[' start = check ']'
+ '[' start = check ']'
+ '[' -n 10.0.15.175 -a -n 255.255.255.0 -a -n eth2 ']'
+ '[' -n '' ']'
+ logx 'Interface tests'
+ '[' -n '' ']'
++ echo eth2
++ /bin/sed '-es/|/ /g'
+ IF=eth2
+ for I in '$IF'
+ '[' eth2 = '' ']'
+ checkIf eth2
+ _IF=eth2
+ _RET=0
+ _LINK_STAT=
+ logx 'checkIf: start for if=eth2'
+ '[' -n '' ']'
+ '[' -z eth2 ']'
+ /sbin/ifconfig eth2
+ /bin/grep -q -w UP
+ '[' 0 -ne 0 ']'
+ '[' -x /sbin/mii-tool ']'
++ /sbin/mii-tool eth2
+ _LINK_STAT='eth2: negotiated 100baseTx-FD flow-control, link ok'
+ '[' 0 -eq 0 ']'
+ echo 'eth2: negotiated 100baseTx-FD flow-control, link ok'
+ /bin/grep -q 'link ok'
+ '[' 0 -eq 0 ']'
+ logx 'checkIf: mii-tool checked if=eth2 ok'
+ '[' -n '' ']'
+ _RET=0
+ '[' -z 'eth2: negotiated 100baseTx-FD flow-control, link ok' ']'
+ '[' 0 -eq 1 ']'
+ logx 'checkIf: end for if=eth2'
+ '[' -n '' ']'
+ return 0
+ '[' 0 -eq 0 ']'
+ getnextli eth2
+ _LOCAL_IF=eth2
+ nextli=
+ _LIN=
+ logx 'getnextli: started for if=eth2'
+ '[' -n '' ']'
++ listif
++ logx 'listif: starting'
++ '[' -n '' ']'
++ '[' -z '' ']'
+++ /sbin/ip -f inet -o addr
++ /bin/grep eth2:
++ /bin/sed '-es/^.*://'
+++ /bin/awk '{ print $NF }'
++ /bin/sort -n
+++ /bin/grep -vw lo
++ listif_result='eth1
eth2
eth2:1'
++ logx 'listif: completed with eth1
eth2
eth2:1'
++ '[' -n '' ']'
++ echo 'eth1
eth2
eth2:1'
+ _LIN=1
+ i=1
+ '[' 1 -le 256 ']'
+ _found=0
+ for j in '${_LIN}'
+ '[' 1 -eq 0 ']'
+ '[' 1 -eq 1 ']'
+ _found=1
+ break
+ '[' 1 -eq 0 ']'
+ i=2
+ '[' 2 -le 256 ']'
+ _found=0
+ for j in '${_LIN}'
+ '[' 1 -eq 0 ']'
+ '[' 2 -eq 1 ']'
+ '[' 0 -eq 0 ']'
+ get_lock eth2_2
+ TOUCH=/bin/touch
+ LS=/bin/ls
+ KILL=/bin/kill
+ LOCK=/var/tmp/vip_eth2_2_UNID02.lock
+ /bin/touch /var/tmp/vip_eth2_2_UNID02.lock.15145
+ '[' 0 -ne 0 ']'
++ /bin/ls /var/tmp/vip_eth2_2_UNID02.lock.15145
++ /usr/bin/wc -l
+ '[' 1 -eq 1 ']'
+ logx 'get_lock: lock file /var/tmp/vip_eth2_2_UNID02.lock.15145 is created'
+ '[' -n '' ']'
+ LOCKED=1
+ return 0
+ '[' 0 -eq 0 ']'
+ listif_result=
+ listif
+ logx 'listif: starting'
+ '[' -n '' ']'
+ '[' -z '' ']'
++ /sbin/ip -f inet -o addr
+ /bin/grep -w eth2:2
++ /bin/awk '{ print $NF }'
++ /bin/grep -vw lo
+ listif_result='eth1
eth2
eth2:1'
+ logx 'listif: completed with eth1
eth2
eth2:1'
+ '[' -n '' ']'
+ echo 'eth1
eth2
eth2:1'
+ '[' 1 -ne 0 ']'
+ break
+ '[' 2 -eq 256 ']'
+ nextli=eth2:2
+ logx 'getnextli: completed with nextli=eth2:2'
+ '[' -n '' ']'
+ return 2
+ LI=eth2:2
+ /sbin/ifconfig eth2:2 10.0.15.175 netmask 255.255.255.0 broadcast 10.0.15.255 up
+ '[' 0 -ne 0 ']'
+ logx 'Success exit 1'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ /sbin/arping -q -U -c 3 -I eth2 10.0.15.175
+ release_lock
+ '[' 1 = 1 ']'
+ /bin/rm -f /var/tmp/vip_eth2_2_UNID02.lock.15145
+ logx 'release_lock: remove lock file /var/tmp/vip_eth2_2_UNID02.lock.15145'
+ '[' -n '' ']'
+ LOCKED=0
+ exit 0 -- return value is 0: the start succeeded
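Early in the trace above, the script derives the broadcast address 10.0.15.255 from the VIP address and netmask by splitting both on dots. The same calculation can be sketched as follows. This is not the code from racgvip, and the function name broadcast_addr is made up for illustration.

```shell
# Print the broadcast address for an IPv4 address ($1) and netmask ($2).
# The body runs in a subshell so IFS and the positional parameters
# do not leak into the caller.
broadcast_addr() (
    IFS=.
    # Unquoted on purpose: split both dotted quads into eight fields.
    set -- $1 $2
    # Each broadcast octet is the address octet with all host bits set.
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)
```

Usage: broadcast_addr 10.0.15.175 255.255.255.0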
[root@UNID02 bin]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:53 errors:0 dropped:0 overruns:0 frame:0
TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8246 (8.0 KiB) TX bytes:6848 (6.6 KiB)
Interrupt:122 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.172 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1493987596 (1.3 GiB) TX bytes:1004608379 (958.0 MiB)
Interrupt:130 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.176 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
eth2:2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.175 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:417953992 (398.5 MiB) TX bytes:417953992 (398.5 MiB)
[root@UNID02 bin]#
ifconfig confirms that the VIP is now up.
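In the trace, getnextli appears to pick the lowest logical interface number on eth2 that is not yet in use: eth2:1 already carries the UNID02 VIP, so the new address lands on eth2:2. A minimal sketch of that selection follows. It is not the racgvip code, the name next_logical_if is an assumption, and the upper bound of 256 is taken from the trace.

```shell
# Print the first free logical interface name (base:N, N = 1..256).
# $1 = base interface, e.g. eth2
# stdin = interface names currently in use, one per line
# Note: plain sh has no local variables, so base, used and i are global.
next_logical_if() {
    base=$1
    used=$(cat)
    i=1
    while [ "$i" -le 256 ]; do
        # -x matches whole lines, -F treats the name as a fixed string
        if ! printf '%s\n' "$used" | grep -qxF "$base:$i"; then
            echo "$base:$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Usage: printf 'eth1\neth2\neth2:1\n' | next_logical_if eth2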
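If the VIP still does not start, collect the output of these commands and refer to the racgvip script, /u01/app/11.2.0/grid/bin/racgvip. The comments in the script describe how to test it by hand, including the multi-interface case: if the node has more than one interface, unplug the cable from the interface that holds the VIP and run the check action; the VIP should move to another interface.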
# 1. becomes root user
# 2. set environment variables
# - _USR_ORA_VIP for VIP address
# - _USR_ORA_NETMASK for netmask address
# - _USR_ORA_IF for interface names, they are separated by '|' character
# - _CAA_NAME for the VIP resource name, ora..vip
# 3. Test list command
# # sh racgvip list
# 4. Test start command
# # sh racgvip start
# # echo $?
# # ifconfig (to check if the VIP is set)
# 5. Test check command
# # sh racgvip check
# # echo $?
# 6. Test stop command
# # sh racgvip stop
# # echo $?
# # ifconfig (to check if the VIP is unset)
# 7. If there is more than one interfaces, remove the cable on the interface
# which VIP is set and run check action, the VIP should be set to another
# interface.
# Note: if cables are pulled from all interfaces or there is only one
# interface, VIP will stay on the original interface and
# the script returns success. This behavior is to keep VIP resource
# from failover if there is a network brown out.
#
# # sh racgvip check
# # echo $?
# # ifconfig (to check if the VIP is set to another interface)
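Steps 3 to 6 of the procedure above can be run in one pass. The wrapper below is a rough sketch: run_vip_test is a hypothetical name, the path to racgvip is passed in as an argument, and the four environment variables from step 2 are assumed to be exported already. As in the procedure, it must be run as root, and the start and stop actions change the live network configuration.

```shell
# Run the list, start, check and stop actions in order and report each
# exit status. $1 = path to the racgvip script.
run_vip_test() {
    racgvip=$1
    for action in list start check stop; do
        sh "$racgvip" "$action"
        echo "$action rc=$?"
    done
}
```

Usage: run_vip_test /path/to/racgvip (the path is a placeholder). Run ifconfig after the start and stop actions to confirm that the VIP was set and then removed.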
The VIP is brought up by /u01/app/11.2.0/grid/bin/racgvip. The script first checks the status of the interface; if the interface is down, the VIP cannot be brought up.
The relevant check in /u01/app/11.2.0/grid/bin/racgvip:
if [ -z "$_IF" ]
then
    echo "checkIf: interface name is NULL"
    return 1
fi
# check if the interface is up
$IFCONFIG $_IF | $GREP -q -w UP
if [ $? -ne 0 ]
then
    echo "checkIf: interface $_IF is down"
    return 1
fi