CRS would not start, and running crsctl start crs produced no response.
Check messages:
[root@UNID02 ~]# tail -f /var/log/messages
Dec 9 08:11:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7259.
Dec 9 08:12:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7468.
Dec 9 08:12:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7356.
Dec 9 08:12:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7259.
Dec 9 08:13:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7468.
Dec 9 08:13:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7356.
Dec 9 08:13:14 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7259.
Dec 9 08:14:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7468.
Dec 9 08:14:13 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7356.
Dec 9 08:14:14 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7259.
The reason CRS cannot start is: Cluster Ready Services waiting on dependencies
Apparently a component it depends on has not come up.
Look at the more detailed information:
[root@UNID02 ~]# less /tmp/crsctl.7259
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
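The messages log points at several diagnostics files (crsctl.7259, crsctl.7356, crsctl.7468), and only one of them is opened here. A minimal sketch for collecting every distinct path from the log so that each can be inspected; the function name list_crs_diag_files is made up for illustration and is not part of any Oracle tool:

```shell
#!/usr/bin/env bash
# Print each distinct diagnostics file path mentioned in syslog text.
# stdin:  log lines such as the ones shown above
# stdout: one path per line, sorted, duplicates removed
list_crs_diag_files() {
    grep -o 'Diagnostics in /tmp/crsctl\.[0-9]*' \
        | awk '{ print $3 }' \
        | sort -u
}

# Possible usage (left commented out so that sourcing this file has no side effects):
# list_crs_diag_files < /var/log/messages | while read -r f; do
#     echo "== $f"; cat "$f"
# done
```

Reading all of the files rather than one helps confirm that every waiting CRS process is blocked on the same PROC-26 error.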
[root@UNID02 ~]# service rawdevices status
/dev/raw/raw5: bound to major 120, minor 97
/dev/raw/raw6: bound to major 120, minor 113
/dev/raw/raw7: bound to major 120, minor 129
Only three raw devices were bound, so I brought up the rest.
[root@UNID02 ~]# service rawdevices start
Assigning devices:
/dev/raw/raw1 --> /dev/emcpowera1
/dev/raw/raw1: bound to major 120, minor 1
/dev/raw/raw2 --> /dev/emcpowerb1
/dev/raw/raw2: bound to major 120, minor 17
/dev/raw/raw3 --> /dev/emcpowerc1
/dev/raw/raw3: bound to major 120, minor 33
/dev/raw/raw4 --> /dev/emcpowerd1
/dev/raw/raw4: bound to major 120, minor 49
/dev/raw/raw5 --> /dev/emcpowerg1
/dev/raw/raw5: bound to major 120, minor 97
/dev/raw/raw6 --> /dev/emcpowerh1
/dev/raw/raw6: bound to major 120, minor 113
/dev/raw/raw7 --> /dev/emcpoweri1
/dev/raw/raw7: bound to major 120, minor 129
done
[root@UNID02 ~]#
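Spotting the missing bindings by eye works for seven devices. As a rough sketch, the same comparison can be scripted by feeding the status output to a small function; missing_raw_devices is a hypothetical helper name, and the list of expected device numbers is something the administrator has to supply:

```shell
#!/usr/bin/env bash
# Report which expected raw devices are absent from binding-status text.
# stdin: lines such as "/dev/raw/raw5: bound to major 120, minor 97"
# args:  the raw device numbers that should be bound (e.g. 1 2 3 4 5 6 7)
missing_raw_devices() {
    local status n
    status=$(cat)
    for n in "$@"; do
        # The trailing colon keeps raw1 from matching raw10 and so on.
        if ! printf '%s\n' "$status" | grep -q "^/dev/raw/raw${n}:"; then
            echo "/dev/raw/raw${n}"
        fi
    done
}

# Possible usage on the host:
# service rawdevices status | missing_raw_devices 1 2 3 4 5 6 7
```

With the status output from before the restart of the service, this would list raw1 through raw4 as unbound.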
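Why were these raw devices not bound? The storage array had failed and restarted; after the restart the disks were attached to the host again, but they were not recognized as raw devices.
So I started the rawdevices service once more.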
[root@UNID02 ~]# service rawdevices start
Assigning devices:
/dev/raw/raw1 --> /dev/emcpowera1
/dev/raw/raw1: bound to major 120, minor 1
/dev/raw/raw2 --> /dev/emcpowerb1
/dev/raw/raw2: bound to major 120, minor 17
/dev/raw/raw3 --> /dev/emcpowerc1
/dev/raw/raw3: bound to major 120, minor 33
/dev/raw/raw4 --> /dev/emcpowerd1
/dev/raw/raw4: bound to major 120, minor 49
/dev/raw/raw5 --> /dev/emcpowerg1
/dev/raw/raw5: bound to major 120, minor 97
/dev/raw/raw6 --> /dev/emcpowerh1
/dev/raw/raw6: bound to major 120, minor 113
/dev/raw/raw7 --> /dev/emcpoweri1
/dev/raw/raw7: bound to major 120, minor 129
done
[root@UNID02 ~]#
Continuing to watch messages, the earlier error was still being logged:
Dec 9 08:15:14 UNID02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.7259.
CRS evidently still hit a problem during startup, so I looked at the crsctl.* file again:
[root@UNID02 ~]# cat /tmp/crsctl.7259
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
[root@UNID02 ~]#
It is the same PROC-26 error, but this time [No such file or directory] has changed to [Permission denied].
The permissions on the raw devices must be wrong. Check them:
[root@UNID02 ~]# ll /dev/raw
total 0
crw------- 1 root root 162, 1 Dec 9 08:14 raw1
crw------- 1 root root 162, 2 Dec 9 08:14 raw2
crw------- 1 root root 162, 3 Dec 9 08:14 raw3
crw------- 1 root root 162, 4 Dec 9 08:14 raw4
crw------- 1 root root 162, 5 Dec 9 08:14 raw5
crw------- 1 root root 162, 6 Dec 9 08:14 raw6
crw------- 1 root root 162, 7 Dec 9 08:14 raw7
Sure enough, they are all owned by root with mode 600. Change the owner of the raw devices to oracle:
[root@UNID02 ~]# chown oracle:dba /dev/raw/*
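Because the device nodes came back owned by root after the rebind, it may be worth re-checking ownership whenever the raw devices are started again. A small sketch of such a check; not_owned_by is an illustrative name, and the use of GNU stat with -c '%U' is an assumption about the platform:

```shell
#!/usr/bin/env bash
# Print every given path whose owning user is not the expected one.
# $1:        expected owner (user name)
# remaining: paths to check
# Assumes GNU stat, whose -c '%U' format prints the owner's user name.
not_owned_by() {
    local expected=$1 path owner
    shift
    for path in "$@"; do
        # Skip paths that cannot be examined; stat reports the error itself.
        owner=$(stat -c '%U' "$path") || continue
        if [ "$owner" != "$expected" ]; then
            echo "$path"
        fi
    done
}

# Possible usage on the host:
# not_owned_by oracle /dev/raw/raw*
```

An empty result means every device listed is already owned by the expected user.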
messages now shows CRS starting:
Dec 9 08:16:14 UNID02 logger: Cluster Ready Services completed waiting on dependencies.
Dec 9 08:16:14 UNID02 last message repeated 2 times
Dec 9 08:16:14 UNID02 logger: Running CRSD with TZ =
Dec 9 08:16:14 UNID02 logger: Oracle CSS Family monitor starting.
Dec 9 08:16:15 UNID02 logger: Oracle CSS restart. 0, 1
I thought that was the end of the problem, but after waiting a while, ASM and the instance still had not come up:
[oracle@UNID02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE OFFLINE
ora....01.lsnr application ONLINE OFFLINE
ora....t01.gsd application ONLINE OFFLINE
ora....t01.ons application ONLINE OFFLINE
ora....t01.vip application ONLINE OFFLINE
ora....SM2.asm application ONLINE OFFLINE
ora....02.lsnr application ONLINE ONLINE UNID02
ora....t02.gsd application ONLINE ONLINE UNID02
ora....t02.ons application ONLINE ONLINE UNID02
ora....t02.vip application ONLINE ONLINE UNID02
ora....RTAL.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....ac2.srv application ONLINE OFFLINE
ora.rac.db application ONLINE OFFLINE
ora....c1.inst application ONLINE OFFLINE
ora....c2.inst application ONLINE OFFLINE
ora....rwss.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....ac2.srv application ONLINE OFFLINE
ora...._taf.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....ac2.srv application ONLINE OFFLINE
ora....test.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....rac1.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....ac2.srv application ONLINE OFFLINE
ora....rac2.cs application ONLINE OFFLINE
ora....ac1.srv application ONLINE OFFLINE
ora....ac2.srv application ONLINE OFFLINE
[oracle@UNID02 ~]$
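In a listing this long, the resources that should be running but are not can be picked out with a one-line filter. A sketch, assuming the column layout shown above (name, type, target, state, host); offline_crs_resources is a made-up helper name:

```shell
#!/usr/bin/env bash
# List resources whose Target is ONLINE but whose State is OFFLINE.
# stdin: output in the "crs_stat -t" layout shown above.
# Header, separator and prompt lines do not match and are skipped.
offline_crs_resources() {
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}

# Possible usage on the host:
# crs_stat -t | offline_crs_resources
```

Since crs_stat -t abbreviates resource names, the same shortened name can appear more than once in the result.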
So I started the ASM instance manually:
[oracle@UNID02 ~]$ dba
SQL*Plus: Release 11.1.0.7.0 - Production on Mon Dec 9 08:25:06 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:check if cable failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini1
ORA-27303: additional information: requested interface eth1 interface not running set _disable_interface_checking = TRUE to disable this check for single instance cluster. Check output from ifcon
SQL>
SQL>
SQL> quit
Disconnected
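This error occurs because, in RAC, ASM and the database instance check the other nodes' private network information over the private interconnect at startup, and at this point the other node was powered off.
The fix is to set the following two hidden parameters in the ASM parameter file: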
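ORA-27303 names eth1 as "not running", so one quick sanity check is to look at the link state the kernel reports for that interface. A sketch that reads it from sysfs, assuming a kernel that exposes an operstate file; iface_operstate is an illustrative name, and the second argument exists only so the function can be tried against a fake directory tree:

```shell
#!/usr/bin/env bash
# Print the operational state of a network interface as reported by sysfs.
# $1: interface name (eth1 in the error above)
# $2: sysfs net directory, defaulting to /sys/class/net
iface_operstate() {
    local iface=$1 root=${2:-/sys/class/net}
    if [ -r "$root/$iface/operstate" ]; then
        cat "$root/$iface/operstate"
    else
        # No such interface, or the kernel does not provide operstate.
        echo "missing"
    fi
}

# Possible usage on the host:
# iface_operstate eth1
```

A state other than "up" on the interconnect interface would be consistent with the peer node being powered off.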
_disable_instance_params_check = TRUE
_disable_interface_checking = TRUE
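_disable_instance_params_check = TRUE makes the instance ignore the value of instance_type at startup. The _disable_interface_checking parameter is intended only for database parameter files; when used in an ASM instance it raises ORA-15021: parameter "_disable_interface_checking" is not valid in asm instance, so _disable_instance_params_check = TRUE is set here to skip the instance_type check.
Set the following hidden parameter in the database instance parameter file: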
_disable_interface_checking = TRUE
Start ASM and the instance again; both now start normally.