
A classic RAC failure caused by DNS, and how it was resolved

    Source: Internet  Published: 2017-06-25


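1. Environment:

This RAC system was deployed four years ago. It had run well ever since without any problems, and day to day it was essentially unattended.

OS: Red Hat Enterprise Linux 5.8 x86_64

DB: Oracle Database Enterprise Edition 11.2.0.4

GI: Oracle Grid Infrastructure 11.2.0.4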

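2. Problem description:

        Yesterday, shortly before the end of the working day, on-site staff reported a failure: the database could not be connected to, and clients received ORA-12547: TNS:lost contact. My first thought was a network or listener problem, but when I had the on-site staff run tnsping and ping, both came back normal.

3. Symptoms:

        When I arrived on site, I first checked the database status. The database instances were stopped, and no obvious error stood out in the log;

      Database log: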
  • Starting up:
  • Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
  • With the Partitioning, Real Application Clusters, OLAP, Data Mining
  • and Real Application Testing options.
  • ORACLE_HOME = /u01/app/oracle/11.2.0.4/product/db_1
  • System name:    Linux
  • Node name:    db01
  • Release:    3.8.13-44.1.1.el6uek.x86_64
  • Version:    #2 SMP Wed Sep 10 06:10:25 PDT 2014
  • Machine:    x86_64
  • VM name:    VMWare Version: 6
  • Using parameter settings in server-side pfile /u01/app/oracle/11.2.0.4/product/db_1/dbs/initwoo1.ora
  • System parameters with non-default values:
  •   processes = 600
  •   sessions = 922
  •   spfile = "+DATA/woo/spfilewoo.ora"
  •   nls_language = "SIMPLIFIED CHINESE"
  •   nls_territory = "CHINA"
  •   memory_target = 1584M
  •   control_files = "+DATA/woo/controlfile/current.260.930748953"
  •   control_files = "+FRA01/woo/controlfile/current.256.930748953"
  •   db_block_size = 8192
  •   compatible = "11.2.0.4.0"
  •   cluster_database = TRUE
  •   db_create_file_dest = "+DATA"
  •   db_recovery_file_dest = "+FRA01"
  •   db_recovery_file_dest_size= 4407M
  •   thread = 1
  •   undo_tablespace = "UNDOTBS1"
  •   instance_number = 1
  •   remote_login_passwordfile= "EXCLUSIVE"
  •   db_domain = ""
  •   dispatchers = "(PROTOCOL=TCP) (SERVICE=wooXDB)"
  •   remote_listener = "scan.prudentwoo.com:1521"
  •   audit_file_dest = "/u01/app/oracle/admin/woo/adump"
  •   audit_trail = "DB"
  •   db_name = "woo"
  •   open_cursors = 300
  •   diagnostic_dest = "/u01/app/oracle"
  • Cluster communication is configured to use the following interface(s) for this instance
  •   169.254.51.38
  •   169.254.243.157
  • cluster interconnect IPC version:Oracle UDP/IP (generic)
  • IPC Vendor 1 proto 2
  • Fri Dec 16 15:24:55 2016
  • USER (ospid: 4044): terminating the instance due to error 119
  • Instance terminated by USER, pid = 4044

         Database status:
  • [oracle@db01 ~]$ crsctl status res -t
  • --------------------------------------------------------------------------------
  • NAME TARGET STATE SERVER STATE_DETAILS
  • --------------------------------------------------------------------------------
  • Local Resources
  • --------------------------------------------------------------------------------
  • ora.BAK01.dg
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.DATA.dg
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.FRA01.dg
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.LISTENER.lsnr
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.OCR_VOT.dg
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.asm
  •                ONLINE ONLINE db01 Started
  •                ONLINE ONLINE db02 Started
  • ora.gsd
  •                OFFLINE OFFLINE db01
  •                OFFLINE OFFLINE db02
  • ora.net1.network
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • ora.ons
  •                ONLINE ONLINE db01
  •                ONLINE ONLINE db02
  • --------------------------------------------------------------------------------
  • Cluster Resources
  • --------------------------------------------------------------------------------
  • ora.LISTENER_SCAN1.lsnr
  •       1 ONLINE ONLINE db02
  • ora.LISTENER_SCAN2.lsnr
  •       1 ONLINE ONLINE db01
  • ora.LISTENER_SCAN3.lsnr
  •       1 ONLINE ONLINE db01
  • ora.cvu
  •       1 ONLINE ONLINE db01
  • ora.db01.vip
  •       1 ONLINE ONLINE db01
  • ora.db02.vip
  •       1 ONLINE ONLINE db02
  • ora.oc4j
  •       1 ONLINE ONLINE db01
  • ora.scan1.vip
  •       1 ONLINE ONLINE db02
  • ora.scan2.vip
  •       1 ONLINE ONLINE db01
  • ora.scan3.vip
  •       1 ONLINE ONLINE db01
  • ora.woo.db
  •       1 ONLINE OFFLINE Instance Shutdown
  •       2 ONLINE OFFLINE Instance Shutdown
  • [oracle@db01 ~]$ srvctl status database -d woo
  • Instance woo1 is not running on node db01
  • Instance woo2 is not running on node db02

    4. Starting the database manually:
  • [oracle@db01 trace]$ srvctl start database -d woo
  • PRCR-1079 : Failed to start resource ora.woo.db
  • CRS-5017: The resource action "ora.woo.db start" encountered the following error:
  • ORA-00119: invalid specification for system parameter REMOTE_LISTENER
  • ORA-00132: syntax error or unresolved network name 'scan.prudentwoo.com:1521'
  • . For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/product/grid/log/db02/agent/crsd/oraagent_oracle/oraagent_oracle.log".
  • CRS-5017: The resource action "ora.woo.db start" encountered the following error:
  • ORA-00119: invalid specification for system parameter REMOTE_LISTENER
  • ORA-00132: syntax error or unresolved network name 'scan.prudentwoo.com:1521'
  • . For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/product/grid/log/db01/agent/crsd/oraagent_oracle/oraagent_oracle.log".
  • CRS-2674: Start of 'ora.woo.db' on 'db02' failed
  • CRS-2674: Start of 'ora.woo.db' on 'db01' failed
  • CRS-2632: There are no more servers to try to place resource 'ora.woo.db' on that would satisfy its placement policy

        Log output:
  • alert.log:
  • [oracle@db01 trace]$ tail -0f alert_woo1.log
  • Fri Dec 16 15:37:08 2016
  • Starting ORACLE instance (normal)
  • LICENSE_MAX_SESSION = 0
  • LICENSE_SESSIONS_WARNING = 0
  • Initial number of CPU is 2
  • Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  •   [name='eth1:1', type=1, ip=169.254.51.38, mac=00-0c-29-7c-44-ca, net=169.254.0.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
  • Private Interface 'eth2:1' configured from GPnP for use as a private interconnect.
  •   [name='eth2:1', type=1, ip=169.254.243.157, mac=00-0c-29-7c-44-d4, net=169.254.128.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
  • Public Interface 'eth0' configured from GPnP for use as a public interface.
  •   [name='eth0', type=1, ip=192.168.84.11, mac=00-0c-29-7c-44-c0, net=192.168.84.0/24, mask=255.255.255.0, use=public/1]
  • Public Interface 'eth0:1' configured from GPnP for use as a public interface.
  •   [name='eth0:1', type=1, ip=192.168.84.22, mac=00-0c-29-7c-44-c0, net=192.168.84.0/24, mask=255.255.255.0, use=public/1]
  • Public Interface 'eth0:3' configured from GPnP for use as a public interface.
  •   [name='eth0:3', type=1, ip=192.168.84.20, mac=00-0c-29-7c-44-c0, net=192.168.84.0/24, mask=255.255.255.0, use=public/1]
  • Public Interface 'eth0:5' configured from GPnP for use as a public interface.
  •   [name='eth0:5', type=1, ip=192.168.84.13, mac=00-0c-29-7c-44-c0, net=192.168.84.0/24, mask=255.255.255.0, use=public/1]
  • CELL communication is configured to use 0 interface(s):
  • CELL IP affinity details:
  •     NUMA status: non-NUMA system
  •     cellaffinity.ora status: N/A
  • CELL communication will use 1 IP group(s):
  •     Grp 0:
  • Picked latch-free SCN scheme 3
  • Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
  • Autotune of undo retention is turned on.
  • LICENSE_MAX_USERS = 0
  • SYS auditing is disabled
  • Starting up:
  • Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
  • With the Partitioning, Real Application Clusters, OLAP, Data Mining
  • and Real Application Testing options.
  • ORACLE_HOME = /u01/app/oracle/11.2.0.4/product/db_1
  • System name:    Linux
  • Node name:    db01
  • Release:    3.8.13-44.1.1.el6uek.x86_64
  • Version:    #2 SMP Wed Sep 10 06:10:25 PDT 2014
  • Machine:    x86_64
  • VM name:    VMWare Version: 6
  • Using parameter settings in server-side pfile /u01/app/oracle/11.2.0.4/product/db_1/dbs/initwoo1.ora
  • System parameters with non-default values:
  •   processes = 600
  •   sessions = 922
  •   spfile = "+DATA/woo/spfilewoo.ora"
  •   nls_language = "SIMPLIFIED CHINESE"
  •   nls_territory = "CHINA"
  •   memory_target = 1584M
  •   control_files = "+DATA/woo/controlfile/current.260.930748953"
  •   control_files = "+FRA01/woo/controlfile/current.256.930748953"
  •   db_block_size = 8192
  •   compatible = "11.2.0.4.0"
  •   cluster_database = TRUE
  •   db_create_file_dest = "+DATA"
  •   db_recovery_file_dest = "+FRA01"
  •   db_recovery_file_dest_size= 4407M
  •   thread = 1
  •   undo_tablespace = "UNDOTBS1"
  •   instance_number = 1
  •   remote_login_passwordfile= "EXCLUSIVE"
  •   db_domain = ""
  •   dispatchers = "(PROTOCOL=TCP) (SERVICE=wooXDB)"
  •   remote_listener = "scan.prudentwoo.com:1521"
  •   audit_file_dest = "/u01/app/oracle/admin/woo/adump"
  •   audit_trail = "DB"
  •   db_name = "woo"
  •   open_cursors = 300
  •   diagnostic_dest = "/u01/app/oracle"
  • Cluster communication is configured to use the following interface(s) for this instance
  •   169.254.51.38
  •   169.254.243.157
  • cluster interconnect IPC version:Oracle UDP/IP (generic)
  • IPC Vendor 1 proto 2
  • Fri Dec 16 15:37:49 2016
  • USER (ospid: 6043): terminating the instance due to error 119
  • Instance terminated by USER, pid = 6043
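
    5. Problem analysis:

            The start attempt shows that the database cannot come up at this point: srvctl reports ORA-00119 together with ORA-00132, and the alert log records the instance being terminated due to error 119.

            The startup messages narrow the problem down to SCAN: remote_listener points at the SCAN name scan.prudentwoo.com:1521, and because that name cannot be resolved, the database cannot start.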
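
            To make this narrowing-down step concrete, here is a minimal shell sketch that pulls the unresolved name out of the ORA-00132 lines. The function names `unresolved_names` and `host_part` are hypothetical and not part of any Oracle tool; the sketch only assumes the message format seen in the srvctl output in step 4.

```shell
# Hypothetical helper, not an Oracle utility: read srvctl/CRS output on stdin
# and print each distinct name that an ORA-00132 line reports as unresolved.
unresolved_names() {
    sed -n "s/.*ORA-00132:[^']*'\([^']*\)'.*/\1/p" | sort -u
}

# Strip a trailing ":port" so the bare host name can be passed to a
# name-resolution check.
host_part() {
    printf '%s\n' "${1%:*}"
}
```

            Under these assumptions, piping the failed start through the helper (`srvctl start database -d woo 2>&1 | unresolved_names`) would reduce the whole block of CRS messages to the single value from remote_listener that needs checking.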

     

    6. Check the SCAN configuration:
  • #check scan info:
  • [oracle@db01 ~]$ srvctl config scan
  • SCAN name: scan.prudentwoo.com, Network: 1/192.168.84.0/255.255.255.0/eth0
  • SCAN VIP name: scan1, IP: /scan.prudentwoo.com/192.168.84.21
  • SCAN VIP name: scan2, IP: /scan.prudentwoo.com/192.168.84.22
  • SCAN VIP name: scan3, IP: /scan.prudentwoo.com/192.168.84.20
  • [oracle@db01 ~]$ ping 192.168.84.20 -c 2
  • PING 192.168.84.20 (192.168.84.20) 56(84) bytes of data.
  • 64 bytes from 192.168.84.20: icmp_seq=1 ttl=64 time=0.032 ms
  • 64 bytes from 192.168.84.20: icmp_seq=2 ttl=64 time=0.039 ms
  • --- 192.168.84.20 ping statistics ---
  • 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
  • rtt min/avg/max/mdev = 0.032/0.035/0.039/0.006 ms
  • [oracle@db01 ~]$ ping 192.168.84.21 -c 2
  • PING 192.168.84.21 (192.168.84.21) 56(84) bytes of data.
  • 64 bytes from 192.168.84.21: icmp_seq=1 ttl=64 time=0.231 ms
  • 64 bytes from 192.168.84.21: icmp_seq=2 ttl=64 time=0.292 ms
  • --- 192.168.84.21 ping statistics ---
  • 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
  • rtt min/avg/max/mdev = 0.231/0.261/0.292/0.034 ms
  • [oracle@db01 ~]$ ping 192.168.84.22 -c 2
  • PING 192.168.84.22 (192.168.84.22) 56(84) bytes of data.
  • 64 bytes from 192.168.84.22: icmp_seq=1 ttl=64 time=0.024 ms
  • 64 bytes from 192.168.84.22: icmp_seq=2 ttl=64 time=0.034 ms
  • --- 192.168.84.22 ping statistics ---
  • 2 packets transmitted, 2 received, 0% packet loss, time 999ms
  • rtt min/avg/max/mdev = 0.024/0.029/0.034/0.005 ms
  • [oracle@db01 ~]$ ping scan.prudentwoo.com -c 2
  • ping: unknown host scan.prudentwoo.com
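
               As the output shows, all three addresses behind the SCAN respond to ping, so the SCAN VIPs themselves are up. Pinging the SCAN domain name, however, fails with "unknown host": the name cannot be resolved. The next step is therefore to look at the name service, which is most likely where the fault lies.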
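
               The distinction drawn here (addresses reachable, name not resolvable) can also be checked directly, without going through ping. The following is a minimal sketch, assuming a Linux host where glibc's `getent` is available; the function names and message wording are hypothetical.

```shell
# Return 0 if the name resolves through the system resolver (the NSS "hosts"
# lookup, which consults /etc/hosts and DNS as configured), non-zero otherwise.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Report on one name; the wording of the messages is illustrative only.
check_name() {
    if resolves "$1"; then
        echo "OK: $1 resolves"
    else
        echo "FAIL: $1 does not resolve"
        return 1
    fi
}
```

               During an outage like this one, `check_name scan.prudentwoo.com` would be expected to report a failure. The lookup order depends on /etc/nsswitch.conf, so a name listed in /etc/hosts can still resolve while DNS is down.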

     

     

    7. Check the name (DNS) service on both database servers; it turns out the DNS server does not run on either of them:
  • #check dns client and server:
  • [oracle@db01 ~]$ /sbin/chkconfig --list|grep named
  • [oracle@db01 ~]$ ssh db02 '/sbin/chkconfig --list|grep named'
  • [oracle@db01 ~]$
  • #check dns client:
  • [oracle@db01 ~]$ cat /etc/resolv.conf
  • search prudentwoo.com
  • nameserver 192.168.84.15

    8. Use the resolv.conf settings to locate the actual DNS server; it turns out the DNS server has hung and is unreachable:
  • [oracle@db01 ~]$ ping 192.168.84.15 -c 2
  • PING 192.168.84.15 (192.168.84.15) 56(84) bytes of data.
  • From 192.168.84.11 icmp_seq=1 Destination Host Unreachable
  • From 192.168.84.11 icmp_seq=2 Destination Host Unreachable
  • --- 192.168.84.15 ping statistics ---
  • 2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3007ms
  • pipe 2
  •  
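
    The two manual steps above (read resolv.conf, then ping the server it names) can be combined into one check. This is a minimal sketch, assuming Linux `ping` from iputils with its `-W` timeout option; `list_nameservers` and `check_nameservers` are hypothetical names.

```shell
# Print the nameserver addresses configured in a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ping each configured nameserver once. Reachability by ICMP does not prove
# that the DNS service itself is answering; it only rules out a dead host.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 1 -W 2 "$ns" >/dev/null 2>&1; then
            echo "reachable: $ns"
        else
            echo "UNREACHABLE: $ns"
        fi
    done
}
```

    With the resolv.conf shown in step 7, `list_nameservers` prints only 192.168.84.15, which makes it obvious that name resolution on both nodes depends on a single server.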

    9. After the DNS server is repaired, the name resolves normally again:
  • [oracle@db01 ~]$ ping scan.prudentwoo.com -c 2
  • PING scan.prudentwoo.com (192.168.84.21) 56(84) bytes of data.
  • 64 bytes from scan.prudentwoo.com (192.168.84.21): icmp_seq=1 ttl=64 time=0.494 ms
  • 64 bytes from scan.prudentwoo.com (192.168.84.21): icmp_seq=2 ttl=64 time=0.289 ms
  • --- scan.prudentwoo.com ping statistics ---
  • 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
  • rtt min/avg/max/mdev = 0.289/0.391/0.494/0.104 ms

    10. Start the database again:
  • [oracle@db01 ~]$ srvctl start database -d woo
  • [oracle@db01 ~]$ srvctl status database -d woo
  • Instance woo1 is running on node db01
  • Instance woo2 is running on node db02
  • [oracle@db01 ~]$ srvctl config database -d woo
  • Database unique name: woo
  • Database name: woo
  • Oracle home: /u01/app/oracle/11.2.0.4/product/db_1
  • Oracle user: oracle
  • Spfile: +DATA/woo/spfilewoo.ora
  • Domain:
  • Start options: open
  • Stop options: immediate
  • Database role: PRIMARY
  • Management policy: AUTOMATIC
  • Server pools: woo
  • Database instances: woo1,woo2
  • Disk Groups: DATA,FRA01
  • Mount point paths:
  • Services:
  • Type: RAC
  • Database is administrator managed
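
           Looking back at the whole troubleshooting process, this failure tests more than database troubleshooting skill. It also touches on installation and on how RAC basically works, and above all it calls for a solid understanding of the operating system and the DNS service.

          I was also fortunate: professional instinct let me spot the root cause, ORA-00132, right away in a pile of error messages, rather than starting from the other errors.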


        
     
     
