
The hangcheck-timer Module for I/O Fencing with Oracle 9i/10g/11gR1 on Linux

    Source: Internet  Published: 2017-05-02


1. Official Documentation

Reference (My Oracle Support, MOS):

Linux: Hangcheck-Timer Module Requirements for Oracle 9i, 10g, and 11gR1 RAC [ID 726833.1]

The hangcheck-timer module is required to run a supported configuration in Oracle Real Application Clusters environments on Linux, with Oracle releases 9i, 10g, or 11gR1 RAC. This note identifies and outlines the requirements needed to configure hangcheck-timer in an Oracle Enterprise Linux, Red Hat Linux, or SUSE Linux environment.

Note: Hangcheck-timer is not required starting with Oracle Clusterware 11gR2; it no longer needs to be configured for 11gR2 RAC.

Starting with release 9.2.0.2, Oracle RAC environments were required to use a new I/O fencing model, the hangcheck-timer module. This module was implemented to replace the Watchdog module, which provided similar fencing functionality. Hangcheck-timer was subsequently delivered as part of the standard kernel distribution for Linux kernel releases 2.4 and above.

Hangcheck-timer should be loaded at boot time. It monitors the Linux kernel for long operating system hangs that could affect the reliability of a RAC node. It runs in kernel mode and uses the Time Stamp Counter (TSC) to catch scheduling delays or node hangs. It does this by setting a timer and then checking, when the timer fires, whether it was delayed by more than the allowed margin of error. If the delay exceeds the allowed time of (hangcheck_tick + hangcheck_margin) seconds, the machine is restarted. Hangcheck-timer will not cause reboots due to CPU starvation.

Hangcheck-timer requires three configuration parameters:

(1)    hangcheck_tick - defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds.

(2)    hangcheck_margin - defines how much margin is allowed, in seconds, between the expected scheduling time and the real scheduling time. The default value is 180 seconds.

(3)    hangcheck_reboot - determines whether the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is equal to or greater than 1, the hangcheck-timer module restarts the system. If hangcheck_reboot is set to zero, the hangcheck-timer module will not reboot the node, even if a hang is detected. The default value varies by kernel version: in the 2.4 kernel the default is 1; in 2.6 kernels the default is 0.

When hangcheck_reboot is 1 (or greater) and the following condition holds, hangcheck-timer reboots the system: system hang time > (hangcheck_tick + hangcheck_margin)
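The reboot condition can be sketched as a small shell check. This is only an illustration of the rule quoted above, not how the kernel module itself is implemented; the function name hangcheck_would_reboot and its argument order are made up for this example.

```shell
# Hypothetical helper, not part of the kernel module: mirrors the rule
# "hang time > hangcheck_tick + hangcheck_margin" with hangcheck_reboot >= 1.
# Usage: hangcheck_would_reboot TICK MARGIN REBOOT HANG_SECONDS
hangcheck_would_reboot() {
    tick=$1 margin=$2 reboot=$3 hang=$4
    # A reboot needs both the flag and a hang longer than tick + margin.
    [ "$reboot" -ge 1 ] && [ "$hang" -gt $((tick + margin)) ]
}

# With tick=1 and margin=10 the threshold is 1 + 10 = 11 seconds.
if hangcheck_would_reboot 1 10 1 15; then
    echo "a 15-second hang would trigger a reboot"
fi
```

With hangcheck_reboot=0 the same hang would only be logged, which is why the parameter is worth setting explicitly on 2.6 kernels.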

All hangcheck-timer default values should be explicitly overridden when loading the kernel module, based on the Oracle release, as follows:

(1) 9i: Assuming the default setting of "oracm misscount" is 220 seconds:

hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1


(2) 10g/11gR1: Assuming the default setting of "CSS misscount" is either 30 or 60 seconds:

hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
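As a rough sketch, the per-release settings listed above could be kept in one place and reused when building the load command. The function name hangcheck_params and the release labels it accepts are assumptions of this example, not anything defined by Oracle or the kernel.

```shell
# Hypothetical helper: prints the module parameters the note recommends
# for a given Oracle release label (the labels are this sketch's own).
hangcheck_params() {
    case "$1" in
        9i)        echo "hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1" ;;
        10g|11gR1) echo "hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1" ;;
        *)         echo "unknown release: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the resulting load command for a 10g node.
echo "modprobe hangcheck-timer $(hangcheck_params 10g)"
```

Printing the command instead of running it keeps the sketch safe to try on a machine that is not a RAC node.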

 

You must always ensure that the cluster misscount setting is greater than the sum of hangcheck_tick + hangcheck_margin.
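The misscount rule is simple arithmetic, so it can be checked before loading the module. The sketch below assumes the misscount value is already known and passed in by hand; misscount_ok is a hypothetical name, and no Oracle utility is queried.

```shell
# Hypothetical helper: succeeds only if misscount is strictly greater
# than hangcheck_tick + hangcheck_margin.
# Usage: misscount_ok MISSCOUNT TICK MARGIN
misscount_ok() {
    misscount=$1 tick=$2 margin=$3
    [ "$misscount" -gt $((tick + margin)) ]
}

# 9i: 220 > 30 + 180 (= 210).
misscount_ok 220 30 180 && echo "9i settings are consistent"
# 10g/11gR1 with the smaller misscount: 30 > 1 + 10 (= 11).
misscount_ok 30 1 10 && echo "10g/11gR1 settings are consistent"
```

Note that reusing the 9i values (30 + 180) with a misscount of 30 or 60 would fail this check, which is why the recommended values differ by release.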

When running Oracle Clusterware on Linux, hangcheck-timer should always be configured on each RAC cluster node, as the functionality of this module is required to provide I/O fencing to ensure no stray writes will occur from an evicted node in a RAC cluster. To verify whether the hangcheck-timer module is running on a node, execute the following as the root or oracle user:

# /sbin/lsmod | grep hangcheck

hangcheck-timer         2672   0

If the hangcheck-timer module is loaded (running), you will see output similar to the above. When hangcheck-timer is not loaded, no output is generated and the command prompt is returned to the user.
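For scripted checks, the same test can be wrapped so that it yields an exit status instead of relying on someone reading the output. This is a minimal sketch: hangcheck_loaded is a hypothetical name, and the function reads lsmod-style text on standard input so it can be tried without a RAC node.

```shell
# Hypothetical helper: reads lsmod-style output on stdin and succeeds
# if a module whose name starts with "hangcheck" is listed.
hangcheck_loaded() {
    grep -q '^hangcheck'
}

# On a real node: /sbin/lsmod | hangcheck_loaded
# Here the sample line from the output above is fed in instead.
if printf 'hangcheck-timer         2672   0\n' | hangcheck_loaded; then
    echo "hangcheck-timer is loaded"
else
    echo "hangcheck-timer is NOT loaded"
fi
```

Matching on the prefix "hangcheck" rather than the full name is deliberate, since the module name may be listed with either a hyphen or an underscore.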

In an Oracle Enterprise Linux, Red Hat 4/5, or SUSE 9/10 environment, the hangcheck-timer module is loaded using the modprobe command:

# modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1

To ensure the module is loaded at boot time, you should also place the same command in the appropriate local startup script (e.g. /etc/rc.d/rc.local or /etc/init.d/boot.local). In earlier releases, hangcheck-timer was loaded using insmod in place of modprobe. Consult your release-specific documentation to determine which initialization method is required.
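One way to add the command to a startup script without duplicating it on repeated runs is sketched below. The helper ensure_line is hypothetical, and the example writes to a scratch file created with mktemp rather than to a real startup script; which file applies depends on the distribution, as noted above.

```shell
# Hypothetical helper: appends LINE to FILE only if that exact line is
# not already there, so rerunning it does not create duplicates.
ensure_line() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; on a real node FILE would be the
# distribution's local startup script (for example /etc/rc.d/rc.local).
demo=$(mktemp)
load_cmd='modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1'
ensure_line "$demo" "$load_cmd"
ensure_line "$demo" "$load_cmd"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

The -x and -F options make grep compare the whole line literally, so an existing entry with different parameter values is not mistaken for a match.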

Hangcheck-timer writes messages to the system log when a failure is detected and a node restart is initiated by the module:

(1)    When hangcheck-timer reboots the machine, it may leave the message "Hangcheck: hangcheck is restarting the machine" in /var/log/messages. The module's startup messages are also recorded in /var/log/messages.

(2)    If you see the message "Hangcheck: hangcheck value past margin!" in /var/log/messages, a reboot was required but was not performed because hangcheck_reboot was not set to 1. If this message is seen, you must reload the hangcheck module as described earlier in this note, with the hangcheck_reboot value set to 1.
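The two log messages can be told apart with a short filter. The sketch below is an illustration only: hangcheck_events and the labels it prints are made up for this example, and the input line is a fabricated sample rather than a real log excerpt.

```shell
# Hypothetical helper: reads syslog text on stdin and labels the two
# hangcheck messages quoted in the note.
hangcheck_events() {
    while IFS= read -r line; do
        case "$line" in
            *"Hangcheck: hangcheck is restarting the machine"*)
                echo "reboot-initiated" ;;
            *"Hangcheck: hangcheck value past margin!"*)
                echo "hang-detected-no-reboot" ;;
        esac
    done
}

# On a real node: hangcheck_events < /var/log/messages
# The line below is a made-up sample, not a real log excerpt.
printf '%s\n' 'kernel: Hangcheck: hangcheck value past margin!' | hangcheck_events
```

Any "hang-detected-no-reboot" result points at a node where hangcheck_reboot is not set to 1 and the module needs to be reloaded.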

Note:

Bug 6125546 can prevent hangcheck-timer from rebooting the node on RHEL4 (fixed in 2.6.9.56 or RHEL4.6).


    
 
 
