Nagios with NDO monitoring 563 hosts and 5066 services: check times stop updating
Source: Internet. Published: 2016-11-30
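Could someone more experienced help me out? I am a developer by background, and a recent project requires me to set up Nagios monitoring. I am at a loss. The situation is as follows:
Nagios version: 3.0.6
There are 563 hosts and 5066 services to monitor. The hosts are spread across the 192.168.33.0, 192.168.34.0, 192.168.35.0, 192.168.36.0, 192.168.38.0 and 192.168.39.0 subnets,
which I believe are connected through routers. Checks are active checks run through Nagios NRPE. At first NDO export was not enabled, and I recorded the following timings: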
------------------------------------------------------------
Without NDO export:
begin: Thu Sep 30 14:39:15 CST 2010
middle: Thu Sep 30 15:01:51 CST 2010 (service and host checks initialized)
over: Thu Sep 30 15:13:51 CST 2010 (one full polling round of host checks completed)
Host-down test (29-1, 192.168.35.2, ping blocked with a firewall rule)
begin: Thu Sep 30 15:18:35 CST 2010
over: Thu Sep 30 15:28:15 CST 2010
------------------------------------------------------------
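To make the timings above easier to compare, here is a minimal Python sketch that turns the recorded timestamps into durations. The dictionary keys and the helper names (`parse_stamp`, `elapsed`) are made up for illustration, and reading the down test as "time until the outage was noticed" is an assumption about what the poster measured.

```python
from datetime import datetime

# Timestamps copied from the record above; the key names are hypothetical.
STAMPS = {
    "begin": "Thu Sep 30 14:39:15 CST 2010",
    "middle": "Thu Sep 30 15:01:51 CST 2010",
    "over": "Thu Sep 30 15:13:51 CST 2010",
    "down_begin": "Thu Sep 30 15:18:35 CST 2010",
    "down_over": "Thu Sep 30 15:28:15 CST 2010",
}


def parse_stamp(text):
    """Parse a `date`-style stamp, ignoring the weekday and zone tokens."""
    # strptime's %Z does not portably recognise "CST", and every stamp
    # here uses the same zone, so the zone can be dropped for differences.
    _weekday, month, day, clock, _zone, year = text.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def elapsed(start_key, end_key):
    """Return the time between two recorded stamps as a timedelta."""
    return parse_stamp(STAMPS[end_key]) - parse_stamp(STAMPS[start_key])


init_time = elapsed("begin", "middle")            # startup until checks initialized
poll_time = elapsed("middle", "over")             # one host polling round
detect_time = elapsed("down_begin", "down_over")  # host-down test

print("initialization:", init_time)
print("host poll round:", poll_time)
print("down test:", detect_time)
```

Under these assumptions, initialization took about 22.5 minutes, one host polling round 12 minutes, and the down test just under 10 minutes.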
After enabling NDO export, however, Nagios hangs: the Last check time on the Nagios web page never changes. The log shows:
------------------------------------------------------------
[1286497616] Warning: Contact 'nagiosadmin' service notification command '/usr/bin/printf "%b" "***** Nagios *****nnNotification Type: PROBLEMnnService: mycheck-gmondnHost: compute-28-6.localnAddress: 192.168.38.7nState: CRITICALnnDate/Time: Fri Oct 8 08:26:24 CST 2010nnAdditional Info:nntelnet localhost no host data gmond==1!." | /usr/local/nagios/bin/sendEmail -f t427795737@live.cn -t nagios@lost -s smtp.live.com -u "** PROBLEM Service Alert: compute-28-6.local/mycheck-gmond is CRITICAL **" -xu t427795737 -xp 87997739' timed out after 30 seconds
[1286497616] SERVICE NOTIFICATION: nagiosadmin;vm-container-17-7.local;HTTP;UNKNOWN;notify-service-by-email;check_http: Invalid option - SSL is not available
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497626] SERVICE ALERT: vm-container-22-13.local;mycheck-gmond;OK;SOFT;2;OK - Process gmond is running.0
[1286497626] SERVICE EVENT HANDLER: vm-container-22-13.local;mycheck-gmond;(null);(null);(null);r_gmond!$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
[1286497626] SERVICE NOTIFICATION: nagiosadmin;vm-container-31-13.local;HTTP;CRITICAL;notify-service-by-email;Connection refused
[1286497637] ndomod: Successfully connected to data sink. 934727 items lost, 5000 queued items to flush.
------------------------------------------------------------
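A log like the one above can be summarized with a short script. The following is a rough Python sketch, not a Nagios tool: the `summarize` function and its result keys are hypothetical, the sample holds three lines copied from the log, and treating "CST" as UTC+8 is an assumption based on the live.cn mail account.

```python
import re
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # assumption: "CST" means UTC+8 here

LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")
NDO_RE = re.compile(r"ndomod: .*?(\d+) items lost, (\d+) queued items to flush")

# Three lines copied verbatim from the log above.
SAMPLE = """\
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497625] Max concurrent service checks (100) has been reached. Delaying further checks until previous checks are complete...
[1286497637] ndomod: Successfully connected to data sink. 934727 items lost, 5000 queued items to flush.
"""


def summarize(log_text):
    """Count throttling messages and pull out the ndomod loss figures."""
    result = {"throttled": 0, "lost": None, "queued": None, "last": None}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip lines without a leading [epoch] stamp
        stamp, message = int(match.group(1)), match.group(2)
        result["last"] = datetime.fromtimestamp(stamp, CST)
        if message.startswith("Max concurrent service checks"):
            result["throttled"] += 1
        ndo = NDO_RE.search(message)
        if ndo:
            result["lost"] = int(ndo.group(1))
            result["queued"] = int(ndo.group(2))
    return result


summary = summarize(SAMPLE)
print(summary)
```

Run against the full log file, the same function would show how often the concurrency limit was hit and how many broker items ndomod reported as lost before the log stopped.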
With NDO enabled, the last log line stays like this and nothing further is written.
The main settings in nagios.cfg are:
------------------------------------------------------------
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=0
command_check_interval=15s
command_file=/usr/local/nagios/var/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o config_file=/usr/local/nagios/etc/ndomod.cfg
max_concurrent_checks=100
check_result_reaper_frequency=10
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
obsess_over_hosts=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
use_large_installation_tweaks=1
------------------------------------------------------------
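For reference, the settings above can be read into a dictionary and set against the service count. This is a minimal Python sketch under stated assumptions: `parse_cfg` is a hypothetical helper rather than part of Nagios, the excerpt repeats only a few of the lines above, and the "waves" figure is a crude lower bound that ignores how Nagios actually schedules checks.

```python
import math

# A few lines copied from the nagios.cfg excerpt above.
EXCERPT = """\
event_broker_options=-1
broker_module=/usr/local/nagios/bin/ndomod-3x.o config_file=/usr/local/nagios/etc/ndomod.cfg
max_concurrent_checks=100
check_result_reaper_frequency=10
use_large_installation_tweaks=1
"""


def parse_cfg(text):
    """Parse key=value lines; split on the first '=' only."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # broker_module carries a second '=' inside its value, so
        # partition on the first one rather than splitting on all of them.
        key, _, value = line.partition("=")
        # Caveat: a repeated key would overwrite the earlier value here.
        settings[key.strip()] = value.strip()
    return settings


cfg = parse_cfg(EXCERPT)

SERVICES = 5066  # service count from the post
limit = int(cfg["max_concurrent_checks"])
waves = math.ceil(SERVICES / limit)

print("max_concurrent_checks:", limit)
print("minimum batches to run every service once:", waves)
```

With a limit of 100, running all 5066 service checks once needs at least 51 batches, which is consistent with the "Max concurrent service checks (100) has been reached" messages in the log.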
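One last note: I used to do Java development, I am only somewhat familiar with Linux, and I know even less C. I would be grateful for any help. My manager is pressing hard, it has been almost two weeks, and I still have not solved it. Thanks.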
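------------------------------------------------------------
I have not used NDO export, so I cannot help much. Check the NDO-related configuration.
Also, your log contains warning messages. What they say is quite clear, and you had better resolve those as well.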