Linux server hangs periodically, and I don't know why.
Source: Internet. Published: 2016-01-05
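My Samba server hangs regularly, every day or two.
Here is today's running state for everyone to look over. Please tell me if you can see where the problem is.
It usually dies when the number of connections is high, but that is unlikely to be the cause, because dozens of other servers are doing the same work without trouble. I suspected the hardware and have already replaced the motherboard.
OS: Red Hat Enterprise 4.0, kernel 2.4.20. Hardware: Gigabyte 848P + P4 2.4 GHz + 512 MB RAM + D-Link gigabit NIC. There are 512 clients, with at most about forty to fifty connected at once. The main workload is file comparison and downloading.
There are two NICs: the onboard 100 Mbit eth0 and an add-in gigabit eth1. The gigabit card is the one in use. The machine hangs once every one to three days.
After a reboot it works again. Any help would be appreciated. Thanks.
(Please look at the output of the "top" command, in particular iowait.)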
[root@ecofe2 root]# top
11:56:57 up 1 day, 14:06, 2 users, load average: 0.04, 0.09, 0.08
52 processes: 51 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.0% 0.0% 0.2% 0.0% 0.4% 0.0% 99.4%
Mem: 504652k av, 499140k used, 5512k free, 0k shrd, 325560k buff
382432k actv, 74008k in_d, 10820k in_c
Swap: 1309256k av, 15624k used, 1293632k free 135072k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
26689 oface 15 0 2932 2536 2040 S 0.2 0.5 0:00 0 smbd
1 root 15 0 484 448 428 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:26 0 keventd
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
7 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
5 root 15 0 0 0 0 SW 0.0 0.0 1:59 0 kswapd
6 root 15 0 0 0 0 SW 0.0 0.0 0:03 0 kscand
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
13 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
72 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
709 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
710 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
711 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
[root@ecofe2 root]# service smb status
smbd (pid 26774 26772 26771 26769 26716 26620 26619 8087 5378) is running...
nmbd (pid 5382) is running...
[root@ecofe2 root]# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 256 172.16.1.249:ssh 172.16.8.47:2594 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.9.19:3093 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.1.253:1034 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.7.59:1045 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.7.137:1031 ESTABLISHED
tcp 0 0 172.16.1.2:microsoft-ds 172.16.7.137:1028 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.7.139:4417 ESTABLISHED
tcp 0 4 172.16.1.24:netbios-ssn 172.16.9.133:1026 ESTABLISHED
tcp 0 4 172.16.1.24:netbios-ssn 172.16.9.133:1027 ESTABLISHED
tcp 0 0 172.16.1.24:netbios-ssn 172.16.7.152:1311 ESTABLISHED
tcp 0 1460 172.16.1.24:netbios-ssn 172.16.8.134:1038 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 8 [ ] DGRAM 4503 /dev/log
unix 2 [ ] DGRAM 5794
unix 2 [ ] DGRAM 5202
unix 2 [ ] DGRAM 5192
unix 2 [ ] DGRAM 5157
unix 2 [ ] DGRAM 4893
unix 2 [ ] DGRAM 4511
[root@ecofe2 root]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:11:2F:BC:00:01
inet addr:172.16.1.249 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:6563 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:1612135 (1.5 Mb)
Interrupt:11 Base address:0x1c00
eth1 Link encap:Ethernet HWaddr 00:15:E9:AD:55:C7
inet addr:172.16.1.249 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:219156043 errors:0 dropped:0 overruns:0 frame:0
TX packets:483759830 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3233908600 (3084.0 Mb) TX bytes:2048871747 (1953.9 Mb)
Interrupt:6 Memory:fe5fc000-0
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:33741 errors:0 dropped:0 overruns:0 frame:0
TX packets:33741 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1858446 (1.7 Mb) TX bytes:1858446 (1.7 Mb)
[root@ecofe2 root]# free
total used free shared buffers cached
Mem: 504652 498336 6316 0 325924 135188
-/+ buffers/cache: 37224 467428
Swap: 1309256 15084 1294172
Also, what other commands or options can I use to see more detailed runtime status? I'm a newbie, so any pointers are welcome.
|
Check the kernel log (klog), especially the part just before the reboot.
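To make that suggestion concrete, here is a minimal Python sketch that pulls out the lines logged just before each restart. It assumes the kernel messages end up in /var/log/messages (the usual Red Hat syslog target) and that syslogd writes a line containing both "syslogd" and "restart" each time it starts. Both are assumptions about this box, not something shown in the post, and the function name tail_before_restarts is made up for illustration. Log rotation can also make syslogd write a restart line, so not every hit is a crash.

```python
from collections import deque


def tail_before_restarts(lines, context=20):
    """Return one list per detected restart: the `context` lines logged just before it.

    Assumption: a line containing both "syslogd" and "restart" marks a fresh start
    of the logging daemon, so the lines right above it are the last ones written
    before the machine went down.
    """
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if "syslogd" in line and "restart" in line:
            chunks.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return chunks
```

For example, `tail_before_restarts(open("/var/log/messages", errors="replace"), context=30)` returns one list of lines per restart. Kernel oops messages, disk errors, or NIC driver complaints near the end of a list would be the first things to look for.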
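Linux systems usually have a CPU monitoring service running; if the CPU temperature gets too high, the machine shuts down or reboots automatically.
If what you have is a real hang, where the system freezes and cannot be operated at all, then the system has probably run into a deadlock, and you need to look at the services and programs you are running.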
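If the box really freezes, the log may contain nothing useful, so one option is to record the system state periodically and read the last entry after the next hang. The sketch below is only an illustration of that idea: it reads /proc/loadavg and /proc/meminfo and appends one line per interval, forcing each line to disk. The log path /var/tmp/state.log, the 60-second interval, and all function names are assumptions, and it needs Python 3.6 or later, which would have to be installed separately on a system this old.

```python
import os
import time


def parse_meminfo(text):
    """Parse 'Name:  12345 kB' lines into {name: kilobytes}; other lines are ignored."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def format_snapshot(loadavg_text, meminfo_text, timestamp):
    """Build one log line from the raw contents of /proc/loadavg and /proc/meminfo."""
    load1, load5, load15 = loadavg_text.split()[:3]
    mem = parse_meminfo(meminfo_text)
    wanted = ("MemFree", "Buffers", "Cached", "SwapFree")
    mem_part = " ".join(f"{key}={mem[key]}kB" for key in wanted if key in mem)
    return f"{timestamp} load={load1},{load5},{load15} {mem_part}"


def read_text(path):
    with open(path) as fh:
        return fh.read()


def log_forever(logfile="/var/tmp/state.log", interval=60):
    """Append a snapshot every `interval` seconds until the process is stopped."""
    with open(logfile, "a") as out:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            line = format_snapshot(read_text("/proc/loadavg"),
                                   read_text("/proc/meminfo"), stamp)
            out.write(line + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk so it survives a hard hang
            time.sleep(interval)
```

Calling `log_forever()` from a background process would leave a trail to compare against the time of the hang. If load or memory looks normal right up to the last line, a service-level problem becomes less likely and hardware or a driver more likely.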