Contents
1 Fault Summary
1.1 Affected System and Configuration
1.2 Handling Summary
2 Fault Symptoms (overall description of the fault)
3 Fault Analysis and Handling (record of the analysis and handling)
4 Current Status
5 Summary and Recommendations (from this fault handling)
6 Outstanding Issues (issues still pending after the fault was handled)
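1 Fault Summary
1.1 Affected System and Configuration
Hardware: SUN M4000
Software: XSCF, Solaris 10
1.2 Handling Summary
2011-6-8: Memory alarms were found on the M4000.
2 Fault Symptoms (overall description of the fault)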
Log in to the XSCF remotely:
XSCF> showstatus
    MBU_A Status:Normal;
*       MEMB#0 Status:Deconfigured;
*           MEM#0A Status:Deconfigured;
*           MEM#1A Status:Deconfigured;
*           MEM#2A Status:Degraded;
*           MEM#3A Status:Faulted;
XSCF> showhardconf
*       MEMB#0 Status:Deconfigured; Ver:0101h; Serial:BF0947H3W8 ;
            + FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
*           MEM#0A Status:Deconfigured;
                + Code:2c000000000000000818HTF25672PY-667G10100-d36165a1;
                + Type:2A; Size:2 GB;
*           MEM#1A Status:Deconfigured;
                + Code:ce0000000000000001M3 93T5660QZA-CE6 4151-5224785b;
                + Type:2A; Size:2 GB;
*           MEM#2A Status:Degraded;
                + Code:ce0000000000000001M3 93T5660QZA-CE6 4151-5224785e;
                + Type:2A; Size:2 GB;
*           MEM#3A Status:Faulted;
                + Code:ce0000000000000001M3 93T5660QZA-CE6 4151-52247833;
                + Type:2A; Size:2 GB;
        MEMB#1 Status:Normal; Ver:0101h; Serial:BF0947H3P2 ;
            + FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;
            MEM#0A Status:Normal;
                + Code:2c000000000000000818HTF25672PY-667G10100-d3616807;
                + Type:2A; Size:2 GB;
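When the status listing is long, the components that are not in Normal state can be picked out mechanically. The following is a minimal Python sketch, not part of the original procedure. It assumes the output has been saved as text and that each component line has the form "NAME Status:STATE;" with an optional leading "*", as in the listing above. The function name abnormal_units and the SAMPLE text are hypothetical.

```python
import re

# Matches lines such as "* MEM#2A Status:Degraded;".
# The leading "*" is optional: components in Normal state are listed without it.
STATUS_RE = re.compile(r"^\s*\*?\s*(?P<unit>\S+)\s+Status:(?P<status>\w+);")


def abnormal_units(showstatus_text):
    """Return (unit, status) pairs for every component whose status is not Normal."""
    found = []
    for line in showstatus_text.splitlines():
        m = STATUS_RE.match(line)
        if m and m.group("status") != "Normal":
            found.append((m.group("unit"), m.group("status")))
    return found


# Sample text copied from the showstatus output in this report.
SAMPLE = """\
    MBU_A Status:Normal;
*       MEMB#0 Status:Deconfigured;
*           MEM#0A Status:Deconfigured;
*           MEM#1A Status:Deconfigured;
*           MEM#2A Status:Degraded;
*           MEM#3A Status:Faulted;
"""

if __name__ == "__main__":
    for unit, status in abnormal_units(SAMPLE):
        print(unit, status)
```

Filtering the result for Degraded and Faulted gives the modules that were physically replaced in this case (MEM#2A and MEM#3A); the Deconfigured entries were not replaced.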
3 Fault Analysis and Handling (record of the analysis and handling)
This was a memory fault. It was handled as follows.
1. Power off the system, then replace on site the memory modules whose status is Degraded or Faulted. After startup, the output was:
XSCF> showstatus
*   MBU_A Status:Degraded;
*       MEMB#0 Status:Degraded;
2. Request a Service password from the original vendor, for example:
TOW JOT RAID KAHN STAG SICK NOW CAFE VERY NASH FONT BORN GLOM FUSE OTT EAST SOOT ABLE JAB MORT BULL BAR DUTY MERT CURD SHUN SLIT DEE AX ABE
Then enable service mode:
XSCF> enableservice
Service Password:
**** **** *** **** **** ***
**** **** **** **** **** ****
**** **** **** **** **** ***
**** **** *** **** **** ****
**** **** **** **** *** ***
Mode password is: JOEY DISH OILY
XSCF> service
Mode password: **** **** ****
Once in service mode, run:
service>clearfault MBU_A
service>clearfault MBU_A/MEMB#0
Check whether the memory is now normal:
service>showstatus
    MBU_A Status:Normal;
*       MEMB#1 Status:Deconfigured;
*           MEM#0A Status:Deconfigured;
*           MEM#1A Status:Faulted;
*           MEM#2A Status:Deconfigured;
*           MEM#3A Status:Deconfigured;
3. Power off and replace the memory module whose status is Faulted. After restart, the output was:
XSCF> showstatus
    MBU_A Status:Normal;
*       MEMB#1 Status:Degraded;
*           MEM#0A Status:Degraded;
Clear the faults:
service> clearfault MBU_A/MEMB#1
XSCF> showstatus
    MBU_A Status:Normal;
        MEMB#1 Status:Normal;
            MEM#0A Status:Degraded;
service>clearfault /MBU_A/MEMB#1/MEM#0A
clearfault: Fault cannot be cleared for this FRU.
FRU will be marked to clear fault on next circuit breaker off and on. Continue? [y|n]: Y
Fault will be cleared after circuit breaker off and on
After the third power cycle, the amber LED went off and the output was:
XSCF> showstatus
No failures found in System Initialization.
Connect to the operating system:
XSCF> console -d 0
Connect to DomainID 0?[y|n] :y
After boot, check the system hardware information:
root@TJANACOL1 # prtdiag -v
couldn't set locale correctly
System Configuration: Sun Microsystems sun4u Sun SPARC Enterprise M4000 Server
System clock frequency: 1012 MHz
Memory size: 16384 Megabytes    # the memory is recognized
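The final check above compares the "Memory size:" line of prtdiag against the amount of memory the machine should have. A minimal Python sketch of that comparison follows. It is an illustration only: it assumes the prtdiag output has been captured as text and that the line reads "Memory size: N Megabytes" as shown here. The function name memory_megabytes and the constant EXPECTED_MB are hypothetical, with the expected value taken from this report.

```python
import re


def memory_megabytes(prtdiag_text):
    """Return the value on the 'Memory size:' line in megabytes, or None if absent."""
    m = re.search(r"^Memory size:\s*(\d+)\s*Megabytes", prtdiag_text, re.MULTILINE)
    return int(m.group(1)) if m else None


# Sample text copied from the prtdiag output in this report.
SAMPLE = """\
System Configuration: Sun Microsystems sun4u Sun SPARC Enterprise M4000 Server
System clock frequency: 1012 MHz
Memory size: 16384 Megabytes
"""

# Assumption: the operator supplies the expected size; 16384 MB is the value
# reported for this machine after the repair.
EXPECTED_MB = 16384

if __name__ == "__main__":
    found = memory_megabytes(SAMPLE)
    if found == EXPECTED_MB:
        print("memory recognized:", found, "MB")
    else:
        print("memory mismatch: found", found, "expected", EXPECTED_MB)
```

A value lower than expected after the repair would suggest that some modules are still deconfigured and that showstatus should be checked again on the XSCF.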
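4 Current Status
The fault has been resolved and system services have resumed.
5 Summary and Recommendations (from this fault handling)
Two points deserve attention:
1. Hardware must be replaced with the system powered off.
2. After hardware is replaced, the faults must be cleared, and the system may need to be power-cycled 2-3 times.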
6 Outstanding Issues (issues still pending after the fault was handled)