Monday, September 1, 2008

NFS troubleshooting

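Situation: For various reasons one NFS server became unusable, and the load on the front-end web servers that had it mounted kept climbing. On inspection, every run of the df command hung and never completed.
The situation is similar to the one described on jianingy's blog.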
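To confirm that hung df processes are what is piling up, one option is to count the processes stuck in uninterruptible sleep. The sketch below assumes that a df blocked on a dead hard-mounted NFS export shows state D in ps output; the function name count_stuck is made up for this example.

```shell
# Count processes with a given command name that are in uninterruptible
# sleep (state D). Reads "PID STAT COMMAND" lines from standard input,
# so the ps header line is harmless (its STAT column is not "D...").
count_stuck() {
    name="${1:-df}"
    awk -v name="$name" '$2 ~ /^D/ && $3 == name { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | count_stuck df
```

A count that grows between runs suggests that new df invocations (from cron, for example) are still blocking on the dead mount.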
Fix:
killall -KILL rpciod
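Killing rpciod clears the backlog of stuck df processes, but it does not cure the problem: rpciod comes back the next time df runs, especially if df is called from a crond job.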
The real fix is to unmount the broken NFS export. If the failing export is
222.222.222.222:/opt/nfs
then you can unmount it directly:
umount 222.222.222.222:/opt/nfs
After that, df no longer hangs.
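Before unmounting, it helps to see which NFS exports the machine actually has mounted. A minimal sketch, assuming a Linux host where /proc/mounts lists device, mount point, and filesystem type in its first three fields; the function name list_nfs_mounts is made up for this example.

```shell
# Print "export mountpoint" for every NFS mount found in a mounts-format
# file (default: /proc/mounts). Matching the type exactly avoids
# picking up the unrelated "nfsd" pseudo-filesystem.
list_nfs_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '$3 == "nfs" || $3 == "nfs4" { print $1 " " $2 }' "$mounts_file"
}

# Usage:
#   list_nfs_mounts
#   umount 222.222.222.222:/opt/nfs
# If a plain umount itself blocks on the dead server, umount also has
# -f (force) and -l (lazy) options that may be worth trying.
```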
Reflection:

soft If an NFS file operation has a major timeout then report an I/O error to the calling program. The default is to continue retrying NFS file operations indefinitely.
hard If an NFS file operation has a major timeout then report "server not responding" on the console and continue retrying indefinitely. This is the default.
intr If an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the file operation and cause it to return EINTR to the calling program. The default is to not allow file operations to be interrupted.

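Mounting with intr should probably have avoided this kind of problem. Still, at least we were able to fix it. The lesson: as an SA or DBA, you have to think things through carefully.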
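One way to act on this is to audit which NFS entries are configured as hard mounts without intr, since those are the ones that can leave commands like df stuck. The sketch below reads /etc/fstab by default, so mounts made by hand will not appear; the function name risky_nfs_mounts is made up for this example. Note also that the nfs(5) manual for later kernels documents intr as accepted but ignored, so treat this check as applying to systems of this post's era.

```shell
# Print the mount point of every NFS entry whose option list contains
# neither "soft" nor "intr" (a hard mount that signals cannot interrupt).
# Input is an fstab-format file: device, mount point, type, options.
risky_nfs_mounts() {
    fstab_file="${1:-/etc/fstab}"
    awk '$3 == "nfs" || $3 == "nfs4" {
        n = split($4, opts, ",")
        ok = 0
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft" || opts[i] == "intr") ok = 1
        if (!ok) print $2
    }' "$fstab_file"
}

# Usage:
#   risky_nfs_mounts
# An entry can then be remounted with, for example (mount point is
# hypothetical):
#   mount -t nfs -o hard,intr 222.222.222.222:/opt/nfs /mnt/nfs
```

Choosing hard,intr rather than soft keeps the retry-forever behavior for data safety while still letting an administrator interrupt a stuck command.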
