Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.2.0
-
[root@n-mds1 ~]# cat /proc/fs/lustre/version
lustre: 2.2.0
kernel: patchless_client
build: 2.2.0-RC2--PRISTINE-2.6.32-220.4.2.el6_lustre.x86_64
[root@n-mds1 ~]# uname -r
2.6.32-220.4.2.el6_lustre.x86_64
[root@n-mds1 ~]# rpm -qa|grep lustre
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64
kernel-2.6.32-220.4.2.el6_lustre.x86_64
kernel-devel-2.6.32-220.4.2.el6_lustre.x86_64
Description
We recently experienced two MDS crashes on our Lustre installation.
I've attached the netconsole output of both crashes. That is all I have: there is nothing in the syslog, and I could not capture a screenshot of the console output because the crashed MDS had already been power-cycled by its failover partner.
The problem is that I cannot reproduce the crash, and I have not seen any new crashes since 14 November.
(The crash was probably caused by a user job. There are about 800 users on our cluster, and I have no way to figure out which job triggered it.)
But in any case, even if the crash was triggered by a 1.8.x client, it should get fixed, shouldn't it?
And is there any news about the llog_write_rec error? (Did the debugfs output help?)