Details
Bug
Resolution: Fixed
Critical
Lustre 2.4.1
None
RHEL 6.4/distro IB
2
11773
Description
We hit this assertion in production. Because libcfs_panic_on_lbug was set to 1, the server rebooted. On each subsequent mount, the same assertion and LBUG occurred. The filesystem does mount with panic_on_lbug set to 0. We have captured a crash dump and Lustre log messages with the following debug flags:
[root@atlas-mds3 ~]# cat /proc/sys/lnet/debug
trace ioctl neterror warning other error emerg ha config console
We ran e2fsck:
e2fsck -f -j /dev/mapper/atlas2-mdt1-journal /dev/mapper/atlas2-mdt1
It fixed only the quota inconsistencies it found.
We are currently back in production, having mounted despite the osp_sync_threads LBUGs. As expected, there are hung task messages about osp_sync_threads. We want to fix the root cause of the assertions.
Kernel messages during one of the failed mounts:
Nov 21 21:16:44 atlas-mds3 kernel: [ 911.319839] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
Nov 21 21:16:44 atlas-mds3 kernel: [ 911.986208] Lustre: mdt_num_threads module parameter is deprecated, use mds_num_threads instead or unset both for dynamic thread startup
Nov 21 21:16:46 atlas-mds3 kernel: [ 913.069371] Lustre: atlas2-MDT0000: used disk, loading
Nov 21 21:16:47 atlas-mds3 kernel: [ 914.261572] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5
Nov 21 21:16:47 atlas-mds3 kernel: [ 914.278318] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) LBUG
Nov 21 21:16:47 atlas-mds3 kernel: [ 914.286036] Pid: 18945, comm: osp-syn-256
Nov 21 21:16:47 atlas-mds3 kernel: [ 914.290841]
Nov 21 21:16:47 atlas-mds3 kernel: [ 914.290844] Call Trace:
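For triage, the trailing value in the assertion message can be decoded mechanically. The following is a minimal sketch, not part of Lustre: it assumes the number after the last colon is a negated Linux errno, as is conventional for kernel return codes, and the helper name `decode_assertion` is hypothetical. Under that assumption, -5 corresponds to EIO, i.e. an I/O error reached osp_sync_thread() while it was processing the llog.

```python
import errno
import re

# Pattern for the LustreError assertion line quoted above. The field names
# are our own labels; only the line layout is taken from the log.
ASSERT_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+?) \) failed: (?P<msg>.*): (?P<rc>-?\d+)$"
)


def decode_assertion(log_line):
    """Hypothetical helper: split an assertion line and name its return code."""
    m = ASSERT_RE.search(log_line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    # Assumption: a negative rc is a negated errno value.
    name = errno.errorcode.get(-rc, "unknown") if rc < 0 else "n/a"
    return {
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "expr": m.group("expr"),
        "rc": rc,
        "errno_name": name,
    }


line = (
    "Nov 21 21:16:47 atlas-mds3 kernel: [ 914.261572] LustreError: "
    "18945:0:(osp_sync.c:862:osp_sync_thread()) "
    "ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: "
    "0 changes, 0 in progress, 0 in flight: -5"
)
info = decode_assertion(line)
print(info["func"], info["rc"], info["errno_name"])
```

The same helper can be run over the full console log to check whether every failed mount reports the same return code.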
We also see this message:
Nov 21 23:01:01 atlas-mds3 kernel: [ 1512.633528] ERST: NVRAM ERST Log Address Range is not implemented yet