Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Affects Version/s: Lustre 2.5.2
- Environment: RHEL6
- Severity: 3
Description
After the recent upgrade to 2.5.2 on our servers, I've just tried to unmount the MDT (and MGS) to try to fail over to the second server (after applying the patches recommended in LU-5514). While waiting for the unmount to complete, we hit this LBUG:
kernel:LustreError: 11779:0:(osp_sync.c:878:osp_sync_thread()) ASSERTION( count < 10 ) failed: lustre03-OST0009-osc: 1 1 empty
kernel:LustreError: 11779:0:(osp_sync.c:878:osp_sync_thread()) LBUG
The machine then rebooted, so not much debugging information is available, but I managed to pull the following from the Red Hat crash logs (vmcore-dmesg.txt):
<0>LustreError: 11779:0:(osp_sync.c:878:osp_sync_thread()) ASSERTION( count < 10 ) failed: lustre03-OST0009-osc: 1 1 empty
<0>LustreError: 11779:0:(osp_sync.c:878:osp_sync_thread()) LBUG
<4>Pid: 11779, comm: osp-syn-9-0
<4>
<4>Call Trace:
<4> [<ffffffffa0515895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0515e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa1033132>] osp_sync_thread+0x6c2/0x7d0 [osp]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa1032a70>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4> [<ffffffff8109ab56>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109aac0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 11779, comm: osp-syn-9-0 Not tainted 2.6.32-431.17.1.el6_lustre.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152795f>] ? panic+0xa7/0x16f
<4> [<ffffffffa0515eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa1033132>] ? osp_sync_thread+0x6c2/0x7d0 [osp]
<4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa1032a70>] ? osp_sync_thread+0x0/0x7d0 [osp]
<4> [<ffffffff8109ab56>] ? kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109aac0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
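For context (for anyone reading this who hasn't opened osp_sync.c): my rough understanding is that the assertion sits in the sync thread's shutdown path, where it waits for its queue of pending llog records / in-flight RPCs to drain and LBUGs if that doesn't happen within a bounded number of passes. Below is only a minimal user-space sketch of that kind of bounded drain loop, not the actual Lustre code; pending_changes, in_progress and drain_pending() are made-up stand-ins for the counters printed in the "1 1 empty" message.

#include <assert.h>
#include <stdio.h>

/* Made-up counters standing in for the values in the assertion message:
 * queued changes and RPCs currently in flight. */
static int pending_changes = 3;
static int in_progress = 1;

/* Hypothetical stand-in for one pass of processing during shutdown. */
static int drain_pending(void)
{
        if (in_progress > 0)
                in_progress--;          /* normally in-flight RPCs complete... */
        else if (pending_changes > 0)
                pending_changes--;      /* ...and queued records get retired   */
        return pending_changes == 0 && in_progress == 0;
}

int main(void)
{
        int count = 0;

        /* Shutdown path: keep draining, but only for a fixed number of passes.
         * In the crash above the counters apparently stayed at "1 1", so the
         * equivalent of this loop ran out of passes and the LASSERT fired. */
        while (!drain_pending()) {
                count++;
                assert(count < 10);     /* analogue of ASSERTION( count < 10 ) */
        }

        printf("sync queue drained after %d passes\n", count);
        return 0;
}

Compiled and run, this drains after a few passes; the point is that if the counters never move (e.g. an OST that never replies while the MDT is being unmounted), the pass limit is what turns a hang into an LBUG and panic rather than a stuck unmount.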
This looks like it could be LU-5244; can someone confirm this and provide a patch for 2.5.2?
As we've only seen it during the MDT unmount so far, it's not that urgent at the moment, but if this is going to hit us during normal operation, users won't be happy...