Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Versions: Lustre 2.5.3, Lustre 2.8.0
- Labels: None
- Severity: 2
Description
LustreError: 11-0: hw_nb-OST0016-osc-MDT0000: Communicating with 10.151.26.55@o2ib, operation ost_connect failed with -114.
LustreError: 6488:0:(llog_cat.c:866:llog_cat_init_and_process()) hw_nb-OST0024-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -5
LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5
LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) LBUG
Pid: 6580, comm: osp-syn-36-0
Call Trace:
 [<ffffffffa05cf895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa05cfe97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa10d9243>] osp_sync_thread+0x753/0x7d0 [osp]
 [<ffffffff81559b9e>] ? thread_return+0x4e/0x770
 [<ffffffffa10d8af0>] ? osp_sync_thread+0x0/0x7d0 [osp]
Entering kdb (current=0xffff8803b5e04080, pid 6580) on processor 3 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[3]kdb>
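The return codes in these messages are negative Linux errno values: -114 is -EALREADY and -5 is -EIO, so the LBUG was triggered by an I/O error returned through llog_process(). A minimal userspace sketch for decoding such codes, assuming Linux errno numbering; the helpers `lustre_rc_name()` and `lustre_rc_text()` are hypothetical and not part of Lustre:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a negative Lustre-style return code to a
 * symbolic errno name. Lustre reports errors as -errno, so the sign is
 * flipped before comparing. Only the codes seen in this report are listed. */
const char *lustre_rc_name(int rc)
{
        assert(rc < 0); /* only error returns are expected here */

        switch (-rc) {
        case EALREADY:  /* 114 on Linux: returned for ost_connect */
                return "EALREADY";
        case EIO:       /* 5 on Linux: returned by llog_process() */
                return "EIO";
        default:
                return "unknown";
        }
}

/* Human-readable description from the C library for the same code. */
const char *lustre_rc_text(int rc)
{
        return strerror(-rc);
}
```

For example, `lustre_rc_name(-5)` returns "EIO".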
Issue Links
- is related to:
  - LU-9068 Hardware problem resulting in bad blocks (Resolved)
  - LU-8252 MDS kernel panic after aborting journal (Resolved)
  - LU-7011 Kernel part of llog subsystem can do self-repairing in some cases (Resolved)
  - LU-5056 osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 6 changes, 8 in progress, 0 in flight: -5 (Resolved)
It looks like the llog has another (or the same) header written at offset 8192. That is wrong, and I'd like to investigate to understand how it was possible.
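One way to confirm a second header at offset 8192 is to compare the record type at that offset with the type of the real header at offset 0. A minimal diagnostic sketch, assuming an llog whose header occupies a fixed 8192-byte chunk, little-endian records that start with 32-bit len/index/type/id fields, and a header type value of 0x10645539 (LLOG_HDR_MAGIC); these layout values are assumptions to be checked against the Lustre headers of the version in use, and the `HYP_`-prefixed names and helper functions are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define HYP_LLOG_HDR_SIZE   8192u        /* assumed size of the header chunk */
#define HYP_LLOG_HDR_MAGIC  0x10645539u  /* assumed LLOG_HDR_MAGIC */

/* Read a little-endian 32-bit field (llogs assumed stored little-endian). */
static uint32_t le32_at(const unsigned char *p)
{
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Assumed record prefix: len, index, type, id (4 x 32-bit), so the
 * type field sits at byte offset 8 of each record. */
static uint32_t rec_type_at(const unsigned char *buf, size_t off)
{
        return le32_at(buf + off + 8);
}

/* Return 1 if the buffer starts with a header AND the record right after
 * the header chunk also looks like a header, i.e. a header was written
 * at offset 8192 where the first record should be. */
int llog_has_second_header(const unsigned char *buf, size_t len)
{
        assert(buf != NULL);

        if (len < HYP_LLOG_HDR_SIZE + 16)
                return 0; /* no record after the header to inspect */
        return rec_type_at(buf, 0) == HYP_LLOG_HDR_MAGIC &&
               rec_type_at(buf, HYP_LLOG_HDR_SIZE) == HYP_LLOG_HDR_MAGIC;
}
```

Running it over a copy of the affected llog file (for example one dumped from the MDT with debugfs) would show whether the duplicate header is present.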
Andreas, I agree: the OSP code is quite aggressive toward possible I/O errors.
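The assertion in the log turns any llog_process() error, including -EIO from bad blocks, into an LBUG. A less aggressive policy classifies the return code and lets the caller decide what to do. A minimal sketch contrasting the two, assuming LLOG_PROC_BREAK is a positive value distinct from 0 (the value 1 here is an assumption); `strict_check()`, `classify_llog_rc()` and the enum are hypothetical names, not necessarily how the fix for this ticket was written:

```c
#include <assert.h>
#include <errno.h>

#define HYP_LLOG_PROC_BREAK 1 /* assumed value of LLOG_PROC_BREAK */

enum sync_action {
        SYNC_DONE,      /* processing finished normally */
        SYNC_STOPPED,   /* callback asked to stop early */
        SYNC_IO_ERROR,  /* -EIO: on-disk llog damage, candidate for repair */
        SYNC_ERROR      /* any other error: report it, do not crash */
};

/* Strict policy, mirroring the assertion in osp_sync_thread():
 * any other return code aborts the process here; the kernel
 * version triggers an LBUG instead. */
void strict_check(int rc)
{
        assert(rc == 0 || rc == HYP_LLOG_PROC_BREAK);
}

/* Tolerant policy (hypothetical): classify the return code so the
 * caller can log the error and stop the thread instead of panicking. */
enum sync_action classify_llog_rc(int rc)
{
        if (rc == 0)
                return SYNC_DONE;
        if (rc == HYP_LLOG_PROC_BREAK)
                return SYNC_STOPPED;
        if (rc == -EIO)
                return SYNC_IO_ERROR;
        return SYNC_ERROR;
}
```

Returning a classification instead of asserting keeps the MDS running and leaves the choice between stopping, retrying, or repairing the llog to the caller.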