Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.4.2
None
3
13202
Description
Hi,
After 3 days in production with Lustre 2.4.2, CEA is suffering from the following "assertion failed" issue about 5 times a day:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
Pid: 4089, comm: %%AQC.P.I.O
Call Trace:
 [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
 [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
 [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
 [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
 [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
 [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
 [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
 [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
 [<ffffffff81189ed1>] sys_write+0x51/0x90
 [<ffffffff81091039>] ? sys_times+0x29/0x70
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.
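For anyone who wants to check that similarity mechanically, here is a minimal sketch that reduces a call trace to its ordered list of functions so two traces can be compared regardless of load addresses. The names `FRAME_RE`, `parse_frames` and `same_call_path` are hypothetical helpers, not part of any Lustre tooling, and labelling frames without a module as "kernel" is an assumption made for readability.

```python
import re

# Matches frames such as "[<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]".
# The "? " marker (unreliable frame) and the trailing "[module]" are optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return a list of (function, module) pairs in call-trace order."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        if m.group("unreliable"):
            continue  # skip frames the kernel flagged with "?"
        # Assumption: a frame with no "[module]" suffix is labelled "kernel".
        frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


def same_call_path(trace_a, trace_b):
    """True when both traces go through the same functions in the same order."""
    return parse_frames(trace_a) == parse_frames(trace_b)


if __name__ == "__main__":
    # A few frames copied from the trace in this ticket.
    sample = (
        "[<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov] "
        "[<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass] "
        "[<ffffffff81091039>] ? sys_times+0x29/0x70 "
        "[<ffffffff811895d8>] vfs_write+0xb8/0x1a0"
    )
    for func, module in parse_frames(sample):
        print(func, module)
```

Comparing only function names and modules ignores offsets, so traces from slightly different builds may still compare equal. That is deliberate here, but it means a match is a hint rather than proof that two tickets share a root cause.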
Please ask if you need additional information that could help with the diagnosis and resolution of the problem.
Sebastien.
Seb, sorry for only replying to your own comment of "21/Mar/14 3:53 PM". Yes, it would be useful, as a first piece of debugging information, to get the debug-log content extracted from the crash dump. By the way, I hope you are running with a default debug mask that is sufficient to gather accurate traces for this problem?