Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.4.2
None
3
13202
Description
Hi,
After three days in production with Lustre 2.4.2, CEA is hitting the following assertion failure (LBUG) about five times a day:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
Pid: 4089, comm: %%AQC.P.I.O
Call Trace:
 [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
 [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
 [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
 [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
 [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
 [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
 [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
 [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
 [<ffffffff81189ed1>] sys_write+0x51/0x90
 [<ffffffff81091039>] ? sys_times+0x29/0x70
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.
Please let us know if you need any additional information to help diagnose and resolve the problem.
Sebastien.