Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.4.2
None
3
13202
Description
Hi,
After 3 days in production with Lustre 2.4.2, CEA is suffering from the following "assertion failed" issue about 5 times a day:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
Pid: 4089, comm: %%AQC.P.I.O
Call Trace:
 [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
 [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
 [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
 [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
 [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
 [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
 [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
 [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
 [<ffffffff81189ed1>] sys_write+0x51/0x90
 [<ffffffff81091039>] ? sys_times+0x29/0x70
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.
Please ask if you need any additional information that could help with diagnosing and resolving the problem.
Sebastien.
Every write needs an exclusive lock. A write from another node causes the current lock holder to relinquish the lock, so multiple writes to the same file from different nodes cause lock enqueues and lock blocking ASTs to become intertwined. That is what I meant by normal.
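To make that interleaving concrete, here is a toy model of the pattern described above. It is only a sketch, not Lustre code: `ToyLockServer`, `simulate_writes` and the event names are made up for illustration, and it assumes a single exclusive lock per file with no extents, no concurrency and no timeouts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyLockServer:
    """Hypothetical stand-in for a lock server: one exclusive lock per file."""
    holder: Optional[str] = None
    events: List[str] = field(default_factory=list)

    def enqueue(self, node: str) -> None:
        """Request the exclusive lock on behalf of `node`."""
        if self.holder == node:
            # The node already holds the lock, so no new enqueue is needed.
            self.events.append(f"cached:{node}")
            return
        self.events.append(f"enqueue:{node}")
        if self.holder is not None:
            # A different node holds the lock: ask it to give the lock up,
            # then record that it did so.
            self.events.append(f"blocking_ast:{self.holder}")
            self.events.append(f"cancel:{self.holder}")
        self.holder = node
        self.events.append(f"grant:{node}")


def simulate_writes(writers: List[str]) -> List[str]:
    """Replay writes to one file, in order, and return the lock event log."""
    server = ToyLockServer()
    for node in writers:
        server.enqueue(node)
        server.events.append(f"write:{node}")
    return server.events


if __name__ == "__main__":
    # Two nodes taking turns writing to the same file.
    for event in simulate_writes(["nodeA", "nodeB", "nodeA"]):
        print(event)
```

In this model, every change of writer produces an enqueue followed by a blocking AST to the previous holder, while repeated writes from the same node reuse the lock. With many nodes writing to one file, the real event stream is the concurrent version of this log.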