Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.4.2
None
3
13202
Description
Hi,
After 3 days in production with Lustre 2.4.2, CEA is suffering from the following "assertion failed" issue about 5 times a day:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
Pid: 4089, comm: %%AQC.P.I.O
Call Trace:
 [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
 [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
 [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
 [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
 [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
 [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
 [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
 [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
 [<ffffffff81189ed1>] sys_write+0x51/0x90
 [<ffffffff81091039>] ? sys_times+0x29/0x70
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
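When the same LBUG hits several times a day, it can help to compare the traces mechanically instead of by eye. The following is a minimal sketch, not part of any Lustre tooling, that splits a call trace like the one above into frames and counts frames per kernel module. The function name `parse_call_trace` and the `"kernel"` label for frames without a module are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches frames such as "[<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]".
# A "? " before the symbol marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return one dict per stack frame found in `text` (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            # Frames without a "[module]" suffix belong to the core kernel.
            "module": m.group("module") or "kernel",
            "reliable": m.group("unreliable") is None,
        })
    return frames


# A few frames copied from the trace in this ticket.
SAMPLE = (
    "[<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs] "
    "[<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov] "
    "[<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass] "
    "[<ffffffff81189ed1>] sys_write+0x51/0x90 "
    "[<ffffffff81091039>] ? sys_times+0x29/0x70"
)

frames = parse_call_trace(SAMPLE)
modules = Counter(f["module"] for f in frames)
for f in frames:
    print(f["module"], f["symbol"], hex(f["offset"]))
```

Comparing the symbol and offset sequences across crashes is a quick way to check whether every occurrence goes through the same `lov_lock_enqueue` path before reaching `lovsub_lock_state`.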
This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.
Please ask if you need any additional information that could help diagnose and resolve the problem.
Sebastien.
Aurelien, regarding the evictions that this script is likely to reproduce on site: would it also be possible to get a Lustre debug log, at least from the evicted client, with the full debug mask and traces enabled?
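For reference, capturing such a log on a client usually comes down to a handful of `lctl` calls: enable every debug flag, enlarge the in-kernel debug buffer, clear it, reproduce the problem, then dump the buffer to a file. The sketch below only builds and prints that command sequence and does not execute anything. The dump path, the 512 MB buffer size and the helper name `debug_capture_commands` are assumptions to adapt to the site.

```python
import shlex


def debug_capture_commands(dump_path="/tmp/lustre-debug.log", buffer_mb=512):
    """Build the lctl commands to run (as root) on the client.

    `dump_path` and `buffer_mb` are illustrative defaults, not recommendations.
    """
    before = [
        ["lctl", "set_param", "debug=-1"],               # enable all debug flags
        ["lctl", "set_param", f"debug_mb={buffer_mb}"],  # enlarge the debug buffer
        ["lctl", "clear"],                               # start from an empty buffer
    ]
    # Run once the eviction has been reproduced.
    after = [["lctl", "dk", dump_path]]                  # dump the kernel debug log
    return before, after


before, after = debug_capture_commands()
print("# before running the reproducer:")
for cmd in before:
    print(shlex.join(cmd))
print("# right after the eviction:")
for cmd in after:
    print(shlex.join(cmd))
```

Because the debug buffer is circular, the dump should be taken as soon as possible after the eviction, before the relevant entries are overwritten.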