Details
Bug
Resolution: Fixed
Major
None
Lustre 2.5.1
None
2
14002
Description
Hello,
We are seeing the following error messages on Lustre 2.5.1 clients, and they leave the system unresponsive. Multiple clients were affected by this issue.
System Details: Lustre 2.5.1 / RHEL 6.5
Here are the node names, timestamps, and the corresponding messages:
May 4 11:03:28 uc1n055 kernel: LustreError: 1979:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
May 4 18:15:43 uc1n468 kernel: LustreError: 42888:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
May 4 18:54:19 uc1n059 kernel: LustreError: 111650:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
May 9 09:21:08 uc1n129 kernel: LustreError: 93767:0:(statahead.c:1704:do_statahead_enter()) LBUG
May 10 09:28:14 uc1n996 kernel: LustreError: 7387:0:(osc_lock.c:1224:osc_lock_wait()) LBUG
May 15 07:50:57 uc1n198 kernel: LustreError: 25007:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
I think that the following assertion is already fixed by LU-3498:
That assertion is inside an if block which is only executed when do_statahead_enter() thinks that the thread creation failed.
Here is the relevant portion of do_statahead_enter() in version 2.5.1 (which has the LU-3498 bug):
So with the fix for LU-3498, this code will not be executed unless the thread creation actually fails. If the thread creation fails, your patched code, which does an extra ll_sai_put(sai), will not be executed anyway.
Unless I'm missing something, I don't think this patch belongs to this ticket. It is probably more appropriate to link that patch against LU-5274.
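To make the reasoning above concrete, here is a small user-space model of the control flow being described. It is only a sketch and not the Lustre source: `sai_model`, `dir_model`, `sai_put_model()`, and `enter_model()` are hypothetical names, and the reference counting (one reference for the caller, one more for a running statahead thread) is an assumption made for illustration. The `check_misfires` flag stands in for the LU-3498 bug as this comment describes it: the function believes thread creation failed when it did not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the statahead info object (not the Lustre struct). */
struct sai_model {
    int refcount;
};

/* Hypothetical stand-in for the per-directory state that points at the sai. */
struct dir_model {
    struct sai_model *d_sai;
};

/* Drop one reference; detach the sai from the directory on the last put. */
static void sai_put_model(struct dir_model *dir, struct sai_model *sai)
{
    if (--sai->refcount == 0)
        dir->d_sai = NULL;
}

/*
 * Model of the failure branch discussed in the ticket.
 * Returns true if the "d_sai == NULL" assertion would hold (or is never
 * reached), false if it would fire.
 */
static bool enter_model(bool thread_started, bool check_misfires)
{
    struct sai_model sai = { .refcount = 1 };   /* caller's reference */
    struct dir_model dir = { .d_sai = &sai };

    /* Assumption: a thread that really started holds its own reference. */
    if (thread_started)
        sai.refcount++;

    /* With the bug, a successful start can be mistaken for a failure. */
    bool believes_failed = !thread_started || check_misfires;

    if (believes_failed) {
        sai_put_model(&dir, &sai);
        return dir.d_sai == NULL;               /* models the assertion */
    }
    return true;                                /* failure branch not taken */
}
```

Under these assumptions, `enter_model(true, true)` is the only combination where the modeled assertion fires: the thread is running and still holds a reference, yet the failure branch runs. With a correct check (`check_misfires == false`) the branch is reached only when the thread never started, which matches the argument that the LU-3498 fix already covers the `statahead.c:1704` assertion.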