Details
Type: Bug
Resolution: Fixed
Priority: Minor
Lustre 2.14.0, Lustre 2.16.0, Lustre 2.15.0
None
3
9223372036854775807
Description
A system running LFSCK was crashing in a loop, apparently trying to destroy a bad object FID:
LustreError: 16300:0:(ldlm_resource.c:1488:ldlm_resource_get()) ASSERTION(name->name[0] != 0) failed:
kernel:Kernel panic - not syncing: LBUG
Call Trace:
 libcfs_call_trace+0x90/0xf0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 ldlm_resource_get+0x7e9/0x950 [ptlrpc]
 ldlm_lock_create+0x55/0xa60 [ptlrpc]
 ldlm_cli_enqueue_local+0xcc/0x850 [ptlrpc]
 lfsck_layout_slave_conditional_destroy [lfsck]
 lfsck_layout_slave_in_notify+0xa19/0xed0 [lfsck]
 lfsck_in_notify+0x23c/0x320 [lfsck]
 tgt_handle_lfsck_notify+0x5c/0x140 [ptlrpc]
 tgt_request_handle+0x8bf/0x18c0 [ptlrpc]
 ptlrpc_server_handle_request+0x253/0xc40 [ptlrpc]
 ptlrpc_main+0xc4a/0x1cb0 [ptlrpc]
 kthread+0xd1/0xe0
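It probably makes sense for lfsck_layout_slave_conditional_destroy(), or one of its higher-level callers, to check that the FID is valid before calling all the way down to ldlm_cli_enqueue_local(). That way a bad FID is rejected with an error instead of tripping the assertion in ldlm_resource_get().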
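The idea can be sketched in standalone C. This is only an illustration under stated assumptions, not the actual patch: struct sketch_fid, sketch_fid_is_valid() and sketch_conditional_destroy() are hypothetical stand-ins and not Lustre definitions, and the rule "sequence and object ID must both be non-zero" is an assumed validity test. Real code would more likely use Lustre's own FID helpers (for example fid_is_sane()).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a Lustre FID (not struct lu_fid). */
struct sketch_fid {
	uint64_t f_seq;	/* sequence number */
	uint32_t f_oid;	/* object ID within the sequence */
	uint32_t f_ver;	/* version */
};

/*
 * Assumed validity rule for this sketch: a FID with a zero sequence or a
 * zero object ID is treated as bad. A zero leading component is what would
 * produce a lock resource name whose first element is 0.
 */
static int sketch_fid_is_valid(const struct sketch_fid *fid)
{
	return fid != NULL && fid->f_seq != 0 && fid->f_oid != 0;
}

/*
 * Hypothetical stand-in for the destroy path: validate the FID up front and
 * return an error instead of descending into the lock enqueue code.
 */
static int sketch_conditional_destroy(const struct sketch_fid *fid)
{
	if (!sketch_fid_is_valid(fid))
		return -EINVAL;	/* reject early; no lock is requested */

	/* The lock enqueue and object destroy would happen here. */
	return 0;
}
```

Returning an error at this level lets the notify handler fail the single request, rather than having the server panic and hit the same bad FID again after restart.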