[LU-16371] client: ASSERTION( !((((lock))->l_flags & (1ULL << 25)) != 0) ) failed:
Created: 07/Dec/22  Updated: 25/Apr/23  Resolved: 04/Apr/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Major
Reporter: Zhenyu Xu
Assignee: Zhenyu Xu
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

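The client crashes with an assertion failure (LBUG) in ldlm_cli_cancel(). The ll_imp_inval thread hits it during namespace cleanup, right after the client is evicted by testfs-MDT0003: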

[1734537.865642] Lustre: testfs-MDT0001-mdc-ffff8d5f0512c800:Connection to testfs-MDT0001 (at xxxx.xxx.xxx@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[1734537.885054] Lustre: Skipped 8 previous similar messages
[1734537.891452] LustreError: 3416350:0:import.c:711:ptlrpc_connect_import_locked()) already connecting
[1734593.289395] Lustre: Evicted from MGS (at MGCxxx.xxx.xxx@o2ib_0) after server handle changed from 0xc3fe297edd52f4ff to 0x6aa0ef669c42e71e
[1734655.934378] LustreError: 11-0: testfs-OST0004-osc-ffff8d5f0512c800: operation ldlm_enqueue to node xxx.xxx.xxx@o2ib failed: rc = -19
[1734655.950278] LustreError: Skipped 2 previous similar messages
[1734769.302478] LustreError: 167-0: testfs-MDT0003-mdc-ffff8d5f0512c800: This client was evicted by testfs-MDT0003; in progress operations using this service will fail.
[1734769.320698] LustreError:3417045:0:ldlm_request.c:1448:ldlm_cli_cancel()) ASSERTION( !((((lock))->l_flags & (1ULL << 25)) != 0) ) failed: 
[1734769.338032] LustreError: 3417045:0:ldlm_request.c:1448:ldlm_cli_cancel()) LBUG
[1734769.346394] Pid: 3417045, comm: ll_imp_inval 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Tue Sep 13 06:07:14 EDT 2022
[1734769.357356] Call Trace TBD:
[1734769.361207] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
[1734769.367392] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[1734769.373214] [<0>] ldlm_cli_cancel+0x37c/0x520 [ptlrpc]
[1734769.379428] [<0>] cleanup_resource+0x132/0x340 [ptlrpc]
[1734769.385711] [<0>] ldlm_resource_clean+0x30/0x50 [ptlrpc]
[1734769.392042] [<0>] cfs_hash_for_each_relax+0x253/0x450 [libcfs]
[1734769.398859] [<0>] cfs_hash_for_each_nolock+0x11d/0x1a0 [libcfs]
[1734769.405754] [<0>] ldlm_namespace_cleanup+0x2b/0xb0 [ptlrpc]
[1734769.412313] [<0>] mdc_import_event+0x264/0xae0 [mdc]
[1734769.418228] [<0>] ptlrpc_invalidate_import+0x425/0xa30 [ptlrpc]
[1734769.425125] [<0>] ptlrpc_invalidate_import_thread+0x3e/0x1c0 [ptlrpc]
[1734769.432537] [<0>] kthread+0x10a/0x120
[1734769.437180] [<0>] ret_from_fork+0x1f/0x40
[1734769.442114] Kernel panic - not syncing: LBUG
[1734769.447270] CPU: 43 PID: 3417045 Comm: ll_imp_inval Kdump: loaded Tainted: GIOE    --------- -  - 4.18.0-372.26.1.el8_6.x86_64 #1
[1734769.469914] Call Trace:
[1734769.473234]  dump_stack+0x41/0x60
[1734769.477414]  panic+0xe7/0x2ac
[1734769.481230]  ? ret_from_fork+0x1f/0x40
[1734769.485809]  lbug_with_loc.cold.4+0x18/0x18 [libcfs]
[1734769.491596]  ldlm_cli_cancel+0x37c/0x520 [ptlrpc]
[1734769.497139]  ? ldlm_resource_free+0x1b7/0x2f0 [ptlrpc]
[1734769.503099]  cleanup_resource+0x132/0x340 [ptlrpc]
[1734769.508711]  ldlm_resource_clean+0x30/0x50 [ptlrpc]
[1734769.514394]  cfs_hash_for_each_relax+0x253/0x450 [libcfs]
[1734769.520562]  ? cleanup_resource+0x340/0x340 [ptlrpc]
[1734769.526314]  ? cleanup_resource+0x340/0x340 [ptlrpc]
[1734769.532044]  cfs_hash_for_each_nolock+0x11d/0x1a0 [libcfs]
[1734769.538267]  ldlm_namespace_cleanup+0x2b/0xb0 [ptlrpc]
[1734769.544151]  mdc_import_event+0x264/0xae0 [mdc]
[1734769.549402]  ptlrpc_invalidate_import+0x425/0xa30 [ptlrpc]
[1734769.555622]  ? ptlrpc_import_recovery_state_machine+0x9d0/0x9d0 [ptlrpc]
[1734769.563059]  ? libcfs_debug_msg+0x55/0x70 [libcfs]
[1734769.568555]  ? ptlrpc_import_recovery_state_machine+0x9d0/0x9d0 [ptlrpc]
[1734769.575976]  ptlrpc_invalidate_import_thread+0x3e/0x1c0 [ptlrpc]
[1734769.582691]  kthread+0x10a/0x120
[1734769.586586]  ? set_kthread_struct+0x40/0x40
[1734769.591415]  ret_from_fork+0x1f/0x40
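
The failing expression in the log is a plain bit test: it requires that bit 25 of the lock's l_flags word is clear when ldlm_cli_cancel() runs. The patch subject in the comments suggests that bit 25 is the lock "converting" flag, but that mapping is an inference from this ticket and is not confirmed by the log itself. The following is a minimal sketch of the predicate and of clearing the bit before the check. The names demo_lock, DEMO_FL_BIT25, and the demo_* helpers are hypothetical and are not Lustre APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the flag tested by the assertion: bit 25. */
#define DEMO_FL_BIT25 (1ULL << 25)

/* Hypothetical stand-in for a lock; only the flags word matters here. */
struct demo_lock {
	uint64_t l_flags;
};

/* Same bit test as in the log: true when bit 25 is set. */
static bool demo_bit25_is_set(const struct demo_lock *lock)
{
	return (lock->l_flags & DEMO_FL_BIT25) != 0;
}

/* Clear bit 25 and leave every other flag bit untouched. */
static void demo_clear_bit25(struct demo_lock *lock)
{
	lock->l_flags &= ~DEMO_FL_BIT25;
}

/*
 * Mimics the failing check (assumption: simplified): aborts if bit 25
 * is still set when the cancel path is entered.
 */
static void demo_cancel(const struct demo_lock *lock)
{
	assert(!demo_bit25_is_set(lock));
}
```

With bit 25 set, demo_cancel() aborts, which mirrors the LBUG above. Calling demo_clear_bit25() first lets the check pass while preserving the remaining flags.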


 Comments   
Comment by Zhenyu Xu [ 07/Dec/22 ]

"Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49339
Subject: LU-16371 ldlm: clear lock converting flag on resource cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 25099518ae90db16d0e7b311eab85e9acb22c30f
 

Comment by Gerrit Updater [ 04/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49339/
Subject: LU-16371 ldlm: clear lock converting flag on resource cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4990f4ef5eb81d8017c9992c1f6924527dc8ce60

Comment by Peter Jones [ 04/Apr/23 ]

Landed for 2.16
