[LU-15893] replay-dual test_30: ldlm_cli_cancel on converting lock
Created: 26/May/22
Updated: 26/May/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11276 racer: mdc_dev.c:1346:mdc_req_attr_se... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for John Hammond <jhammond@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/95e574e7-8b99-41dc-9287-a2ac4e2d5965

test_30 failed with the following error:

onyx-44vm3 crashed during replay-dual test_30

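This is a failure of LASSERT(!ldlm_is_converting(lock)) in ldlm_cli_cancel() (ldlm_request.c:1546), hit by the ll_imp_inval thread during namespace cleanup after the client was evicted by lustre-MDT0000.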
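
For triage, it may help to see how the macro-expanded assertion in the console log maps back to the flag named above. The following is a minimal sketch, not Lustre code: it assumes bit 25 of l_flags is the "converting" flag, which is inferred only from this ticket pairing ldlm_is_converting(lock) with the (1ULL << 25) test. The helper names are hypothetical.

```python
# Assumption: bit 25 is the "converting" flag, inferred from this ticket
# pairing LASSERT(!ldlm_is_converting(lock)) with the (1ULL << 25) test.
CONVERTING_BIT = 25


def flag_mask(bit: int) -> int:
    """Return the mask for a single flag bit, like (1ULL << bit) in C."""
    return 1 << bit


def is_converting(l_flags: int) -> bool:
    """Mirror the expanded check: (l_flags & (1ULL << 25)) != 0."""
    return (l_flags & flag_mask(CONVERTING_BIT)) != 0


def assertion_holds(l_flags: int) -> bool:
    """The assertion passes only when the converting bit is clear."""
    return not is_converting(l_flags)


if __name__ == "__main__":
    print(hex(flag_mask(CONVERTING_BIT)))
    print(assertion_holds(0), assertion_holds(flag_mask(CONVERTING_BIT)))
```

With this reading, the LBUG means the lock passed to ldlm_cli_cancel() still had the converting bit set when the import invalidation thread tried to cancel it.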

[ 4806.105733] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 18:12:34 \(1653502354\)
[ 4806.511990] Lustre: DEBUG MARKER: == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 18:12:34 (1653502354)
[ 4806.576387] LustreError: 37471:0:(fail.c:138:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e sleeping for 4000ms
[ 4810.637737] LustreError: 37471:0:(fail.c:149:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e awake
[ 4810.640700] LustreError: 11-0: lustre-MDT0000-mdc-ffff98bd03d49800: operation ost_write to node 10.240.23.231@tcp failed: rc = -107
[ 4826.632462] LustreError: 166-1: MGC10.240.23.231@tcp: Connection to MGS (at 10.240.23.231@tcp) was lost; in progress operations using this service will fail
[ 4826.635829] LustreError: Skipped 2 previous similar messages
[ 4826.638669] Lustre: Evicted from MGS (at 10.240.23.231@tcp) after server handle changed from 0xbde5217ab8283ded to 0xbde5217ab829a8b2
[ 4826.640966] Lustre: Skipped 2 previous similar messages
[ 4826.660227] LustreError: 130333:0:(fail.c:138:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e sleeping for 4000ms
[ 4826.662942] LustreError: 130333:0:(fail.c:138:__cfs_fail_timeout_set()) Skipped 1 previous similar message
[ 4830.716829] LustreError: 130334:0:(fail.c:149:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e awake
[ 4830.737968] LustreError: 8013:0:(import.c:701:ptlrpc_connect_import_locked()) already connecting
[ 4830.740450] LustreError: 167-0: lustre-MDT0000-mdc-ffff98bd03d49800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[ 4830.743247] LustreError: Skipped 1 previous similar message
[ 4830.744635] Lustre: 8017:0:(llite_lib.c:3512:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.23.231@tcp:/lustre/fid: [0x200012511:0x162:0x0]// may get corrupted (rc -5)
[ 4830.748634] LustreError: 130392:0:(ldlm_request.c:1546:ldlm_cli_cancel()) ASSERTION( !((((lock))->l_flags & (1ULL << 25)) != 0) ) failed: 
[ 4830.750999] LustreError: 130392:0:(ldlm_request.c:1546:ldlm_cli_cancel()) LBUG
[ 4830.752406] Pid: 130392, comm: ll_imp_inval 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Tue Nov 16 14:42:35 UTC 2021
[ 4830.754255] Call Trace TBD:
[ 4830.755020] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
[ 4830.756060] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
[ 4830.757298] [<0>] ldlm_cli_cancel+0x245/0x510 [ptlrpc]
[ 4830.758367] [<0>] cleanup_resource+0x132/0x310 [ptlrpc]
[ 4830.759447] [<0>] ldlm_resource_clean+0x30/0x50 [ptlrpc]
[ 4830.760514] [<0>] cfs_hash_for_each_relax+0x253/0x450 [libcfs]
[ 4830.761669] [<0>] cfs_hash_for_each_nolock+0x11b/0x1f0 [libcfs]
[ 4830.762848] [<0>] ldlm_namespace_cleanup+0x2b/0xb0 [ptlrpc]
[ 4830.763999] [<0>] mdc_import_event+0x32d/0xcf0 [mdc]
[ 4830.765009] [<0>] ptlrpc_invalidate_import+0x28d/0x9f0 [ptlrpc]
[ 4830.766218] [<0>] ptlrpc_invalidate_import_thread+0x6d/0x260 [ptlrpc]
[ 4830.767522] [<0>] kthread+0x116/0x130
[ 4830.768304] [<0>] ret_from_fork+0x35/0x40
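
When comparing this crash against other reports such as LU-11276, the call path can be pulled out of the console log mechanically. The sketch below is a hypothetical helper, not part of Maloo or Lustre; it assumes frame lines follow the "[<0>] function+0xoff/0xsize [module]" shape seen in the trace above.

```python
import re

# Assumed frame shape, taken from the trace above:
#   [ 4830.757298] [<0>] ldlm_cli_cancel+0x245/0x510 [ptlrpc]
# The trailing [module] is absent for core-kernel frames such as kthread.
FRAME_RE = re.compile(
    r"\[<0>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) pairs in log order; module may be None."""
    return [(m.group("func"), m.group("module"))
            for m in FRAME_RE.finditer(log_text)]


SAMPLE = """\
[ 4830.756060] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
[ 4830.757298] [<0>] ldlm_cli_cancel+0x245/0x510 [ptlrpc]
[ 4830.767522] [<0>] kthread+0x116/0x130
"""

frames = parse_frames(SAMPLE)

if __name__ == "__main__":
    for func, module in frames:
        print(func, module or "(kernel)")
```

Applied to the full trace, this yields the path from ptlrpc_invalidate_import_thread through mdc_import_event and ldlm_namespace_cleanup down to ldlm_cli_cancel.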

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_30 - onyx-44vm3 crashed during replay-dual test_30

