[LU-12805] replay-single/36 is broken Created: 25/Sep/19  Updated: 29/Mar/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: always_except

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

replay-single/36 seems to be very old, written in the days of CFS when all tests were run locally:
if $LCTL dk | grep "stale lock .*cookie"; then
	error "cancel after replay failed"
fi
i.e. it checks for server-side messages on the client.
I noticed it failing constantly in local runs, so I tried changing the test to grep the server-side debug log instead: https://review.whamcloud.com/#/c/36127/. With that change, all testing sessions failed in autotest.
AFAIU, ELC (early lock cancel) is the root cause: it packs cancels into a request, then the client doesn't replay the cancelled locks, but the originally packed cancels are sent anyway.
I think we should at least disable test 36.
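
A minimal sketch of the check quoted above, run against captured log text rather than a live `lctl dk`. The helper name has_stale_lock_cancel and the client_log placeholder are hypothetical and not part of the Lustre test framework; the server line is copied from the autotest log quoted in the comments on this ticket.

```shell
# Hypothetical helper: succeeds if the given log text contains
# the message the test greps for.
has_stale_lock_cancel() {
	printf '%s\n' "$1" | grep -q "stale lock .*cookie"
}

# Line in the format of server-side `lctl dk` output (from this ticket).
server_log='00010000:00010000:1.0:1569433691.407285:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251446)'

# Placeholder for a client-side log; the message is emitted by the
# server-side cancel handler, so it is assumed to be absent here.
client_log='placeholder client-side debug line'

if has_stale_lock_cancel "$server_log"; then
	echo "server log: stale lock cancel seen"
fi
if ! has_stale_lock_cancel "$client_log"; then
	echo "client log: nothing to match"
fi
```

On a multi-node setup, the grep in the test can only see the message if it reads the log of the node that runs the cancel handler, which is the point made above.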



 Comments   
Comment by Gerrit Updater [ 25/Sep/19 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36291
Subject: LU-12805 tests: disable replay-single/36
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4017b4b99bd83303db1a1b8eb5305139c20c1b34

Comment by Andreas Dilger [ 26/Sep/19 ]

I thought I had a patch to fix this? https://review.whamcloud.com/35311

Comment by Alex Zhuravlev [ 26/Sep/19 ]

Hm, the test verifies that there are no cancels after recovery, right? But they do arrive because of ELC.

Comment by Alex Zhuravlev [ 26/Sep/19 ]

for example: https://testing.whamcloud.com/test_sets/31eccc00-dfbd-11e9-a0ba-52540065bddc
CMD: onyx-32vm12 /usr/sbin/lctl dk
00010000:00010000:1.0:1569433691.407285:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251446)
00010000:00010000:1.0:1569433691.407286:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251264)
00010000:00010000:1.0:1569433691.407287:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251418)
replay-single test_36: @@@@@@ FAIL: cancel after replay failed
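
The cookies behind a failure like this can be pulled out of the debug log with a small filter. This is only a sketch: extract_stale_cookies is a hypothetical helper, not something from the test suite, and the sample input is the three log lines quoted above.

```shell
# Hypothetical helper: print the cookie of every stale-lock cancel
# message read from stdin, one per line.
extract_stale_cookies() {
	sed -n 's/.*stale lock (cookie \([0-9][0-9]*\)).*/\1/p'
}

# The three server-side lines from the autotest log above.
dk_log='00010000:00010000:1.0:1569433691.407285:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251446)
00010000:00010000:1.0:1569433691.407286:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251264)
00010000:00010000:1.0:1569433691.407287:0:31650:0:(ldlm_lockd.c:1708:ldlm_request_cancel()) ### server-side cancel handler stale lock (cookie 18295147502878251418)'

printf '%s\n' "$dk_log" | extract_stale_cookies
```

Each printed cookie corresponds to one cancel that reached the server after recovery, which is what the test treats as a failure.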

Comment by Gerrit Updater [ 28/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/36291/
Subject: LU-12805 tests: disable replay-single/36
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bdbc7f9f42b9b06aa4d93aabd4c4d559f2bbced1
