[LU-14371] replay-single test 65b fails with 'No early reply' Created: 27/Jan/21  Updated: 07/Oct/22  Resolved: 07/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

(So far only) DNE


Issue Links:
Related
is related to LU-9566 replay-single test_65a: @@@@@@ FAIL:... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

replay-single test_65b fails with 'No early reply'. This test fails frequently in PPC64 client testing. Setting aside the PPC failures, the test has also been failing in DNE configurations since 16 FEB 2020, starting with https://testing.whamcloud.com/test_sets/9766a818-51c7-11ea-a90e-52540065bddc. In all of these cases, replay-single tests 0c and 0d fail with 'mount fails', and a test after 65b then hangs.

The test suite log for a recent failure at https://testing.whamcloud.com/test_sets/2efe802a-db0c-4424-af23-e1a7e8ce99a4 shows the following test output:

== replay-single test 65b: AT: verify early replies on packed reply / bulk =========================== 02:08:17 (1611540497)
CMD: trevis-63vm4 lctl get_param -n at_max
CMD: trevis-63vm1.trevis.whamcloud.com lctl get_param -n at_max
CMD: trevis-63vm3 lctl get_param -n at_max
CMD: trevis-63vm4 lctl get_param -n at_max
CMD: trevis-63vm1.trevis.whamcloud.com lctl get_param -n at_max
CMD: trevis-63vm3 lctl get_param -n at_max
CMD: trevis-63vm4 lctl get_param -n at_history
CMD: trevis-63vm4 lctl set_param at_history=8
at_history=8
CMD: trevis-63vm3 lctl set_param at_history=8
at_history=8
CMD: trevis-63vm4 /usr/sbin/lctl get_param -n debug
debug=other trace
CMD: trevis-63vm3 lctl set_param fail_val=6
fail_val=6
CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0x224
fail_loc=0x224
CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
 replay-single test_65b: @@@@@@ FAIL: No early reply 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/replay-single.sh:1864:test_65b()
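
When comparing this output against the other failing runs, it can help to pull out just the per-node commands and the failure line. The following is a minimal sketch, assuming the log follows the "CMD: <node> <command>" and "<suite> <test>: @@@@@@ FAIL: <message>" shapes seen above. The names summarize, SAMPLE_LOG, CMD_RE and FAIL_RE are hypothetical and are not part of the Lustre test framework.

```python
import re

# A few lines copied from the test output in this ticket.
SAMPLE_LOG = """\
CMD: trevis-63vm3 lctl set_param fail_val=6
fail_val=6
CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0x224
fail_loc=0x224
CMD: trevis-63vm3 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
 replay-single test_65b: @@@@@@ FAIL: No early reply
"""

# "CMD: <node> <command...>" lines emitted while the test runs.
CMD_RE = re.compile(r"^CMD:\s+(\S+)\s+(.*)$")
# "<suite> <test>: @@@@@@ FAIL: <message>" lines emitted on failure.
FAIL_RE = re.compile(r"^\s*(\S+) (test_\w+): @+ FAIL: (.*?)\s*$")


def summarize(log_text):
    """Return (commands, failures) extracted from a test log.

    commands is a list of (node, command) tuples in log order.
    failures is a list of (suite, test, message) tuples.
    """
    commands = []
    failures = []
    for line in log_text.splitlines():
        m = CMD_RE.match(line)
        if m:
            commands.append((m.group(1), m.group(2)))
            continue
        m = FAIL_RE.match(line)
        if m:
            failures.append((m.group(1), m.group(2), m.group(3)))
    return commands, failures


if __name__ == "__main__":
    cmds, fails = summarize(SAMPLE_LOG)
    for node, cmd in cmds:
        print(node, "->", cmd)
    for suite, test, msg in fails:
        print("FAILED:", suite, test, msg)
```

Running the same function over the logs of several failing sessions would give a quick way to check whether the fail_val and fail_loc settings and the failure message match across runs.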

There are no errors in the console/dmesg logs.

This issue may be related to LU-9566 replay-single test_65a: @@@@@@ FAIL: No early reply

Logs for other failures are at:
https://testing.whamcloud.com/test_sets/6b7431ba-6424-408d-a379-e5fb422df642
https://testing.whamcloud.com/test_sets/fee3e68e-26e9-4a53-a83e-7d8ea9ad4bd1
https://testing.whamcloud.com/test_sets/d097765d-c4f1-4070-9c42-f1a1b1f7b258



 Comments   
Comment by Andreas Dilger [ 07/Oct/22 ]

Duplicate of LU-9566
