[LU-3414] Test failure on test suite replay-single, subtest test_53e Created: 29/May/13  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 8459

 Description   

This issue was created by Maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite runs:
http://maloo.whamcloud.com/test_sets/a06094ce-c83a-11e2-b8c5-52540035b04c
https://maloo.whamcloud.com/test_sets/8ab52280-3536-11e2-918f-52540035b04c
https://maloo.whamcloud.com/test_sets/36d646d4-2704-11e2-b04c-52540035b04c

The sub-test test_53e failed with the following error:

test_53e failed with 2

Info required for matching: replay-single 53e



 Comments   
Comment by Andreas Dilger [ 29/May/13 ]

This looks like a race in the test. In the failing run, the MDS console log does not show the fail_loc being hit:

Lustre: DEBUG MARKER: == replay-single test 53e: |X| open reply while two MDC requests in flight == 10:59:03
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x119
Lustre: DEBUG MARKER: lctl set_param fail_loc=0
Lustre: DEBUG MARKER: replay-single test_53e: @@@@@@ FAIL: test_53e failed with 2 

whereas the logs from passing runs show the fail_loc being hit:

Lustre: DEBUG MARKER: == replay-single test 53e: |X| open reply while two MDC requests in flight == 04:01:46
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x119
Lustre: *** cfs_fail_loc=119, val=2147483648***
LustreError: target_send_reply_msg()) @@@ dropping reply  req@ffff8800635ef000 x1436362426138208/t219043332110(0) o36->0e87a8e3-8c49-4757-2124-d93b92b79a2c@10.10.17.36@tcp:0/0 lens 488/448
DEBUG MARKER: lctl set_param fail_loc=0

It seems that some code path is bypassing this fail_loc. Oleg looked at the code. The fail_loc is only used indirectly (the value is stored and checked later), so perhaps the stored value is being overwritten in some rare cases.
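To make the "stored, then checked later" hypothesis concrete, here is a minimal, language-agnostic model of it. It is a hypothetical sketch, not Lustre code: `Request`, `reply_fail_id`, `handle_request`, `overwrite_path` and `send_reply` are invented names. The only inputs taken from this ticket are the fail_loc value 0x119 and the idea that a stored value is compared with fail_loc at reply time. The model shows how a fail_loc can be set and still never fire if the stored value is rewritten before the check.

```python
from dataclasses import dataclass

# Hypothetical model only: these names are NOT Lustre's real structures or
# fail_loc API. fail_loc stands in for the value set with
# "lctl set_param fail_loc=...".
FAIL_DROP_REPLY = 0x119  # fail_loc value used by test_53e
fail_loc = 0


@dataclass
class Request:
    xid: int
    reply_fail_id: int = 0  # fail_loc id recorded early, checked at reply time


def handle_request(req: Request) -> None:
    """Early in handling: record which fail_loc applies to this reply."""
    req.reply_fail_id = FAIL_DROP_REPLY


def overwrite_path(req: Request) -> None:
    """Hypothetical rare path that rewrites the stored id before it is used."""
    req.reply_fail_id = 0


def send_reply(req: Request) -> bool:
    """Late in handling: compare the stored id with the global fail_loc.

    Returns True if the reply is dropped (the "dropping reply" case).
    """
    return req.reply_fail_id != 0 and fail_loc == req.reply_fail_id


fail_loc = FAIL_DROP_REPLY

# Common path: the stored id survives, fail_loc matches, the reply is dropped.
a = Request(xid=1)
handle_request(a)
a_dropped = send_reply(a)

# Rare path: the stored id is overwritten first, so the reply is sent even
# though fail_loc is set. This corresponds to a log with no "dropping reply" line.
b = Request(xid=2)
handle_request(b)
overwrite_path(b)
b_dropped = send_reply(b)

fail_loc = 0  # equivalent of "lctl set_param fail_loc=0"
```

In this model, setting fail_loc is necessary but not sufficient for the drop. The stored id must also survive until the reply is sent, which is why a failing run can show the `set_param` markers without the `cfs_fail_loc` message.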

Comment by Sarah Liu [ 01/Mar/15 ]

Hit this issue in an interop test between a 2.6.0 client and a 2.7.0 server:

https://testing.hpdd.intel.com/test_sets/79a2ed54-bfc4-11e4-881f-5254006e85c2

Generated at Sat Feb 10 01:33:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.