[LU-14595] replay-single test 85b fails with 'unused locks (0) should be zero' - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.6, Lustre 2.15.0
Labels:
- failover

Severity:
3

Description

replay-single test_85b fails with 'unused locks (0) should be zero'. The suite log for the failure at https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307 shows:

== replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) ============ 15:55:13 (1604159713)
before recovery: unused locks count = 0
 replay-single test_85b: @@@@@@ FAIL: unused locks (0) should be zero 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6254:error()
  = /usr/lib64/lustre/tests/replay-single.sh:3134:test_85b()

The output contradicts itself: the test reports zero unused locks and then fails with a message saying the count should be zero. Here is the portion of the test that contains the check and the error message:

        lov_id=$(lctl dl | grep "clilov")
        addr=$(echo $lov_id | awk '{print $4}' | awk -F '-' '{print $NF}')
        count=$(lctl get_param -n \
                          ldlm.namespaces.*OST0000*$addr.lock_unused_count)
        echo "before recovery: unused locks count = $count"
        [ $count -ne 0 ] || error "unused locks ($count) should be zero"

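The condition and the message disagree: because of the OR (||), error() is called only when $count is zero, yet the message says the count should be zero. Either the error message is wrong, or the check should use AND (&&) instead of OR (||) so that it fails on a non-zero count. Looking at the rest of the test, it seems that we create unused locks on purpose and want a non-zero count before failover, which would mean the message is the part that is wrong.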
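If a non-zero count really is what the test wants before failover, one possible fix keeps the condition and changes only the message. The following is a minimal sketch under that assumption, not the committed fix. The error() function here is a simplified stand-in for the test-framework.sh helper so that the snippet runs on its own, and check_unused_before_recovery is a hypothetical name used only for illustration:

```shell
# Simplified stand-in for the test-framework.sh error() helper (assumption:
# the real helper does more; this one only prints and returns non-zero).
error() {
    echo "FAIL: $*" >&2
    return 1
}

# Hypothetical helper: succeeds only if unused locks exist before recovery.
check_unused_before_recovery() {
    local count=$1

    echo "before recovery: unused locks count = $count"
    # Same condition as the original test; only the message is changed so
    # that it agrees with the condition.
    [ "$count" -ne 0 ] || error "unused locks ($count) should not be zero"
}
```

With this wording, a count of 0 still triggers the failure, but the message now says why: the test expected unused locks to be present before recovery.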

For the master branch, the last time this test failed was on 29 OCT 2020 for Lustre 2.13.56.45; https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307.
For b2_12, the last time this test failed was on 19 OCT 2020 for Lustre 2.12.5.52; https://testing.whamcloud.com/test_sets/810f63ee-776d-43c8-9a8d-f740bc29aec8.

Even though the test is not currently failing, we should fix the confusing error message.



People

Assignee: WC Triage

Reporter: James Nunez (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 08/Apr/21 6:39 PM

Updated: 09/Apr/21 5:28 PM