Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.12.6, Lustre 2.15.0
Description
replay-single test_85b fails with 'unused locks (0) should be zero'. Looking at the suite log for the failure at https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307, we see
== replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) ============ 15:55:13 (1604159713)
before recovery: unused locks count = 0
 replay-single test_85b: @@@@@@ FAIL: unused locks (0) should be zero
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6254:error()
  = /usr/lib64/lustre/tests/replay-single.sh:3134:test_85b()
It seems like there is a problem here; the test reports that there are zero unused locks and then the test fails because there should be zero unused locks. Let’s look at the portion of the code that contains the error/error check:
3127 lov_id=$(lctl dl | grep "clilov")
3128 addr=$(echo $lov_id | awk '{print $4}' | awk -F '-' '{print $NF}')
3129 count=$(lctl get_param -n \
3130     ldlm.namespaces.*OST0000*$addr.lock_unused_count)
3131 echo "before recovery: unused locks count = $count"
3132 [ $count -ne 0 ] || error "unused locks ($count) should be zero"
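Either the error message is wrong, or the check should be joined to error with AND (&&) instead of OR (||). As written, error is called only when $count is 0, yet the message says the count should be zero. Looking at the test, it seems like we produce unused locks and want a non-zero number of locks before failover, which would mean the check is right and the message is what needs to change.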
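To make the two readings concrete, here is a minimal sketch of both guards. It is not a patch for replay-single.sh: the error() function below is a simplified stand-in for the helper in test-framework.sh, and require_nonzero and require_zero are hypothetical names used only for illustration.

```shell
# Simplified stand-in for error() from test-framework.sh (assumption: the
# real helper also prints a trace dump and fails the test).
error() {
    echo "FAIL: $*"
    return 1
}

# Reading 1: keep the existing "||" guard and fix the message.
# error() runs only when count is 0, so a non-zero count is what is required.
require_nonzero() {
    local count=$1
    [ "$count" -ne 0 ] || error "unused locks ($count) should be non-zero"
}

# Reading 2: keep the "should be zero" message and AND the check instead.
# Written as an if so that a zero count returns success.
require_zero() {
    local count=$1
    if [ "$count" -ne 0 ]; then
        error "unused locks ($count) should be zero"
    fi
}
```

With a count of 0, require_nonzero reports a failure and require_zero passes; with any non-zero count the outcomes are reversed. If the test really does expect unused locks before failover, the first form matches the intent.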
For the master branch, the last time this test failed was on 29 OCT 2020 for Lustre 2.13.56.45; https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307.
For b2_12, the last time this test failed was on 19 OCT 2020 for Lustre 2.12.5.52; https://testing.whamcloud.com/test_sets/810f63ee-776d-43c8-9a8d-f740bc29aec8.
Even though the test isn’t failing, we should fix the confusing error message.