Lustre / LU-14595

replay-single test 85b fails with 'unused locks (0) should be zero'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.6, Lustre 2.15.0
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      replay-single test_85b fails with 'unused locks (0) should be zero'. The suite log for the failure at https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307 shows:

      == replay-single test 85b: check the cancellation of unused locks during recovery(EXTENT) ============ 15:55:13 (1604159713)
      before recovery: unused locks count = 0
       replay-single test_85b: @@@@@@ FAIL: unused locks (0) should be zero 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6254:error()
        = /usr/lib64/lustre/tests/replay-single.sh:3134:test_85b()
      

      The output contradicts itself: the test reports that there are zero unused locks and then fails with a message saying there should be zero unused locks. Here is the portion of test_85b in replay-single.sh (lines 3127 to 3132) that performs the check:

        lov_id=$(lctl dl | grep "clilov")
        addr=$(echo $lov_id | awk '{print $4}' | awk -F '-' '{print $NF}')
        count=$(lctl get_param -n \
                  ldlm.namespaces.*OST0000*$addr.lock_unused_count)
        echo "before recovery: unused locks count = $count"
        [ $count -ne 0 ] || error "unused locks ($count) should be zero"
      

      Either the error message is wrong, or the condition should use && instead of || so that error() runs when the count is non-zero. Looking at the test, it creates unused locks and expects a non-zero number of them before failover, which suggests that the condition is right and the message is wrong.
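To make the two options concrete, here is a small standalone sketch of how `[ test ] || cmd` and `[ test ] && cmd` behave for a lock count of 0 and of 3. check_or and check_and are hypothetical helpers that print "fires" whenever the error branch would run; they do not call lctl or the real error().

```shell
#!/bin/bash
# Sketch: compare "[ test ] || cmd" with "[ test ] && cmd" for the
# lock-count check. check_or/check_and are hypothetical helpers that
# print "fires" whenever the error branch would run.

# Current form in test_85b: the right-hand side runs only when
# [ count -ne 0 ] is false, i.e. when count is 0.
check_or() { [ "$1" -ne 0 ] || echo "fires"; }

# && form: the right-hand side runs only when
# [ count -ne 0 ] is true, i.e. when count is non-zero.
check_and() { [ "$1" -ne 0 ] && echo "fires"; }

for count in 0 3; do
    echo "count=$count  ||: $(check_or "$count")  &&: $(check_and "$count")"
done
```

With ||, the branch fires only for a count of 0; with &&, it fires only for a non-zero count. If the test is meant to have unused locks before failover, the existing || logic already rejects the bad case, and only the message text is misleading.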

      For the master branch, the last time this test failed was on 29 OCT 2020 for Lustre 2.13.56.45; https://testing.whamcloud.com/test_sets/abb9065e-95b7-46bc-bb27-07ffe7934307.
      For b2_12, the last time this test failed was on 19 OCT 2020 for Lustre 2.12.5.52; https://testing.whamcloud.com/test_sets/810f63ee-776d-43c8-9a8d-f740bc29aec8.

      Even though the test is not currently failing, we should fix the confusing error message.
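A minimal sketch of a corrected check, assuming the intent is to require a non-zero unused-lock count before recovery. It wraps the last two lines of the quoted code in a hypothetical function, check_unused_before_recovery, which takes the count as an argument instead of reading it with lctl get_param. The error() below is a hypothetical stand-in; the real one in test-framework.sh does more (for example, the trace dump seen in the log). Only the message text changes; the || logic is kept, and $count is quoted so an empty value still makes the test fail.

```shell
#!/bin/bash
# Hypothetical stand-in for error() from test-framework.sh: print the
# message to stderr and return non-zero instead of aborting the suite.
error() { echo "FAIL: $*" >&2; return 1; }

# Sketch of the corrected check, assuming unused locks must exist before
# recovery. The condition is unchanged; only the message is fixed.
check_unused_before_recovery() {
    local count=$1
    echo "before recovery: unused locks count = $count"
    [ "$count" -ne 0 ] || error "unused locks ($count) should not be zero"
}

# A non-zero count passes and returns 0.
check_unused_before_recovery 2
```

Called with 0, the function prints the count line and then reports "FAIL: unused locks (0) should not be zero", so the failure message now agrees with the condition that triggered it.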


            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez) (Inactive)