Lustre / LU-13759

sanity-dom sanityn_test_20 fails with '1 page left in cache after lock cancel'

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.14.0
    • Fix Version/s: Lustre 2.14.0
    • Labels:
    • Environment:
      DNE
    • Severity:
      3

      Description

      sanity-dom sanityn test_20 fails with '1 page left in cache after lock cancel'. The test started failing on 28 June 2020 and fails only in DNE testing, that is, in the review-dne-part-4 and review-dne-zfs-part-4 test sessions.

      sanity-dom runs several sanityn.sh tests with DOM enabled:

       test_sanityn()
       {
               # XXX: to fix 60
               ONLY="1 2 4 5 6 7 8 9 10 11 12 14 17 19 20 23 27 39 51a 51c 51d" \
                       OSC="mdc" DOM="yes" bash sanityn.sh

               return 0
       }
       run_test sanityn "Run sanityn with Data-on-MDT files"
      

      and it is actually sanityn test 20 that we see fail here.
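      To look at test 20 in isolation, the same environment that test_sanityn sets can be reused with a shorter ONLY list. The helper below is a minimal sketch: dom_sanityn_cmd is a hypothetical name, not part of the Lustre test framework, and it only prints the command line rather than running it. Whether the printed command reproduces the failure outside the DNE review sessions is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test suite): print the command
# that runs a chosen subset of sanityn.sh with the same settings that
# test_sanityn uses, namely OSC="mdc" and DOM="yes".
dom_sanityn_cmd() {
	local only="$1"	# space-separated list of sanityn subtests, e.g. "20"
	printf 'ONLY="%s" OSC="mdc" DOM="yes" bash sanityn.sh\n' "$only"
}

# Print the command for subtest 20 only.
dom_sanityn_cmd 20
```

      The printed command is meant to be run from the Lustre tests directory on a configured test cluster.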

      There are two problems:
      1. sanityn test 20 fails when DOM="yes" is set.
      2. When this test fails, sanity-dom is either not marked as failed or is marked in a way that Maloo does not recognize, so the failure is silent.

      This ticket deals with sanity-dom’s sanityn test 20 failure. I’ll open a different ticket for the sanity-dom failures not getting recognized as failures.
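      One possible contributor to the silent failure, visible in the quoted test_sanityn, is the unconditional "return 0" after the sanityn.sh run. The sketch below shows that pattern in isolation. All function names are hypothetical stand-ins, and it is an assumption that sanityn.sh exits non-zero on a failed subtest and that Maloo's result depends on the wrapper's return status.

```shell
#!/bin/bash
# Hypothetical stand-in for "bash sanityn.sh" exiting non-zero after a
# failed subtest.
inner_suite() {
	return 1
}

# Same shape as the quoted test_sanityn: the inner status is discarded.
wrapper_masking() {
	inner_suite
	return 0
}

# Alternative shape: hand the inner status back to the caller.
wrapper_propagating() {
	inner_suite || return $?
	return 0
}

rc=0; wrapper_masking || rc=$?
echo "masking wrapper status: $rc"

rc=0; wrapper_propagating || rc=$?
echo "propagating wrapper status: $rc"
```

      With the first shape the caller always sees success, whatever the inner script reported; the second shape lets the caller see the inner failure.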

      For a recent failure, with logs at https://testing.whamcloud.com/test_sets/5230daaa-9cb6-4bdf-98ad-330a658a197a, the suite_log reveals nothing about the cause of the failure:

      == sanityn test 20: test extra readahead page left in cache ========================================== 09:32:02 (1594114322)
      striped dir -i0 -c2 -H fnv_1a_64 /mnt/lustre/d20
       sanityn test_20: @@@@@@ FAIL: 1 page left in cache after lock cancel 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6167:error()
        = sanityn.sh:600:test_20()
      

      Since the failure is not recognized as a failure by Maloo, there are no logs other than console logs to look at. The console logs do not provide any information on why the test is failing.

      Recent failures of this test are at:
      https://testing.whamcloud.com/test_sets/61841ecb-57f6-4c0f-b563-01eae76405f2
      https://testing.whamcloud.com/test_sets/88646434-24d8-41fc-81cc-43d19e862c07

              People

              • Assignee:
                tappro Mikhail Pershin
              • Reporter:
                jamesanunez James Nunez