sanity-dom sanityn_test_20 fails with '1 page left in cache after lock cancel'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.7
    • Affects Version/s: Lustre 2.14.0
    • Labels: DNE
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      sanity-dom sanityn test_20 fails with '1 page left in cache after lock cancel'. The test started failing on 28 June 2020 and fails only in DNE testing, that is, in review-dne-part-4 and review-dne-zfs-part-4.

      sanity-dom runs several sanityn.sh tests with DOM enabled:

       test_sanityn()
       {
               # XXX: to fix 60
               ONLY="1 2 4 5 6 7 8 9 10 11 12 14 17 19 20 23 27 39 51a 51c 51d" \
                       OSC="mdc" DOM="yes" bash sanityn.sh

               return 0
       }
       run_test sanityn "Run sanityn with Data-on-MDT files"
      

      and it is actually sanityn test 20 that we see fail here.
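
Because the failing subtest is just one entry in the ONLY list, it can in principle be run on its own with the same environment. The sketch below only prints such a command and does not execute it. `dom_sanityn_cmd` is a hypothetical helper, and the printed command assumes it would be run from the directory that holds sanityn.sh on an already configured test client.

```shell
# Hypothetical helper: print the command that runs only the given sanityn
# subtests with the same environment that test_sanityn() sets above.
# Printing instead of executing keeps the sketch safe to try anywhere.
dom_sanityn_cmd() {
        local only="${1:-20}"   # subtest list; defaults to the failing test

        printf 'ONLY="%s" OSC="mdc" DOM="yes" bash sanityn.sh\n' "$only"
}

dom_sanityn_cmd 20
# prints: ONLY="20" OSC="mdc" DOM="yes" bash sanityn.sh
```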

      There are two problems:
      1. sanityn test 20 fails when DOM="yes" is set.
      2. When this test fails, sanity-dom is not marked as failed, or at least not in a way that Maloo recognizes, so the failure is silent.
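
One thing worth noting about the second problem: test_sanityn() ends with an unconditional `return 0`, which discards whatever status `bash sanityn.sh` exits with. Whether that is the reason Maloo misses the failure is an assumption here, not something this ticket confirms. The sketch below shows the general shell behavior with stand-in functions; `fake_sanityn`, `wrapper_masking` and `wrapper_propagating` are hypothetical names, and no Lustre script is run.

```shell
# Stand-in for "bash sanityn.sh" exiting non-zero after a failed subtest.
fake_sanityn() {
        return 1
}

# Same shape as test_sanityn(): the child's status is thrown away.
wrapper_masking() {
        fake_sanityn
        return 0
}

# Alternative shape: remember the child's status and hand it back.
wrapper_propagating() {
        local rc=0

        fake_sanityn || rc=$?
        return $rc
}

rc=0; wrapper_masking || rc=$?
echo "masking wrapper returned $rc"        # prints: masking wrapper returned 0

rc=0; wrapper_propagating || rc=$?
echo "propagating wrapper returned $rc"    # prints: propagating wrapper returned 1
```

With the first shape, a caller that only inspects the wrapper's return value cannot tell a failed run from a clean one.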

      This ticket deals with sanity-dom’s sanityn test 20 failure. I’ll open a different ticket for the sanity-dom failures not getting recognized as failures.

      For a recent failure, with logs at https://testing.whamcloud.com/test_sets/5230daaa-9cb6-4bdf-98ad-330a658a197a, the suite_log doesn’t reveal anything about the cause of the failure:

      == sanityn test 20: test extra readahead page left in cache ========================================== 09:32:02 (1594114322)
      striped dir -i0 -c2 -H fnv_1a_64 /mnt/lustre/d20
       sanityn test_20: @@@@@@ FAIL: 1 page left in cache after lock cancel 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6167:error()
        = sanityn.sh:600:test_20()
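
Since the failure is only visible in the log text, one way to spot it is to search for the `@@@@@@ FAIL:` marker shown above. The following is a minimal sketch; `list_failures` is a hypothetical helper, and it assumes failure lines always have the `<suite> test_<name>: @@@@@@ FAIL: <message>` shape seen in this excerpt.

```shell
# Hypothetical helper: read a log on stdin and print
# "<suite> test_<name>: <message>" for each failure marker found.
list_failures() {
        sed -n 's/^ *\([^ ]* test_[^:]*\): @@@@@@ FAIL: \(.*[^ ]\) *$/\1: \2/p'
}

# Feed it two lines taken from the excerpt above.
printf '%s\n' \
        ' sanityn test_20: @@@@@@ FAIL: 1 page left in cache after lock cancel ' \
        '  Trace dump:' | list_failures
# prints: sanityn test_20: 1 page left in cache after lock cancel
```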
      

      Since the failure is not recognized as a failure by Maloo, there are no logs other than console logs to look at. The console logs do not provide any information on why the test is failing.

      Recent failures of this test are at:
      https://testing.whamcloud.com/test_sets/61841ecb-57f6-4c0f-b563-01eae76405f2
      https://testing.whamcloud.com/test_sets/88646434-24d8-41fc-81cc-43d19e862c07

          Activity

            [LU-13759] sanity-dom sanityn_test_20 fails with '1 page left in cache after lock cancel'
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.12.7 [ 14793 ]

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40302/
            Subject: LU-13759 dom: lock cancel to drop pages
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 63b0c8f28dbd8513774219b8802370a638668811

            lixi_wc Li Xi made changes -
            Labels Original: DNE exap New: DNE
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Seems to be fixed


            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40302
            Subject: LU-13759 dom: lock cancel to drop pages
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 59daace04573950e436385020c565399cae08c9e

            adilger Andreas Dilger made changes -
            Labels Original: DNE New: DNE exap

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39540/
            Subject: LU-13759 test: make sanityn test_20 repeatable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 910ed44d1f3844ae3f76a3594dbd1a09b5892643

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39401/
            Subject: LU-13759 dom: lock cancel to drop pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e95eca236471cf23083ef281ef204a5920e4db9b

            adilger Andreas Dilger made changes -
            Summary Original: sanity-dom test 20 fails with '1 page left in cache after lock cancel' New: sanity-dom sanityn_test_20 fails with '1 page left in cache after lock cancel'

            People

              Assignee: tappro Mikhail Pershin
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 12
