Lustre / LU-14201

replay-single test 89 fails with '3072 blocks leaked'

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.6
    • Labels: None
    • Environment: ZFS
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      replay-single test_89 fails with '3072 blocks leaked'. This test has failed with the same error message before, in LU-1867 and LU-5761, but both of those tickets are closed. Since late September 2020, we’ve seen this failure in both branch testing and patch testing, although in several of those sessions other replay-single tests fail before test 89; for example https://testing.whamcloud.com/test_sets/b83cb774-5cb2-473e-a641-90e5875fe6a6.

      For the test failure at https://testing.whamcloud.com/test_sets/a6260ca9-b7a0-4818-9d48-ab79249ba526, the last lines in the suite_log are

      Waiting for orphan cleanup...
      CMD: trevis-20vm4 /usr/sbin/lctl list_param osp.*osc*.old_sync_processed 2> /dev/null
      osp.lustre-OST0000-osc-MDT0000.old_sync_processed
      osp.lustre-OST0001-osc-MDT0000.old_sync_processed
      osp.lustre-OST0002-osc-MDT0000.old_sync_processed
      osp.lustre-OST0003-osc-MDT0000.old_sync_processed
      osp.lustre-OST0004-osc-MDT0000.old_sync_processed
      osp.lustre-OST0005-osc-MDT0000.old_sync_processed
      osp.lustre-OST0006-osc-MDT0000.old_sync_processed
      wait 40 secs maximumly for trevis-20vm4 mds-ost sync done.
      CMD: trevis-20vm4 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
      sleep 5 for ZFS zfs
      Waiting for local destroys to complete
       replay-single test_89: @@@@@@ FAIL: 3072 blocks leaked 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5907:error()
        = /usr/lib64/lustre/tests/replay-single.sh:3329:test_89()
      

      There is nothing obviously wrong in the console logs.
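
      For context, the failure comes from a before/after comparison of used blocks on the client mount: the test records block usage, creates and deletes a file across the OST/MDS restarts, waits for the deletes to drain (the wait_mds_ost_sync and "Waiting for local destroys" steps in the suite_log), and fails if usage does not return to roughly its starting value. The sketch below only illustrates that shape of check; the MOUNT default, the used_blocks helper, and the 4-block slack are illustrative, not copied from replay-single.sh.

      #!/bin/bash
      # Rough sketch of the kind of "blocks leaked" check test_89 performs.
      # MOUNT, used_blocks() and the 4-block slack are illustrative values,
      # not taken verbatim from replay-single.sh.
      MOUNT=${MOUNT:-/mnt/lustre}

      used_blocks() {
          # Used 1K blocks on the mounted filesystem (column 3 of 'df -P').
          df -P "$MOUNT" | awk 'NR == 2 { print $3 }'
      }

      blocks_before=$(used_blocks)

      # ... create a striped file, restart the OSS and MDS, unlink the file,
      # then wait for orphan cleanup and destroys to finish (the
      # wait_mds_ost_sync / wait_delete_completed steps in the suite_log) ...

      blocks_after=$(used_blocks)
      leaked=$((blocks_after - blocks_before))

      # Allow a small slack for metadata churn; anything larger is reported
      # as leaked space, which is what produces "3072 blocks leaked" here.
      if [ "$leaked" -gt 4 ]; then
          echo "FAIL: $leaked blocks leaked"
          exit 1
      fi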

      Attachments

        Issue Links

          Activity


            adilger Andreas Dilger added a comment -

            Duplicate of LU-16271, but it seems to be hit more regularly: only 5 failures in 2022-07 and 2022-08, but 16 in the past 4 weeks.

            adilger Andreas Dilger added a comment -

            I had a quick look at this, and so far it is a one-off test failure. There was one other test_89 failure in the past month, but it looked quite different.

            This test verifies that if a file is deleted across both an OSS and an MDS restart, the space on the OSTs is released. In terms of severity this is fairly low: a concurrent MDS and OSS failure while files are also being deleted is fairly rare, and at worst some space on the OST would be leaked. It may also be a test script issue (e.g. the delete hadn't happened yet because "wait_delete_completed_mds()" didn't wait long enough).

            So I don't think it is a blocker for the 2.12.6 release, but we can keep an eye on whether it is being hit regularly.
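
            If the script-side theory holds (the wait returning before the OST destroys finished), the test-framework-style fix is simply a longer poll on the sync state. Below is a hypothetical standalone sketch of such a poll, modelled on the osp.*osc*.old_sync_processed parameter and the 40 s "mds-ost sync" wait visible in the suite_log above; the function name, ssh transport, and 2-second poll interval are assumptions, not code from test-framework.sh.

            # Hypothetical polling helper modelled on the "wait ... for
            # mds-ost sync done" step in the suite_log; not code from
            # test-framework.sh. Assumes passwordless ssh to the MDS node.
            wait_old_sync_processed() {
                local mds_node=$1              # e.g. trevis-20vm4 in the log
                local timeout=${2:-40}         # 40 s matches the suite_log
                local elapsed=0

                while [ "$elapsed" -lt "$timeout" ]; do
                    # Each OSP device is expected to report 1 here once its
                    # pre-recovery unlinks have been pushed to the OST.
                    local pending
                    pending=$(ssh "$mds_node" /usr/sbin/lctl get_param -n \
                        'osp.*osc*.old_sync_processed' 2>/dev/null | grep -c '^0$')
                    [ "$pending" -eq 0 ] && return 0
                    sleep 2
                    elapsed=$((elapsed + 2))
                done
                echo "MDS-OST sync not done after ${timeout}s" >&2
                return 1
            }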

            People

              Assignee: wc-triage WC Triage
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: