[LU-6951] sanity test_27m: @@@@@@ FAIL: OST0 was full but new created file still use it

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      sanity test_27m fails with "OST0 was full but new created file still use
      it" when the test runs with more than one connected client. A simple
      reproducer is "MOUNT_2=true REFORMAT=true ONLY=27m bash sanity.sh".

      The cause appears to be grants. In my setup, every client initially
      gets a 2 MB grant, which never shrinks to 0. When dd on the first
      client receives ENOSPC, it does not really mean that the OST is full,
      because a client is not allowed to use other clients' grants. When
      a new file is created, the MDS still sees free space on OST0 equal to
      the amount of unused grants and allocates new objects on OST0.
      As a result, the test reports a failure.
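
      The accounting described above can be illustrated with a toy model.
      This is only a sketch, not Lustre code: the OST size, the two-client
      setup, and all variable names are assumptions made for illustration;
      only the 2 MB per-client grant comes from the report.

```shell
#!/bin/bash
# Toy model of per-client grants on one OST (all sizes in KB).
# Hypothetical numbers: only the 2 MB grant is taken from the report.
ost_size=10240          # assumed OST capacity: 10 MB
grant=2048              # each client is granted 2 MB up front
clients=2               # assumed number of connected clients

# Total space reserved by grants across all clients.
granted=$((grant * clients))

# Client 1 (running dd) can use everything except the other clients' grants.
writable_by_client1=$((ost_size - (granted - grant)))
used=$writable_by_client1          # dd writes until it receives ENOSPC

# What remains on the OST is exactly the other clients' unused grants.
free_seen_by_mds=$((ost_size - used))

echo "client1 wrote ${used} KB before ENOSPC"
echo "MDS still sees ${free_seen_by_mds} KB free on OST0"
```

      In this model the free space left after ENOSPC equals the idle
      client's grant, which matches the behaviour described in the report.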

      It is not clear whether this is a defect in Lustre or a test defect,
      in which case the test should be skipped when more than one client is
      connected. We would like to know your opinion. Thanks.

          Activity

            panda Andrew Perepechko added a comment -

            No new patches are expected.

            cfaber#1 Colin Faber [X] (Inactive) added a comment -

            Should this be closed?
            pjones Peter Jones added a comment -

            Landed for 2.13

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/23506/
            Subject: LU-6951 tests: sanity test_27m failure
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fafb93b96179d5fd2c9bc83b19e512180a4f833e

            gerrit Gerrit Updater added a comment -

            Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: http://review.whamcloud.com/23506
            Subject: LU-6951 tests: sanity test_27m failure
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f192c6a38e4d429c9b52f10cf38c4a0519de270e

            adilger Andreas Dilger added a comment -

            The best solution would be to resurrect and fix the grant shrinking functionality, so that the idle client would return the grant back to the OST when space is running low.

            I see in LU-3859 that grant shrink was enabled in your code at one point, but I guess it still had some bugs at that time. The grant shrinking was disabled due to problems before 1.8.1 and 2.0.0 were released (https://bugzilla.lustre.org/show_bug.cgi?id=19507) and hasn't been fixed since then.

            I don't have any strong preference about whether you unmount the other client or just skip the test, but it is probably less complex to just skip the test in case of multiple mounts. The sanity.sh tests all assume that there is only a single client mounted, since multi-mount tests are run in sanityn.sh. I suspect there may be other test failures hit if you are running sanity.sh with multiple mounts.
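
            As an illustration of the skip option discussed in the comment above, a check for multiple client mounts might look roughly like the following. This is a hypothetical sketch, not the patch that landed: the helpers count_lustre_mounts and should_skip_27m are made up for illustration, and the sketch assumes a client-only node where every entry of type "lustre" in a /proc/mounts-style listing is a client mount.

```shell
#!/bin/bash
# Count mounts of type "lustre" in a listing formatted like /proc/mounts.
# count_lustre_mounts is a hypothetical helper, not part of the Lustre
# test framework. On a node that also hosts server targets this would
# overcount, so a client-only node is assumed.
count_lustre_mounts() {
    local mounts_file=${1:-/proc/mounts}
    # Field 3 of /proc/mounts is the filesystem type.
    awk '$3 == "lustre" { n++ } END { print n + 0 }' "$mounts_file"
}

# Return success (0) when more than one Lustre mount is present,
# i.e. when the single-client assumption of the test does not hold.
should_skip_27m() {
    local nmounts
    nmounts=$(count_lustre_mounts "$1")
    [ "$nmounts" -gt 1 ]
}

# Demo with a made-up mounts listing instead of the real /proc/mounts.
sample=$(mktemp)
cat > "$sample" <<'EOF'
mgs@tcp:/lustre /mnt/lustre lustre rw 0 0
mgs@tcp:/lustre /mnt/lustre2 lustre rw 0 0
/dev/sda1 / ext4 rw 0 0
EOF

if should_skip_27m "$sample"; then
    echo "skip test_27m: more than one client mount"
fi
rm -f "$sample"
```

            Counting mounts keeps the check independent of how the second mount was requested; a real test would more likely rely on the framework's own mount variables.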

            panda Andrew Perepechko added a comment -

            Alex, do you think we should skip this test when running with more than one client, or temporarily unmount the other clients for the test run?

            bzzz Alex Zhuravlev added a comment -

            The test wasn't designed to run with multiple clients. That said, this problem with grants is known, and there was an attempt to solve it by returning granted space with pings (implying no client activity on this OST), but the functionality was disabled due to issues.

            People

              Assignee: panda Andrew Perepechko
              Reporter: panda Andrew Perepechko