
Test failure sanity test_51b: test_51b failed: fnum: No space left on device

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
    • 3
    • 15250

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/09a62af8-1feb-11e4-8610-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/b21a51b0-0aff-11e4-8dbb-5254006e85c2

      The sub-test test_51b failed with the following error:

      test_51b failed with 1

      Info required for matching: sanity 51b


        Activity

          [LU-5473] Test failure sanity test_51b: test_51b failed: fnum: No space left on device
          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21821/
          Subject: LU-5473 tests: sanity/51b Account for ZFS inode size
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 15dd813536ad06a119dfb2358f00281eed22a98b

          utopiabound Nathaniel Clark added a comment -

          11KB/inode does seem to be a good estimate for ZFS, based on custom runs from 21821.
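
As a rough illustration of what an 11KB/inode estimate implies, the following sketch checks whether an MDT has room for a given number of new files. It is not the code from patch 21821: the function name `enough_mdt_space` and its parameters are hypothetical, and the 70k file count and the MDT available-space figure are taken from the other comments on this ticket.

```python
def enough_mdt_space(avail_kb, nfiles, kb_per_inode=11):
    """Return True if avail_kb can hold nfiles new inodes.

    kb_per_inode defaults to the 11KB/inode ZFS estimate discussed in this
    ticket; it is an assumption, not a value read from the filesystem.
    """
    return avail_kb >= nfiles * kb_per_inode


# MDT "Available" value from the lfs df output quoted in this ticket (1K-blocks).
MDT_AVAIL_KB = 3048960
# Approximate number of files the test creates, per the discussion.
NFILES = 70000

if __name__ == "__main__":
    print("fits at 11KB/inode:", enough_mdt_space(MDT_AVAIL_KB, NFILES))
    print("fits at 32KB/inode:", enough_mdt_space(MDT_AVAIL_KB, NFILES, 32))
```

A test script could use a check of this kind to skip the test up front instead of failing with ENOSPC partway through, assuming the per-inode estimate is realistic for the backing filesystem.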

          gerrit Gerrit Updater added a comment -

          Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/21821
          Subject: LU-5473 tests: Add debug to sanity/51b
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: d0f82be70ef01af4ccea83d870ddb4d9178690ac

          utopiabound Nathaniel Clark added a comment -

          This test hasn't failed since 2015-06-15:
          https://testing.hpdd.intel.com/test_sets/b3e189ee-13ca-11e5-b4b0-5254006e85c2

          Same MDS error:

          15:41:02:LustreError: 19303:0:(osd_handler.c:209:osd_trans_start()) lustre-MDT0000: failed to start transaction due to ENOSPC. Metadata overhead is underestimated or grant_ratio is too low.

          adilger Andreas Dilger added a comment -

          The test is skipped (SLOW) on review-zfs, but it is still being run during full test runs, and is consistently passing on ZFS. The average space usage before the test run is about 11KB/inode on the MDT, and it reports plenty of free inodes before the test runs. The MDT filesystem is 3GB in size, so it would have enough space for 95k inodes even at the 32KB/inode that we were seeing before. I think 11KB/inode is reasonable for the number of actual files created in the filesystem at this point, since this includes all of the other filesystem overhead. It would be more useful to know the average per-inode space usage once all 70k files were created, to get a better average and/or the differential usage for just those inodes.

          https://testing.hpdd.intel.com/sub_tests/7b8bcf2a-4c80-11e5-b77f-5254006e85c2

          == sanity test 51b: exceed 64k subdirectory nlink limit == 05:28:37 (1440480517)
          UUID                   1K-blocks        Used   Available Use% Mounted on
          lustre-MDT0000_UUID      3063424       12416     3048960   0% /mnt/lustre[MDT:0]
          lustre-OST0000_UUID      2031360        5632     2023680   0% /mnt/lustre[OST:0]
          lustre-OST0001_UUID      2031360        4608     2024704   0% /mnt/lustre[OST:1]
          lustre-OST0002_UUID      2031104        3968     2025088   0% /mnt/lustre[OST:2]
          lustre-OST0003_UUID      2031360        4224     2025088   0% /mnt/lustre[OST:3]
          lustre-OST0004_UUID      2031360        5760     2023552   0% /mnt/lustre[OST:4]
          lustre-OST0005_UUID      2031104        5888     2023168   0% /mnt/lustre[OST:5]
          lustre-OST0006_UUID      2031360        6912     2022400   0% /mnt/lustre[OST:6]

          filesystem summary:     14219008       36992    14167680   0% /mnt/lustre

          UUID                      Inodes       IUsed       IFree IUse% Mounted on
          lustre-MDT0000_UUID       162442        1363      161079   1% /mnt/lustre[MDT:0]
          lustre-OST0000_UUID        76956         423       76533   1% /mnt/lustre[OST:0]
          lustre-OST0001_UUID        73555         323       73232   0% /mnt/lustre[OST:1]
          lustre-OST0002_UUID        74679         320       74359   0% /mnt/lustre[OST:2]
          lustre-OST0003_UUID        74436         324       74112   0% /mnt/lustre[OST:3]
          lustre-OST0004_UUID        71288         323       70965   0% /mnt/lustre[OST:4]
          lustre-OST0005_UUID        70901         321       70580   0% /mnt/lustre[OST:5]
          lustre-OST0006_UUID        69146         322       68824   0% /mnt/lustre[OST:6]

          filesystem summary:       162442        1363      161079   1% /mnt/lustre

          The osd_statfs->osd_objs_count_estimate() calculation for ZFS is using about 21KB/inode for its free-inode estimate, which is conservative but reasonable given how few inodes are actually in use at this point.

          It wouldn't be terrible to print another lfs df and lfs df -i after the test, regardless of pass/fail result, to see what the average space usage is on ZFS, and then if it is still reasonable (i.e. going down from 11KB/inode) this bug could be closed.

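
The per-inode arithmetic in the comment above can be reproduced from the MDT rows of the quoted lfs df and lfs df -i output. The sketch below is only illustrative: the constant and function names are made up for this example, and the inputs are the numbers from this one snapshot. For this snapshot the same arithmetic gives roughly 9KB/inode in use and roughly 19KB/inode for the free-inode estimate, which is in the same range as the approximate 11KB and 21KB figures quoted; those may have been taken from other runs. The `differential_kb_per_inode` helper shows how the "differential usage for just those inodes" could be computed if a second snapshot were printed after the test.

```python
# Values copied from the lustre-MDT0000 rows of the output above.
MDT_TOTAL_KB = 3063424      # 1K-blocks
MDT_USED_KB = 12416         # Used
MDT_AVAIL_KB = 3048960      # Available
MDT_INODES_USED = 1363      # IUsed
MDT_INODES_FREE = 161079    # IFree


def kb_per_inode(kbytes, inodes):
    """Average space per inode, in KB."""
    return kbytes / inodes


def differential_kb_per_inode(used_kb_before, iused_before,
                              used_kb_after, iused_after):
    """Space consumed per newly created inode between two snapshots."""
    return (used_kb_after - used_kb_before) / (iused_after - iused_before)


# Average usage of the inodes that exist before the test starts.
used_avg = kb_per_inode(MDT_USED_KB, MDT_INODES_USED)
# Space per inode implied by the reported free-inode count.
free_est = kb_per_inode(MDT_AVAIL_KB, MDT_INODES_FREE)
# Number of inodes the MDT could hold at the earlier 32KB/inode usage.
capacity_at_32kb = MDT_TOTAL_KB // 32

if __name__ == "__main__":
    print(f"used: {used_avg:.1f} KB/inode")
    print(f"free estimate: {free_est:.1f} KB/inode")
    print(f"inodes that fit at 32KB/inode: {capacity_at_32kb}")
```
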
          pjones Peter Jones added a comment -

          Nathaniel

          Isaac thinks that this failure is due to a flaw in the test script but does not have the bandwidth to dig into it at the moment. Are you able to investigate?

          Thanks

          Peter


          adilger Andreas Dilger added a comment -

          Only a debug patch was landed, and the test was skipped. The core problem is not fixed.

          jlevi Jodi Levi (Inactive) added a comment -

          Patch landed to Master.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12185/
          Subject: LU-5473 tests: print space usage in sanity test_51b
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6e45c6d3ae4c46a0312bbb95b7e9ff09761f037d

          People

            Assignee: utopiabound Nathaniel Clark
            Reporter: maloo Maloo