
LU-5473: Test failure sanity test_51b: test_51b failed: fnum: No space left on device

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.9.0
    • Affects Versions: Lustre 2.7.0, Lustre 2.5.3
    • Severity: 3
    • 15250

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      https://testing.hpdd.intel.com/test_sets/09a62af8-1feb-11e4-8610-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/b21a51b0-0aff-11e4-8dbb-5254006e85c2

      The sub-test test_51b failed with the following error:

      test_51b failed with 1

      Info required for matching: sanity 51b


        Activity

          pjones Peter Jones added a comment -

          Nathaniel

          Isaac thinks that this failure is due to a flaw in the test script but does not have the bandwidth to dig into it at the moment. Are you able to investigate?

          Thanks

          Peter


          adilger Andreas Dilger added a comment -

          Only a debug patch was landed, and the test was skipped. The core problem is not fixed.

          jlevi Jodi Levi (Inactive) added a comment -

          Patch landed to Master.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12185/
          Subject: LU-5473 tests: print space usage in sanity test_51b
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6e45c6d3ae4c46a0312bbb95b7e9ff09761f037d

          adilger Andreas Dilger added a comment -

          Is there any way that we can figure out why the MDS is consuming 32KB per inode? I now recall a patch landing:

          commit 47c0b97421b21dab686b05d6bf829ebcaf62d5db
          Author: Isaac Huang <he.huang@intel.com>
          Date:   Tue Jul 22 16:42:03 2014 -0600
          
              LU-5391 osd-zfs: ZAP object block sizes too small
              
              Currently osd-zfs ZAP objects use 4K for both leaf
              and indirect blocks. This patch increases:
              - leaf block to 16K, which equals ZFS fzap_default_block_shift
              - indirect block to 16K, the default used by ZPL directories
              
              Signed-off-by: Isaac Huang <he.huang@intel.com>
              Change-Id: I5b476414d27822a14afb25e1307991fbd2e3a59e
              Reviewed-on: http://review.whamcloud.com/11182
          

          which might account for some of this. If the ZAP leaf and indirect blocks are being updated randomly due to many file creations, it may be that the indirect blocks are taking up a lot of space in the metadnode and saved snapshots?
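
          If someone wants to check whether the larger ZAP blocks are what is driving the per-inode cost, one option is to dump the test directory's ZAP object on the MDS with zdb. The sketch below is only an illustration; "lustre-mdt1/mdt1" and the object number 0x123 are placeholders, not values from this run:

           # Hypothetical inspection; substitute the real MDT dataset name and the
           # ZAP object id of the test directory for the placeholders below.
           zdb -dddd lustre-mdt1/mdt1 0x123
           # The object dump lists the dnode's data block size (dblk) and indirect
           # block size (iblk); after LU-5391 both should show 16K rather than 4K.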

          pjones Peter Jones added a comment -

          Isaac

          What do you suggest here?

          Peter

          adilger Andreas Dilger added a comment - edited

          The failing test run shows about 32KB of space used per inode on the MDT, both before and after the test is run. This is far more than the roughly 4KB per ZFS inode I would have expected. It is expected that the free space and free inodes would run out at the same time on a ZFS filesystem, since ZFS inodes are not preallocated as they are on ldiskfs. The total number of inodes and the number of free inodes are estimates derived from the space used, the number of inodes used (giving an average bytes per inode), and the number of free blocks.

          It would be interesting to see what the average space used per inode is for other ZFS filesystems.
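
          As a rough cross-check of that 32KB figure, here is a back-of-the-envelope calculation from the lfs df / lfs df -i numbers quoted further down in this ticket (not output from the test itself):

           # Used KB divided by used inodes, from the failed IB run below:
           echo "scale=1; 2029824 / 63039" | bc   # ~32.2 KB per inode after the test
           echo "scale=1; 6784 / 191" | bc        # ~35.5 KB per inode before the test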


          adilger Andreas Dilger added a comment -

          Yu Jian wrote in http://review.whamcloud.com/12185:

          > Tests received by maloo, run on CentOS release 6.5/x86_64: (https://maloo.whamcloud.com/test_sessions/723d3cb6-5bee-11e4-a35f-5254006e85c2). Ran 3 tests. 1 tests failed: sanity.
          The failure was reproduced over an IB network.

           mkdir(/mnt/lustre/d51.sanity/d62645) error: No space left on device
           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2029824     2029824           0 100% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID        63039       63039           0 100% /mnt/lustre[MDT:0]
          

          Both space and inodes were consumed 100%.
          Before running createmany, the space and inodes were:

           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2031488        6784     2022656   0% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID       228934         191      228743   0% /mnt/lustre[MDT:0]
          

          It was strange that the total inode count was reduced from 228934 to 63039.

          > Tests received by maloo, run on CentOS release 6.5/x86_64: (https://maloo.whamcloud.com/test_sessions/7622fce2-5c81-11e4-b08a-5254006e85c2). Ran 3 tests. No failures.
          With a TCP network (the other test parameters were the same), the same test passed.
          Before running createmany, the space and inodes were:

           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2031872        4096     2025728   0% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID       300941         191      300750   0% /mnt/lustre[MDT:0]
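
          The shrinking total is consistent with the inode count being derived rather than stored: since ZFS does not preallocate inodes, the free-inode figure is an estimate of how many more objects would fit into the remaining free space. Backing the implied estimate out of the numbers above (simple arithmetic, not the actual osd-zfs heuristic):

           # Before the run, 228743 free inodes were predicted for 2022656 KB free:
           echo "scale=1; 2022656 / 228743" | bc   # ~8.8 KB assumed per future inode
           # Actual consumption was ~32 KB per inode, so free space ran out long
           # before 228743 creations; with 0 KB free the estimated free-inode count
           # drops to 0 and the reported total collapses to the 63039 inodes in use.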
          
          adilger Andreas Dilger added a comment - edited

          The test_51b() code itself checks to see if the filesystem is reporting at least $NUMTEST free inodes on the MDT where the test directory is located, and at least 4kB of free space for each file. For creating NUMTEST=70000 empty directories this should consume about 272MB of space on a ZFS MDT, so it is surprising that we can't fit this on a 2GB MDT.

          I pushed http://review.whamcloud.com/12185 to add some debugging to this test to see why it is failing, and to collect more information about ZFS space usage per inode before and after the test on the MDT even if it is not failing.
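
          For reference, a minimal sketch of that kind of precondition check plus the extra space reporting follows. This is an illustration only, not the actual sanity.sh test_51b code or the 12185 patch; $MOUNT, $DIR and the awk field positions are assumptions based on the lfs df output format shown above:

           # Minimal sketch, assuming the standard 'lfs df' / 'lfs df -i' column layout.
           NUMTEST=70000
           free_inodes=$(lfs df -i $MOUNT | awk '/MDT:0/ { print $4 }')   # IFree
           free_kb=$(lfs df $MOUNT | awk '/MDT:0/ { print $4 }')          # Available KB
           # Require NUMTEST free inodes and ~4 KB of space per directory (~273 MB):
           if (( free_inodes < NUMTEST || free_kb < NUMTEST * 4 )); then
                   echo "skipping: need $NUMTEST inodes and $((NUMTEST * 4)) KB free"
           fi
           lfs df $MOUNT; lfs df -i $MOUNT     # record MDT usage before the creates
           # ... create $NUMTEST directories under $DIR/d51b here ...
           lfs df $MOUNT; lfs df -i $MOUNT     # and again afterwards, for debugging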

          yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sets/a333f7ac-419e-11e4-8023-5254006e85c2
          dmiter Dmitry Eremin (Inactive) added a comment - +1 https://testing.hpdd.intel.com/test_sets/9269b584-3d6a-11e4-af25-5254006e85c2

          People

            Assignee: utopiabound Nathaniel Clark
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 13
