
Test failure sanity test_51b: test_51b failed: fnum: No space left on device

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
    • 3
    • 15250

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/09a62af8-1feb-11e4-8610-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/b21a51b0-0aff-11e4-8dbb-5254006e85c2

      The sub-test test_51b failed with the following error:

      test_51b failed with 1

      Info required for matching: sanity 51b

      Attachments

        Activity

          [LU-5473] Test failure sanity test_51b: test_51b failed: fnum: No space left on device
          adilger Andreas Dilger added a comment - - edited

          The failing test run shows about 32KB of space used per inode on the MDT, both before and after the test is run. This is more than I would have expected, which is about 4KB per ZFS inode. It is expected that free space and free inodes run out at about the same time on a ZFS filesystem, since ZFS inodes are not preallocated as they are on ldiskfs; the total and free inode counts are estimates derived from the space used, the number of inodes used (i.e. the average bytes per inode), and the number of free blocks.

          It would be interesting to see what the average space used per inode is for other ZFS filesystems.
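
          One way to pull that number from a live filesystem (a minimal sketch, assuming a client mount at $MOUNT and the same "lfs df" / "lfs df -i" columns quoted in the comments below; this helper is not part of the test itself):

           # average KB of MDT space used per inode, from "lfs df" and "lfs df -i"
           MDT=lustre-MDT0000_UUID
           kb_used=$(lfs df $MOUNT | awk -v m=$MDT '$1 == m { print $3 }')
           ino_used=$(lfs df -i $MOUNT | awk -v m=$MDT '$1 == m { print $3 }')
           echo "average KB per inode: $((kb_used / ino_used))"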


          adilger Andreas Dilger added a comment -

          Yu Jian wrote in http://review.whamcloud.com/12185:

          > Tests received by maloo, run on CentOS release 6.5/x86_64: (https://maloo.whamcloud.com/test_sessions/723d3cb6-5bee-11e4-a35f-5254006e85c2). Ran 3 tests. 1 tests failed: sanity.
          The failure was reproduced over the IB network.

           mkdir(/mnt/lustre/d51.sanity/d62645) error: No space left on device
           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2029824     2029824           0 100% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID        63039       63039           0 100% /mnt/lustre[MDT:0]
          

          Both space and inodes were 100% consumed.
          Before running createmany, the space and inodes were:

           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2031488        6784     2022656   0% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID       228934         191      228743   0% /mnt/lustre[MDT:0]
          

          It was strange that the total inode count was reduced from 228934 to 63039 (see the arithmetic cross-check at the end of this comment).

          > Tests received by maloo, run on CentOS release 6.5/x86_64: (https://maloo.whamcloud.com/test_sessions/7622fce2-5c81-11e4-b08a-5254006e85c2). Ran 3 tests. No failures.
          Over the TCP network (with all other test parameters the same), the same test passed.
          Before running createmany, the space and inodes were:

           UUID                   1K-blocks        Used   Available Use% Mounted on
           lustre-MDT0000_UUID      2031872        4096     2025728   0% /mnt/lustre[MDT:0]
           UUID                      Inodes       IUsed       IFree IUse% Mounted on
           lustre-MDT0000_UUID       300941         191      300750   0% /mnt/lustre[MDT:0]
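
          A quick cross-check of the figures quoted in this comment, done as plain shell arithmetic (a sketch for illustration only, not the osd-zfs estimator itself):

           # KB of MDT space used per inode, from the failing o2ib run quoted above
           echo "before createmany: $((6784 / 191)) KB/inode"       # ~35 KB per inode
           echo "after  ENOSPC:     $((2029824 / 63039)) KB/inode"  # ~32 KB per inode
           # At ~32 KB per inode a 2 GB MDT holds only ~63000 inodes, which is why the
           # estimated total reported by "lfs df -i" collapses from 228934 to 63039 once
           # the estimate is recomputed from the observed bytes-per-inode ratio.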
          
          adilger Andreas Dilger added a comment - - edited

          The test_51b() code itself checks to see if the filesystem is reporting at least $NUMTEST free inodes on the MDT where the test directory is located, and at least 4kB of free space for each file. For creating NUMTEST=70000 empty directories this should consume about 272MB of space on a ZFS MDT, so it is surprising that we can't fit this on a 2GB MDT.

          I pushed http://review.whamcloud.com/12185 to add some debugging to this test to see why it is failing, and to collect more information about ZFS space usage per inode before and after the test on the MDT even if it is not failing.
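
          Paraphrased in shell, the precondition described above looks roughly like this (an illustrative sketch only, not the verbatim sanity.sh code; variable names and paths are approximations):

           # test_51b-style precondition: enough free inodes and ~4 KB of free MDT
           # space per new directory before creating them
           NUMTEST=70000
           free_inodes=$(lfs df -i $MOUNT | awk '/MDT0000/ { print $4 }')   # IFree
           free_kb=$(lfs df $MOUNT | awk '/MDT0000/ { print $4 }')          # Available KB
           # 70000 directories * 4 KB each is ~280000 KB (~273 MB), well under a 2 GB MDT
           (( free_inodes >= NUMTEST && free_kb >= NUMTEST * 4 )) &&
                   createmany -d $DIR/d51.sanity/d $NUMTEST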

          yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sets/a333f7ac-419e-11e4-8023-5254006e85c2
          dmiter Dmitry Eremin (Inactive) added a comment - +1 https://testing.hpdd.intel.com/test_sets/9269b584-3d6a-11e4-af25-5254006e85c2
          yujian Jian Yu added a comment - One more instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sessions/7643a3c8-35c3-11e4-8a7f-5254006e85c2
          yujian Jian Yu added a comment -

          After increasing the MDSSIZE to 3GB, the tests also passed:
          https://testing.hpdd.intel.com/test_sessions/200dd82c-3648-11e4-81c9-5254006e85c2

          I created TEI-2623.

          yujian Jian Yu added a comment -

          After increasing MDSSIZE from 2GB to 4GB, the tests passed with FSTYPE=zfs and NETTYPE=o2ib:
          https://testing.hpdd.intel.com/test_sessions/c5442360-332c-11e4-a323-5254006e85c2

          I just changed the size to 3GB to see whether the tests pass, and will create a TEI ticket.
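
          For reference, MDSSIZE in the test-framework configuration is given in KB (a side note based on the failing configuration listed further down this page), so the sizes discussed here translate as:

           MDSSIZE=2097152   # 2 GB, the value used in the failing runs
           MDSSIZE=3145728   # 3 GB, reported above as passing
           MDSSIZE=4194304   # 4 GB, reported above as passing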

          yujian Jian Yu added a comment -

          Test results showed that the same failure occurred on Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=o2ib.

          Dmesg on MDS:

          LustreError: 15047:0:(osd_handler.c:211:osd_trans_start()) lustre-MDT0000: failed to start transaction due to ENOSPC. Metadata overhead is underestimated or grant_ratio is too low.
          

          Just increased the MDSSIZE to run the test again.

          yujian Jian Yu added a comment -

          By searching on Maloo, I found that all of the failures occurred under the following configuration:

          MDSCOUNT=1
          MDSSIZE=2097152
          OSTCOUNT=2
          OSTSIZE=8388608
          NETTYPE=o2ib
          FSTYPE=zfs
          

          The same test passed on Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=tcp.

          Here is a for-test-only patch to perform the test against Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=o2ib: http://review.whamcloud.com/11510
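
          One way to rerun just this sub-test under that configuration (an illustrative invocation; it assumes an in-tree lustre/tests checkout, a working o2ib setup, and that the targets are reformatted with these sizes):

           MDSCOUNT=1 MDSSIZE=2097152 OSTCOUNT=2 OSTSIZE=8388608 \
           NETTYPE=o2ib FSTYPE=zfs \
           ONLY=51b bash lustre/tests/sanity.sh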

          bfaccini Bruno Faccini (Inactive) added a comment - +1 at https://testing.hpdd.intel.com/test_sets/890a458c-2675-11e4-a34b-5254006e85c2 .

          People

            Assignee: utopiabound Nathaniel Clark
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 13

            Dates

              Created:
              Updated:
              Resolved: