LU-5473

Test failure sanity test_51b: test_51b failed: fnum: No space left on device

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Severity: 3
    • Rank (Obsolete): 15250

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      https://testing.hpdd.intel.com/test_sets/09a62af8-1feb-11e4-8610-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/b21a51b0-0aff-11e4-8dbb-5254006e85c2

      The sub-test test_51b failed with the following error:

      test_51b failed with 1

      Info required for matching: sanity 51b

        Activity

          adilger Andreas Dilger added a comment (edited) -

          The test_51b() code itself checks to see if the filesystem is reporting at least $NUMTEST free inodes on the MDT where the test directory is located, and at least 4kB of free space for each file. For creating NUMTEST=70000 empty directories this should consume about 272MB of space on a ZFS MDT, so it is surprising that we can't fit this on a 2GB MDT.

          I pushed http://review.whamcloud.com/12185 to add some debugging to this test to see why it is failing, and to collect more information about ZFS space usage per inode before and after the test on the MDT even if it is not failing.
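
          The estimate in the comment above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "MB", "kB" and "GB" are meant as binary units (MiB, KiB, GiB), which the comment does not state, and the variable names are illustrative rather than taken from sanity.sh.

```python
# Back-of-the-envelope check of the space estimate quoted in the comment.
# Assumption (not stated in the ticket): MB/kB/GB are binary units.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

NUMTEST = 70000               # directories created by the test (from the comment)
estimated_usage = 272 * MIB   # expected consumption on a ZFS MDT (from the comment)
mdt_size = 2 * GIB            # MDT size (from the comment)

# Implied cost of one empty directory, and the share of the MDT it adds up to.
bytes_per_dir = estimated_usage / NUMTEST
mdt_fraction = estimated_usage / mdt_size

print(f"per-directory cost: {bytes_per_dir / KIB:.2f} KiB")
print(f"share of a 2 GiB MDT: {mdt_fraction:.1%}")
```

          Under these assumptions the estimate works out to roughly 4 KiB per directory, matching the per-file free-space check, and to well under a quarter of the MDT, which is why an ENOSPC here is unexpected.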

          yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sets/a333f7ac-419e-11e4-8023-5254006e85c2
          dmiter Dmitry Eremin (Inactive) added a comment - +1 https://testing.hpdd.intel.com/test_sets/9269b584-3d6a-11e4-af25-5254006e85c2
          yujian Jian Yu added a comment - One more instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sessions/7643a3c8-35c3-11e4-8a7f-5254006e85c2
          yujian Jian Yu added a comment -

          After increasing the MDSSIZE to 3GB, the tests also passed:
          https://testing.hpdd.intel.com/test_sessions/200dd82c-3648-11e4-81c9-5254006e85c2

          I created TEI-2623.

          yujian Jian Yu added a comment -

          After increasing MDSSIZE from 2GB to 4GB, the tests passed with FSTYPE=zfs and NETTYPE=o2ib:
          https://testing.hpdd.intel.com/test_sessions/c5442360-332c-11e4-a323-5254006e85c2

          I have just changed the size to 3GB to see whether the tests still pass, and I will create a TEI ticket.

          yujian Jian Yu added a comment -

          Test results showed that the same failure occurred on Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=o2ib.

          Dmesg on MDS:

          LustreError: 15047:0:(osd_handler.c:211:osd_trans_start()) lustre-MDT0000: failed to start transaction due to ENOSPC. Metadata overhead is underestimated or grant_ratio is too low.
          

          I have just increased MDSSIZE and will run the test again.

          yujian Jian Yu added a comment -

          By searching on Maloo, I found that all of the failures occurred under the following configuration:

          MDSCOUNT=1
          MDSSIZE=2097152
          OSTCOUNT=2
          OSTSIZE=8388608
          NETTYPE=o2ib
          FSTYPE=zfs
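
          For reference, the size settings above can be converted to and from the sizes discussed elsewhere in this ticket. The sketch below assumes MDSSIZE and OSTSIZE are expressed in kilobytes (1 kB = 1024 bytes), which is consistent with the "2GB MDT" mentioned earlier; the helper names are hypothetical and not part of the Lustre test framework.

```python
# Assumption: MDSSIZE and OSTSIZE are given in kilobytes of 1024 bytes.
KB_PER_GIB = 1024 * 1024


def kb_to_gib(size_kb):
    """Convert a size setting in kB to GiB."""
    return size_kb / KB_PER_GIB


def gib_to_kb(size_gib):
    """Convert a size in GiB to the kB value used in the settings."""
    return size_gib * KB_PER_GIB


# Values from the failing configuration.
print("MDSSIZE:", kb_to_gib(2097152), "GiB")
print("OSTSIZE:", kb_to_gib(8388608), "GiB")

# Settings corresponding to the larger MDT sizes tried in this ticket.
for gib in (3, 4):
    print(f"{gib} GiB -> MDSSIZE={gib_to_kb(gib)}")
```

          On this reading the failing runs used a 2 GiB MDT and 8 GiB OSTs, and the 3GB and 4GB retries correspond to MDSSIZE=3145728 and MDSSIZE=4194304.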
          

          The same test passed on Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=tcp.

          Here is a for-test-only patch to perform the test against Lustre 2.5.2 with FSTYPE=zfs and NETTYPE=o2ib: http://review.whamcloud.com/11510

          bfaccini Bruno Faccini (Inactive) added a comment - +1 at https://testing.hpdd.intel.com/test_sets/890a458c-2675-11e4-a34b-5254006e85c2 .
          yujian Jian Yu added a comment -

          The same failure occurred while verifying patch http://review.whamcloud.com/11435 on Lustre b2_5 branch with FSTYPE=zfs over IB network:
          https://testing.hpdd.intel.com/test_sets/ae3f2c6a-261f-11e4-9fe5-5254006e85c2

          The configuration was:

          MDSCOUNT=1
          MDSSIZE=2097152
          OSTCOUNT=2
          OSTSIZE=8388608
          
          green Oleg Drokin added a comment -

          I think this is some sort of test env issue?

          mkdir(/mnt/lustre/d51.sanity/d61732) error: No space left on device
          total: 61732 creates in 390.32 seconds: 158.16 creates/second
          /usr/lib64/lustre/tests/sanity.sh: line 4004: /mnt/lustre/d51.sanity/fnum: No space left on device
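
          The summary line in the output above gives a rough upper bound on how much MDT space each directory consumed before ENOSPC. The sketch below parses that line and divides the MDT size by the number of successful creates. It assumes the MDT was empty when the test started and that the full MDSSIZE=2097152 kB reported in this ticket was usable; neither is confirmed here, so the result is an upper bound and not a measurement. The function name is hypothetical.

```python
import re

# Summary line copied from the test output in the comment.
SUMMARY = "total: 61732 creates in 390.32 seconds: 158.16 creates/second"


def parse_summary(line):
    """Return (creates, seconds, reported_rate) from a summary line."""
    match = re.match(
        r"total: (\d+) creates in ([\d.]+) seconds: ([\d.]+) creates/second",
        line,
    )
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


creates, seconds, reported_rate = parse_summary(SUMMARY)

# Sanity check: the reported rate should equal creates / seconds.
computed_rate = creates / seconds

# Assumption: MDSSIZE is in kB and the whole MDT was available to the test.
mdt_bytes = 2097152 * 1024
upper_bound_per_dir = mdt_bytes / creates

print(f"computed rate: {computed_rate:.2f} creates/second")
print(f"upper bound per directory: {upper_bound_per_dir / 1024:.1f} KiB")
```

          If those assumptions hold, each directory cost at most about 34 KiB, several times the 4kB-per-file figure the test checks for, which would be consistent with the "Metadata overhead is underestimated" message in the MDS dmesg.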
          

          People

            Assignee: utopiabound Nathaniel Clark
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 13
