
conf-sanity test_32c, test_32d: could not find any free loop device

Details


    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/751ee23a-4106-11e3-a1e8-52540035b04c.

      The sub-test test_32c failed with the following error:

      test_32c failed with 1

      Info required for matching: conf-sanity 32c

          Activity

            [LU-4223] conf-sanity test_32c, test_32d: could not find any free loop device

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13265/
            Subject: LU-4223 tests: fix conf-sanity test_32 typo
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5da53bcb1d8c38157325505f6619d6b6c3d4db6a

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13265
            Subject: LU-4223 tests: fix conf-sanity test_32 typo
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 05a59163d8427f0c77d3aa6b31486b2219fb49d2

            yujian Jian Yu added a comment -

            Landed for Lustre 2.5.1.

            yujian Jian Yu added a comment -

            http://review.whamcloud.com/8409

            The above patch was not cherry-picked to Lustre b2_5 branch.

            The same failure occurred on Lustre b2_5 build #5:
            https://maloo.whamcloud.com/test_sets/16cd0f30-7497-11e3-8b21-52540035b04c

            Here is the back-ported patch on Lustre b2_5 branch: http://review.whamcloud.com/8723


            adilger Andreas Dilger added a comment -

            False alarm - it is actually LU-4358 that is being hit, but is misdiagnosed as LU-4223.

            I'm closing this bug since it looks like it is not being hit in recent test runs.

            adilger Andreas Dilger added a comment -

            This patch was landed on 2013-12-09, but conf-sanity is still reporting this bug for failures:
            https://maloo.whamcloud.com/test_sets/7d7a27aa-66ae-11e3-93e2-52540035b04c
            https://maloo.whamcloud.com/test_sets/202cbd9e-66c5-11e3-93e2-52540035b04c
            https://maloo.whamcloud.com/test_sets/868c4f72-66fd-11e3-a234-52540035b04c
            and others.

            di.wang Di Wang added a comment -

            John, yes, this makes sense, I updated the patch, and please have a look.

            jhammond John Hammond added a comment -

            Hi Di,

            1. There are a few other places in lustre/utils/ where popen() is followed by fclose(), including one in is_e2fsprogs_feature_supp(). We should fix those too.
            2. This may not be enough. On my RHEL 6.4 kernel (2.6.32-358.18.1.el6.lustre.x86_64) if there are IOs still in flight to a loop device then it cannot be detached:
              # dd if=/dev/zero of=/tmp/0 bs=1M count=100
              100+0 records in
              100+0 records out
              104857600 bytes (105 MB) copied, 0.195863 s, 535 MB/s
              # losetup -f /tmp/0
              # losetup -a
              /dev/loop0: [fc01]:1055063 (/tmp/0)
              # dd if=/dev/zero of=/dev/loop0 bs=1M count=100; losetup -d /dev/loop0
              100+0 records in
              100+0 records out
              104857600 bytes (105 MB) copied, 0.15608 s, 672 MB/s
              loop: can't delete device /dev/loop0: Device or resource busy
              # losetup -a
              /dev/loop0: [fc01]:1055063 (/tmp/0)
              # losetup -d /dev/loop0
              #
              # losetup /dev/loop0 /tmp/0; dd if=/dev/zero of=/dev/loop0 oflag=sync bs=1M count=100; losetup -d /dev/loop0
              100+0 records in
              100+0 records out
              104857600 bytes (105 MB) copied, 3.72073 s, 28.2 MB/s
              loop: can't delete device /dev/loop0: Device or resource busy
              

              As you see above, asking for sync IO to the loop device is not enough. So we probably need to add some wait retry logic to loop_cleanup().
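
            A minimal sketch of the wait-and-retry idea suggested above, not the code that landed for this ticket. It assumes the detach is done with the LOOP_CLR_FD ioctl from <linux/loop.h> and that a busy device is reported as EBUSY, as in the transcript. loop_detach_once() and loop_detach_retry() are hypothetical names, and the retry count and delay are illustrative only.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: try once to detach the loop device.
 * Returns 0 on success, -1 with errno set on failure. */
int loop_detach_once(const char *dev)
{
	int fd, rc, saved_errno;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;

	rc = ioctl(fd, LOOP_CLR_FD, 0);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return rc;
}

/* Hypothetical helper: retry the detach while the kernel reports EBUSY.
 * Any other error is returned immediately. */
int loop_detach_retry(const char *dev, int max_tries, unsigned int delay_sec)
{
	int i;

	if (max_tries < 1)
		max_tries = 1;

	for (i = 0; i < max_tries; i++) {
		if (loop_detach_once(dev) == 0)
			return 0;
		if (errno != EBUSY)
			return -1;	/* not a transient condition */
		if (i + 1 < max_tries)
			sleep(delay_sec);	/* let in-flight IO drain */
	}

	errno = EBUSY;
	return -1;
}
```

            For example, loop_detach_retry("/dev/loop0", 5, 1) would make up to five attempts, one second apart. Whether a busy device returns EBUSY at all may depend on the kernel version, so the values would need tuning against the test nodes.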

            di.wang Di Wang added a comment -

            http://review.whamcloud.com/8409

            di.wang Di Wang added a comment -

            John: I think you are right, and it should use pclose, instead of fclose. Good catch!
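
            For reference, a small sketch of the popen()/pclose() pairing discussed here. first_line_of() is a hypothetical helper, not a function from lustre/utils. It only illustrates that pclose() waits for the command to finish and returns its wait status, which is the documented way to close a stream opened by popen().

```c
/* Must come before any #include so that popen()/pclose() are declared
 * even in strict standard modes. */
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through the shell, copy the first line of
 * its output (without the newline) into buf, and return the command's
 * exit status, or -1 on any failure. */
int first_line_of(const char *cmd, char *buf, size_t len)
{
	FILE *fp;
	int status;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	if (fgets(buf, (int)len, fp) != NULL)
		buf[strcspn(buf, "\n")] = '\0';

	/* Drain the rest so the child is not stopped by a closed pipe. */
	while (fgetc(fp) != EOF)
		;

	/* pclose(), not fclose(): wait for the child and collect its status. */
	status = pclose(fp);
	if (status == -1 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

            Called as first_line_of("losetup -f", buf, sizeof(buf)), the helper returns only after the command has exited, so the caller can check its exit status before acting on the output.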


            People

              di.wang Di Wang
              maloo Maloo