Lustre / LU-5765

sanity test_123a test_123b: rm: no such file or directory

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • None
    • Fix Version: Lustre 2.7.0
    • 3
    • 16184

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      In sanity.sh test_123a and test_123b, a large number of errors are reported while "rm -r" is running:

      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity2': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity5': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity8': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity11': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity12': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity16': No such file or directory
      rm: cannot remove `/mnt/lustre/d123a.sanity/f123a.sanity18': No such file or directory
      

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/186592e2-5577-11e4-8542-5254006e85c2.

      Info required for matching: sanity 123a
      Info required for matching: sanity 123b
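For context on why rm can emit these messages at all: rm gathers its list of victims from the directory listing (readdir) and only then unlinks them, so any entry that disappears in between produces ENOENT at unlink time. A minimal shell sketch — purely illustrative, using a temporary directory rather than the Lustre mount — that reproduces the message deterministically:

```shell
# Illustrative only: reproduce rm's "No such file or directory" by
# removing a directory entry between the listing pass and the unlink pass.
dir=$(mktemp -d)
touch "$dir/f1" "$dir/f2" "$dir/f3"

names=$(ls "$dir")   # snapshot of entries, like rm's readdir() pass
rm "$dir/f2"         # entry vanishes after the snapshot was taken

# Unlinking from the stale snapshot now fails for f2 with ENOENT
errs=$(for n in $names; do rm "$dir/$n" 2>&1; done)
echo "$errs"

rmdir "$dir"
```

In the real failure the "other actor" removing entries is not another rm invocation but stale or duplicated directory contents seen by the client, yet the resulting error text is identical.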

Attachments

Issue Links

Activity


adilger Andreas Dilger added a comment -

Haven't seen this again for the past 4 weeks.

Closing it again. Maybe LU-6101 will fix the final trigger.

adilger Andreas Dilger added a comment -

This might also be related to LU-6101, which also has a patch.

adilger Andreas Dilger added a comment -

Saw this again on a recent patch run:
https://testing.hpdd.intel.com/test_sets/e55f7412-881b-11e4-aa28-5254006e85c2
https://testing.hpdd.intel.com/test_sets/c69c83ca-87f9-11e4-a70f-5254006e85c2

adilger Andreas Dilger added a comment -

Is it possible that this problem is the same as LU-3573 and was fixed by http://review.whamcloud.com/12904 ?

The most recent failure https://testing.hpdd.intel.com/test_sets/97ea0fd0-84f6-11e4-a60f-5254006e85c2 was on a patch based on a tree that doesn't contain the LU-3573 fix.

Before that, the most recent test failure was https://testing.hpdd.intel.com/test_sets/863b9434-fcb2-11e2-9222-52540035b04c on 2014-08-03.

isaac Isaac Huang (Inactive) added a comment -

In one test today, test_123a took 18127 seconds to complete (more than 100x the usual time on the same hardware), and there were 11575754 "no such file" errors in the test log.

isaac Isaac Huang (Inactive) added a comment -

With a build on master, I'm now able to reproduce it almost 100%. I'll look into it.

yong.fan nasf (Inactive) added a comment -

Isaac, another possible factor is that our Maloo test clusters run many VMs on a single node, so the system load there is much higher than in our personal test environments. If we can simulate a similar test environment locally, it may help reproduce the issue.
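One hypothetical way to act on that suggestion without a VM farm is to generate artificial CPU load around the test run. A sketch, assuming nothing about the real Maloo setup — the burner count and the test invocation are placeholders, not anything from this ticket:

```shell
# Hypothetical load generator: busy loops stand in for the CPU contention
# of many VMs sharing one node; the actual test would run in the middle.
nburners=4
pids=""
for i in $(seq "$nburners"); do
    ( while :; do :; done ) &   # one CPU-burner per unit of load
    pids="$pids $!"
done

# Placeholder for the real workload, e.g. running sanity.sh test_123a here
sleep 2

kill $pids 2>/dev/null
wait 2>/dev/null || true
```

Tools like stress(1), if available, do the same job with more control over memory and I/O pressure.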

People

Assignee: isaac Isaac Huang (Inactive)
Reporter: maloo Maloo
Votes: 0
Watchers: 8

Dates

Created:
Updated:
Resolved: