
Test failure: sanityn, subtest test_51a "multiop is still there"

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.4.0, Lustre 2.11.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • 6729

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/5c072382-70f7-11e2-9241-52540035b04c.

      The sub-test test_51a failed with the following error:

      multiop is still there

      Info required for matching: sanityn 51a

        Activity

          [LU-2776] Test failure: sanityn, subtest test_51a "multiop is still there"

          keith Keith Mannthey (Inactive) added a comment -

          Another occurrence:

          https://maloo.whamcloud.com/test_sets/3c2b25fe-e08c-11e2-b3fd-52540035b04c

          test_51a
              Error: 'multiop is still there'
              Failure Rate: 28.00% of last 100 executions [all branches]

          This seems to be hitting a bit.

          bfaccini Bruno Faccini (Inactive) added a comment -

          Just hit one more occurrence in https://maloo.whamcloud.com/test_sets/aa92a7ca-de1a-11e2-b04c-52540035b04c, with the new 1s sleep from the patch.

          I am wondering if multiop is stuck somewhere. Is there a way to get a full backtrace dump from this specific test, as is done for timeouts?
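One way to approach the question above, sketched here under assumptions: wait for the process with a deadline and, if it is still present, print what Linux exposes about it under /proc. The helper name `wait_or_report` and its arguments are hypothetical and are not part of the Lustre test framework; `/proc/<pid>/stack` is typically readable by root only and may be absent on some kernels.

```shell
#!/bin/bash
# Hypothetical helper, not part of the Lustre test framework.
# Waits up to $timeout seconds for $pid to exit. If the process is still
# running at the deadline, prints what /proc exposes about it (Linux only)
# and returns 1; returns 0 if the process exited in time.
wait_or_report() {
    local pid=$1 timeout=$2 waited=0

    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still running after ${timeout}s"
            # Process name and scheduler state (e.g. S = sleeping, D = uninterruptible)
            grep -E '^(Name|State):' "/proc/$pid/status" 2>/dev/null
            # wchan names the kernel function the task is sleeping in, if any
            echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
            # Kernel stack of the task; usually requires root
            cat "/proc/$pid/stack" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A test could call `wait_or_report "$multiop_pid" 5` before reporting the failure, where `multiop_pid` is an assumed variable holding the background PID. This only captures the state of one process; it does not replace whatever the framework does on timeouts.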

          jay Jinshan Xiong (Inactive) added a comment -

          Let's reopen it.

          bogl Bob Glossman (Inactive) added a comment -

          Even though this bug is closed, I think I see a fresh instance in https://maloo.whamcloud.com/test_sets/7548a8bc-9fcc-11e2-86dc-52540035b04c

          Is this the same bug, or should I open a fresh bug report?
          pjones Peter Jones added a comment -

          Landed for 2.4

          jay Jinshan Xiong (Inactive) added a comment - patch is at: http://review.whamcloud.com/5321

          jay Jinshan Xiong (Inactive) added a comment -

          Hi Andreas, it turns out your guess is right. From the log, the read was started after sleeping 2 seconds. I will fix it.

          utopiabound Nathaniel Clark added a comment -

          I was going to say that this test seemed racy, and it passes on my local VMs.

          adilger Andreas Dilger added a comment -

          My first guess is that this test might be racy. There is only a 0.1s margin for multiop to

          {fork, exec, sleep, wake, read}

          while the "dd" is ongoing, so this could fail on occasion. Could you try bumping this margin to 0.5s? Hopefully the race will then disappear.

          Jinshan, could you please comment on the intent of this test? Will increasing the margin ruin the test?
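The timing argument above can be illustrated with a small sketch. This is not the sanityn test_51a code: the function `race_demo`, the marker file, and the `window`/`delay` parameters are all hypothetical stand-ins. The marker file plays the role of "dd is ongoing", and the reader's sleep stands in for the time multiop needs to fork, exec, sleep, wake, and read.

```shell
#!/bin/bash
# Illustrative only: NOT the sanityn test_51a code. All names are hypothetical.
# A marker file exists for $window seconds (the "writer" is active).
# The reader needs $delay seconds before it can look at the marker.
# The reader overlaps with the writer only if delay < window; the
# difference between the two is the margin that scheduling delays eat into.
race_demo() {
    local window=$1 delay=$2 marker result writer
    marker=$(mktemp) || return 1           # marker exists from this point on

    ( sleep "$window"; rm -f "$marker" ) & # writer finishes after $window
    writer=$!

    sleep "$delay"                         # reader start-up cost
    if [ -e "$marker" ]; then
        result=overlap                     # reader ran while the writer was active
    else
        result=missed                      # writer already finished: race lost
    fi

    wait "$writer"
    rm -f "$marker"
    echo "$result"
}
```

With a comfortable margin, `race_demo 1 0.1` prints `overlap`; with the reader slower than the writer, `race_demo 0.2 1` prints `missed`. When `window` and `delay` differ by only 0.1s, the outcome depends on how quickly the system schedules the processes, which is the kind of occasional failure described in the comment.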

          People

            Assignee: jay Jinshan Xiong (Inactive)
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 9
