
sanityn test_13 : f13 shouldn't return an error (1)

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.6.0
    • None
    • 3
    • 12081

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>
      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/1d250bbe-f0c9-11e2-ba12-52540035b04c.

      This test has been failing intermittently for at least a year, but has never had a bug opened for it. The first reported failure with this message is a single failure on 2012-01-05:
      https://maloo.whamcloud.com/sub_tests/bea1ba98-37e1-11e1-bd0c-5254004bbbd3 (2012-01-05)

      but there are a number of previous failures on the same test with "test_13 returned 1" starting on 2012-08-22 through 2013-04-23, as many as 3 or 4 failures a day. It then returned on 2013-06-04 with the current failure message:
      https://maloo.whamcloud.com/sub_tests/bd93ded6-ccde-11e2-a1e0-52540035b04c

      However, while that was the first recorded failure on an unlanded patch, there were many other failures on different patches before the first one landed several months later, so it was not the root cause.

      There are no messages on the consoles of the client or MDS that would indicate the cause.

      The sub-test test_13 failed with the following error:

      f13 shouldn't return an error (1)

      Info required for matching: sanityn 13

          Activity

            [LU-4401] sanityn test_13 : f13 shouldn't return an error (1)

            adilger Andreas Dilger added a comment - Fixed in the original patches.

            adilger Andreas Dilger added a comment - This was caused by a bug in Di's DNE patch series and was fixed before the patches landed.

            adilger Andreas Dilger added a comment - In that case, don't worry about it if you have already fixed the problem in your patch. Just link this bug to the one that your patch is developed under and close it.
            di.wang Di Wang added a comment -

            Sure. Hmm, I checked the recent results, and it seems all recent failures, since August, are related to my readdir change patches. I will check the earlier results to see what I can find.

            adilger Andreas Dilger added a comment -

            Di, could you please also look into some of the other failures?

            This test failure has been around for a year already and is causing quite a few failures (170 in the past 4 weeks, including 16 of 167 review-dne sanityn runs in that time). Since we are trying to enforce review-dne, a roughly 10% failure rate in each sanityn run means only a (0.9 * 0.9) = 81% chance of success when we need to pass sanityn twice.

            If you can't find anything that can be fixed easily, it would make sense to submit a patch to add this test to ALWAYS_EXCEPT until it can be fixed.
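
The arithmetic in the comment above can be checked with a small sketch. It assumes the sanityn runs fail independently of each other, which the ticket does not establish; `pass_all` is a hypothetical helper name, not part of any test framework.

```c
#include <assert.h>

/*
 * Probability that `runs` independent runs all pass, given a per-run
 * failure rate in [0, 1]. Independence is an assumption of this sketch.
 */
static double pass_all(double fail_rate, int runs)
{
        double p = 1.0;

        assert(fail_rate >= 0.0 && fail_rate <= 1.0);
        for (int i = 0; i < runs; i++)
                p *= 1.0 - fail_rate;
        return p;
}

/* Observed failure rate from the ticket: 16 failures in 167 runs. */
static double observed_fail_rate(void)
{
        return 16.0 / 167.0;
}
```

With the rounded 10% rate, `pass_all(0.10, 2)` gives 0.81, matching the 81% quoted above; the observed rate of 16/167 is slightly under 10%, so the two-run pass probability is marginally higher.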
            di.wang Di Wang added a comment -

            The failure in http://maloo.whamcloud.com/test_sets/1d250bbe-f0c9-11e2-ba12-52540035b04c is actually from my readdir change patches. The real cause is that in it_to_lock_mode, IT_READDIR is mapped to LCK_CR instead of LCK_PR. Anyway, I have already fixed this problem in the updated patch.

            But I did not check the failures in the previous tests; there are probably some other problems as well.

            static inline int it_to_lock_mode(struct lookup_intent *it)
            {
                    /* CREAT needs to be tested before open (both could be set) */
                    if (it->it_op & IT_CREAT)
                            return LCK_CW;
                    else if (it->it_op & (IT_READDIR | IT_GETATTR | IT_OPEN | IT_LOOKUP |
                                          IT_LAYOUT))
                            return LCK_CR;
            
                    LASSERTF(0, "Invalid it_op: %d\n", it->it_op);
                    return -EINVAL;
            }
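
For contrast with the function quoted above, here is a minimal sketch of the mapping Di describes, with IT_READDIR returning LCK_PR rather than LCK_CR. It is not the actual patch: the flag and lock-mode values below are placeholders chosen for illustration, `struct lookup_intent_sketch` and `it_to_lock_mode_sketch` are hypothetical names, and the Lustre-specific LASSERTF is replaced by a plain error return so the snippet compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values for illustration; the real ones come from Lustre headers. */
enum {
        IT_OPEN    = 0x01,
        IT_CREAT   = 0x02,
        IT_READDIR = 0x04,
        IT_GETATTR = 0x08,
        IT_LOOKUP  = 0x10,
        IT_LAYOUT  = 0x20,
};
enum { LCK_PR = 4, LCK_CW = 8, LCK_CR = 16 };

/* Simplified stand-in for struct lookup_intent: only the field used here. */
struct lookup_intent_sketch {
        int it_op;
};

static int it_to_lock_mode_sketch(const struct lookup_intent_sketch *it)
{
        assert(it != NULL);

        /* CREAT needs to be tested before open (both could be set) */
        if (it->it_op & IT_CREAT)
                return LCK_CW;
        /* readdir is checked on its own so it gets LCK_PR, not LCK_CR */
        if (it->it_op & IT_READDIR)
                return LCK_PR;
        if (it->it_op & (IT_GETATTR | IT_OPEN | IT_LOOKUP | IT_LAYOUT))
                return LCK_CR;

        /* Unknown operation: the original asserts here via LASSERTF. */
        return -EINVAL;
}
```

The only behavioral difference from the quoted function is the separate IT_READDIR branch; how the landed fix is actually structured is not shown in this ticket.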
            
            

            People

              wc-triage WC Triage
              maloo Maloo