Test failure sanity test_69: write error

Details

    • 3
    • 14612

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      http://maloo.whamcloud.com/test_sets/136a09ee-d5c5-11e3-8758-52540035b04c
      https://maloo.whamcloud.com/test_sets/8000796a-f787-11e3-b15e-52540035b04c
      https://maloo.whamcloud.com/test_sets/90f9e2fc-f790-11e3-b0db-52540035b04c

      The sub-test test_69 failed with the following error:

      write error

      Info required for matching: sanity 69

          Activity

            [LU-5240] Test failure sanity test_69: write error
            yujian Jian Yu added a comment - Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/12002

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master.
            yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/sub_tests/34c145f8-ff87-11e3-af4b-5254006e85c2
            yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sets/06ab90ae-fd54-11e3-a270-5254006e85c2

            adilger Andreas Dilger added a comment - I've also submitted http://review.whamcloud.com/10855 to revert the original 10237 patch as a fallback measure in case 10802 doesn't solve the problem.
            utopiabound Nathaniel Clark added a comment - Speculative patch: http://review.whamcloud.com/10802

            behlendorf Brian Behlendorf added a comment - One suggestion would be to leave the limiting code in place but simply increase LU_CACHE_NR_ZFS_LIMIT. In practice we haven't seen any issues caused by setting it to 256, but perhaps, as Andreas suggested earlier, it needs to be at least as large as the number of service threads.

            adilger Andreas Dilger added a comment - Lots of the failures are on the two patches:
            http://review.whamcloud.com/10620 LU-5150 mdc: Handle empty but non-zero acl xattr
            http://review.whamcloud.com/10741 LU-5221 vvp: release mmap_sem in error case

            which puts the likely candidate as:
            http://review.whamcloud.com/10237 LU-5164 osd: Limit lu_object cache

            This is reinforced by the fact that the first version of the http://review.whamcloud.com/10237 patch also failed sanity.sh once before it was landed (the only such failure in the past year). The second version of the patch did not fail so it wasn't noticed before landing.

            adilger Andreas Dilger added a comment - Looks like this is a very recent regression. Is it possible to track it down to a patch that landed in the last couple of days?

            People

              utopiabound Nathaniel Clark
              maloo Maloo