Lustre / LU-4011

problems with upstream Lustre client code

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • None
    • None
    • None
    • Environment: fc19, 3.11 kernel
    • 3
    • 10743

    Description

      This ticket tracks issues with the upstream Lustre client code that is part of the 3.11 kernel source in fc19.

      Opened as a separate ticket, as suggested by Andreas in https://jira.hpdd.intel.com/browse/LU-3974?focusedCommentId=67574&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-67574

          Activity

            simmonsja James A Simmons added a comment - Bob, can you close this ticket?

            bogl Bob Glossman (Inactive) added a comment - James, I think this ticket was only meant to track the old fc19 upstream client. I would suggest a new, separate ticket for the upstream client in 4.12 kernels.

            simmonsja James A Simmons added a comment - With the latest 4.12-rc5 upstream client as of today, my testing fails the following sanity tests. Besides fixing these tests, the LU-8680 patch needs to be applied to make the Lustre client stable.

            sanity 27z, 27D, 29, 77c, 101g, 102a, 102b, 102n, 103a, 125, 133h, 154B, 154a, 154g, 160a, 160c, 160e, 161c, 161d, 162a, 205, 215, 226a, 242, 251, 405, 900

            jamesanunez James Nunez (Inactive) added a comment - No, but we found these failures in our regular autotest runs on master.

            simmonsja James A Simmons added a comment - Nunez, are you testing the upstream client?

            jamesanunez James Nunez (Inactive) added a comment - Another runtests test_1 failure in review-dne-part-2:
            2015-07-27 11:06:17 - https://testing.hpdd.intel.com/test_sets/558dd67a-3497-11e5-a9b3-5254006e85c2

            jamesanunez James Nunez (Inactive) added a comment - We're seeing a similar (same?) failure with runtests test_1 again at:
            2015-05-29 16:08:46 - https://testing.hpdd.intel.com/test_sets/21fd7836-0668-11e5-bf9f-5254006e85c2

            dmiter Dmitry Eremin (Inactive) added a comment - The situation on master is even worse: it does not even compile. I have submitted a fix at http://review.whamcloud.com/#/c/8853/. After this fix, I observe another crash, LU-4489.

            bergwolf Peng Tao added a comment (edited) - Dmitry, thanks for digging out the patch. Your patch also applies to Lustre master. Do you see the same crash with master?

            dmiter Dmitry Eremin (Inactive) added a comment - This LBUG was fixed by the following small patch:

            diff --git a/lustre/obdclass/cl_lock.c b/lustre/obdclass/cl_lock.c
            index d440da9..2544053 100644
            --- a/lustre/obdclass/cl_lock.c
            +++ b/lustre/obdclass/cl_lock.c
            @@ -2053,8 +2053,8 @@ void cl_lock_hold_add(const struct lu_env *env, struct cl_lock *lock,
                     LASSERT(lock->cll_state != CLS_FREEING);
            
                     ENTRY;
            -        cl_lock_hold_mod(env, lock, +1);
                     cl_lock_get(lock);
            +        cl_lock_hold_mod(env, lock, +1);
                     lu_ref_add(&lock->cll_holders, scope, source);
                     lu_ref_add(&lock->cll_reference, scope, source);
                     EXIT;

            The LBUG also only happens when CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK=y is set.

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: bogl Bob Glossman (Inactive)
              Votes: 0
              Watchers: 10
