
problems with upstream lustre client code

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • None
    • None
    • fc19, 3.11 kernel
    • 3
    • 10743

    Description

      This ticket is to track issues with the upstream lustre client code that is part of the 3.11 kernel source in fc19.

      This is a separate ticket, as suggested by Andreas in https://jira.hpdd.intel.com/browse/LU-3974?focusedCommentId=67574&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-67574

          Activity

            [LU-4011] problems with upstream lustre client code

            simmonsja James A Simmons added a comment:
            Bob, can you close this ticket?

            bogl Bob Glossman (Inactive) added a comment:
            James,
            I think this ticket was meant to track only the old fc19 upstream client.
            I would suggest a new, distinct ticket for upstream clients in 4.12 kernels.

            simmonsja James A Simmons added a comment:
            With the latest 4.12-rc5 upstream client as of today, my testing fails the sanity tests listed below. Besides these test failures, the patch from LU-8680 needs to be applied to make the Lustre client stable.

            sanity 27z, 27D, 29, 77c, 101g, 102a, 102b, 102n, 103a, 125, 133h, 154B, 154a, 154g, 160a, 160c, 160e, 161c, 161d, 162a, 205, 215, 226a, 242, 251, 405, 900

            jamesanunez James Nunez (Inactive) added a comment:
            No, but I found these failures in our regular autotest runs on master.

            simmonsja James A Simmons added a comment:
            Nunez, are you testing the upstream client?

            jamesanunez James Nunez (Inactive) added a comment:
            Another runtests test_1 failure in review-dne-part-2:
            2015-07-27 11:06:17 - https://testing.hpdd.intel.com/test_sets/558dd67a-3497-11e5-a9b3-5254006e85c2

            jamesanunez James Nunez (Inactive) added a comment:
            We're seeing a similar (same?) failure with runtests test_1 again at:
            2015-05-29 16:08:46 - https://testing.hpdd.intel.com/test_sets/21fd7836-0668-11e5-bf9f-5254006e85c2

            dmiter Dmitry Eremin (Inactive) added a comment:
            The situation on master is even worse: it does not even compile. I have submitted http://review.whamcloud.com/#/c/8853/. After this fix I observe another crash, LU-4489.
            bergwolf Peng Tao added a comment (edited):
            Dmitry, thanks for digging out the patch. Your patch also applies to Lustre master. Do you see the same crash with master?

            dmiter Dmitry Eremin (Inactive) added a comment:
            This LBUG was fixed by the following small patch:

            diff --git a/lustre/obdclass/cl_lock.c b/lustre/obdclass/cl_lock.c
            index d440da9..2544053 100644
            --- a/lustre/obdclass/cl_lock.c
            +++ b/lustre/obdclass/cl_lock.c
            @@ -2053,8 +2053,8 @@ void cl_lock_hold_add(const struct lu_env *env, struct cl_lock *lock,
                     LASSERT(lock->cll_state != CLS_FREEING);
            
                     ENTRY;
            -        cl_lock_hold_mod(env, lock, +1);
                     cl_lock_get(lock);
            +        cl_lock_hold_mod(env, lock, +1);
                     lu_ref_add(&lock->cll_holders, scope, source);
                     lu_ref_add(&lock->cll_reference, scope, source);
                     EXIT;

            Also, the LBUG happens only if CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK=y is set.
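
To illustrate why the call order in the patch above can matter, here is a minimal, self-contained sketch. It is not the Lustre code: struct model_lock and the model_* and hold_add_* functions are hypothetical stand-ins, and the invariant shown (the reference count must be at least the hold count) is an assumption about what cl_lock_invariant() checks on entry to cl_lock_get(). Under that assumption, raising the hold count before taking the reference can briefly leave more holds than references, which is the kind of state an entry check would reject.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct cl_lock: only two counters. */
struct model_lock {
    int refs;   /* models the reference count (cll_ref) */
    int holds;  /* models the hold count (cll_holds) */
};

/* Assumed invariant: every hold is backed by at least one reference. */
static bool model_invariant(const struct model_lock *lock)
{
    return lock->holds >= 0 && lock->refs >= lock->holds;
}

/*
 * Models cl_lock_get(): check the invariant on entry, then take a reference.
 * Returns false where the real code would hit the assertion.
 */
static bool model_get(struct model_lock *lock)
{
    if (!model_invariant(lock))
        return false;
    lock->refs++;
    return true;
}

/* Models cl_lock_hold_mod(): adjust the hold count by delta. */
static void model_hold_mod(struct model_lock *lock, int delta)
{
    lock->holds += delta;
}

/* Call order before the patch: add the hold first, then take the reference. */
static bool hold_add_old(struct model_lock *lock)
{
    model_hold_mod(lock, +1);
    return model_get(lock);
}

/* Call order after the patch: take the reference first, then add the hold. */
static bool hold_add_new(struct model_lock *lock)
{
    if (!model_get(lock))
        return false;
    model_hold_mod(lock, +1);
    return true;
}
```

In this model, a lock that starts with refs == holds fails the entry check under the old order and passes under the new one. A lock that starts with spare references passes either way, so the old order would only fail in some states.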
            dmiter Dmitry Eremin (Inactive) added a comment:
            Lustre: Echo OBD driver; http://www.lustre.org/
            Lustre: Layout lock feature supported.
            Lustre: Mounted lustre-client
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            Lustre: DEBUG MARKER: -----============= acceptance-small: runtests ======
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            Lustre: DEBUG MARKER: == runtests test 1: All Runtests ===================
            Lustre: DEBUG MARKER: touching /mnt/lustre at Mon Dec 16 16:52:56 MSK 2013
            Lustre: DEBUG MARKER: create an empty file /mnt/lustre/hosts.10496
            Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.10496
            Lustre: DEBUG MARKER: comparing /etc/hosts and /mnt/lustre/hosts.10496
            Lustre: DEBUG MARKER: renaming /mnt/lustre/hosts.10496 to /mnt/lustre/hosts.10496.ren
            Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.10496 again
            Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.10496
            Lustre: DEBUG MARKER: removing /mnt/lustre/hosts.10496
            Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.10496.2
            Lustre: DEBUG MARKER: truncating /mnt/lustre/hosts.10496.2 to 123 bytes
            Lustre: DEBUG MARKER: creating /mnt/lustre/d0.runtests/d1
            Lustre: DEBUG MARKER: copying 1000 files from /etc /bin to /mnt/lustre/d0.runtests/d1/etc /bin at Mon Dec 16 16:52:58 MSK 2013
            Lustre: DEBUG MARKER: comparing 1000 newly copied files at Mon Dec 16 16:53:14 MSK 2013
            Lustre: DEBUG MARKER: finished at Mon Dec 16 16:53:20 MSK 2013 (24)
            Lustre: Unmounted lustre-client
            Lustre: Layout lock feature supported.
            Lustre: Mounted lustre-client
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            Lustre: DEBUG MARKER: comparing 1000 previously copied files
            Lustre: DEBUG MARKER: runtests test_1: @@@@@@ FAIL: old and new files are different: rc=22
            Lustre: Unmounted lustre-client
            Lustre: Layout lock feature supported.
            Lustre: Mounted lustre-client
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            Lustre: DEBUG MARKER: removing /mnt/lustre/d0.runtests/d1
            LustreError: 9679:0:(cl_lock.c:315:cl_lock_get()) ASSERTION( cl_lock_invariant(((void *)0), lock) ) failed:

            People

              bogl Bob Glossman (Inactive)
              bogl Bob Glossman (Inactive)
              Votes: 0
              Watchers: 10
