LU-4495

client evicted on parallel append write to the shared file.

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
    • Environment: system with 8 OSTs
    • Severity: 3
    • Rank: 12299

    Description

      The client is sometimes evicted under a simple workload (parallel append writes to a shared file).

      Attachments

        1. 1590-ptlrpcd
          1.93 MB
        2. 1673-cp_ast
          18 kB
        3. 1884-bl_ast
          4 kB
        4. 2485-main
          1.23 MB
        5. 2487-main
          1.66 MB
        6. mpi_log.c
          0.8 kB
        7. patch
          2 kB

        Issue Links

            duplicates LU-4381

          Activity

            [LU-4495] client evicted on parallel append write to the shared file.
            paf Patrick Farrell (Inactive) added a comment - This is a duplicate of: https://jira.hpdd.intel.com/browse/LU-4381

            Jinshan:

            Actually, Alexey's original patch (commenting out code) removes exactly the same code as your patch:

            LU-4381 lov: to not hold sub locks at initialization

            Otherwise, it will cause deadlock because it essentially holds
            some sub locks and then to request others in an arbitrary order.

            Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
            Change-Id: I00d51677aa94ec41856402fd8a027e09355786ee
            Reviewed-on: http://review.whamcloud.com/9152
            Tested-by: Jenkins
            Tested-by: Maloo <hpdd-maloo@intel.com>
            Reviewed-by: Lai Siyao <lai.siyao@intel.com>
            Reviewed-by: Bobi Jam <bobijam@gmail.com>
            Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

            That patch has landed on master.
            So, never mind! This bug can be closed.


            pichong Gregoire Pichon added a comment -

            Hello,

            I would like to mention that on Lustre 2.4.2 we were seeing client evictions when running a test case in which tasks append-write to a single shared file.

            After applying patch http://review.whamcloud.com/9876 (LU-4558 "clio: Solve a race in cl_lock_put"), the client evictions no longer occurred.
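            For reference, here is a minimal sketch of the kind of workload described above: several processes open the same file with O_APPEND and write to it concurrently. This is NOT the attached mpi_log.c; the file path, process count, and write count are placeholders.

            /*
             * Minimal sketch of the reproducer workload: parallel append writes
             * to one shared file.  Path and counts are placeholders.
             */
            #include <fcntl.h>
            #include <stdio.h>
            #include <string.h>
            #include <sys/wait.h>
            #include <unistd.h>

            #define NPROC   8
            #define NWRITES 10000

            static void writer(const char *path, int id)
            {
                    char buf[64];
                    /* Every process opens the same file in append mode. */
                    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

                    if (fd < 0) {
                            perror("open");
                            _exit(1);
                    }
                    snprintf(buf, sizeof(buf), "writer %d says hello\n", id);
                    for (int i = 0; i < NWRITES; i++)
                            if (write(fd, buf, strlen(buf)) < 0)
                                    perror("write");
                    close(fd);
                    _exit(0);
            }

            int main(void)
            {
                    /* Placeholder path on a Lustre mount; the file should be striped
                     * over several OSTs for the deadlock discussed below. */
                    const char *path = "/mnt/lustre/shared_append_file";

                    for (int i = 0; i < NPROC; i++)
                            if (fork() == 0)
                                    writer(path, i);
                    for (int i = 0; i < NPROC; i++)
                            wait(NULL);
                    return 0;
            }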


            jay Jinshan Xiong (Inactive) added a comment -

            Take it easy, Patrick. To be honest, I didn't look into patch versions 2 and 3.

            Anyway, can you please create a patch to comment the code out, as I mentioned in patch v1?


            paf Patrick Farrell (Inactive) added a comment -

            Jinshan - this is my fault, actually. I said to Cory today that it seemed like getting something landed at Intel for this was dragging on and on. I wasn't specific about it: it's on Alexey to update the patch, rather than on anyone at Intel to do something.

            Sorry about that!

            I am curious: how do you feel about the approach in patch set 1 rather than patch sets 2/3, i.e., disabling the code rather than trying to fix it? That approach is working fine for Cray; Alexey was just hoping to do better.


            jay Jinshan Xiong (Inactive) added a comment -

            Hi Cory, Patrick confirmed that the patch couldn't fix the problem, so it can't be landed. Did I miss anything here?

            Jinshan

            spitzcor Cory Spitz added a comment -

            The patch is still pending review. Can it land?


            shadow Alexey Lyashkov added a comment -

            http://review.whamcloud.com/8971

            Xyratex-bug-id: CLSTR-2003, MRP-1603.


            paf Patrick Farrell (Inactive) added a comment -

            A number of updates here. Alexey L. provided a patch for Cray to test, and we confirmed it fixes the deadlock. (We've tested on Cray's version of 2.5 and on current master.)

            I've attached that patch.

            Essentially, in lov_lock_sub_init, it removes a for loop which has this comment on it:

                    /*
                     * Then, create sub-locks. Once at least one sub-lock was created,
                     * top-lock can be reached by other threads.
                     */
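            For readers without the source handy, here is a heavily simplified, hypothetical sketch of the eager pattern that loop implements. The type and function names below are made up for illustration and are not the real lov_lock_sub_init() code; the point is only that every sub-lock is found-or-created, and thereby held, while the top lock is still being initialized, before the ordered enqueue begins. With the patch applied, sub-locks are instead created during the enqueue phase.

            /* Conceptual sketch only -- hypothetical names, not Lustre source. */
            #include <errno.h>
            #include <stdlib.h>

            struct sublock { int stripe; };
            struct toplock { int nr; struct sublock **subs; };

            /* Stand-in for the per-stripe cache lookup / allocation; it may return
             * an already cached sub-lock, which the caller then holds. */
            static struct sublock *sublock_find_or_alloc(struct toplock *top, int i)
            {
                    struct sublock *sub = malloc(sizeof(*sub));

                    if (sub != NULL)
                            sub->stripe = i;
                    return sub;
            }

            /* The removed pattern: hold every sub-lock during top-lock init. */
            static int toplock_init_eager(struct toplock *top)
            {
                    for (int i = 0; i < top->nr; i++) {
                            struct sublock *sub = sublock_find_or_alloc(top, i);

                            if (sub == NULL)
                                    return -ENOMEM;
                            top->subs[i] = sub;     /* held from now on */
                    }
                    return 0;
            }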

            Here's Alexey's explanation of the deadlock, with a slight edit by me:

            1) client1 has a full set of locks for a file {stripe1, stripe2, stripe3}; client2 has no locks.

            2) client1 finished an IO and put its locks into the inactive state, so client2 is able to take 2 of the 3 locks from client1 and reference them.

            3) client1 enters a new IO loop (a different process) and starts to take a lock quorum - at top-lock allocation time it finds the stripe3 lock in cache and holds it.

            4) client2 tries to take the stripe3 lock, but it is now protected by the hold flag and the BL AST is ignored (it just marks the lock callback-pending). client2 enters the wait-for-grant loop.

            5) client1 enters the lock enqueue loop to take the stripe1 lock, but client2 can't return that lock because it is referenced.
            ... so deadlock.
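            The ordering problem in steps 3-5 is the classic two-party lock-ordering (ABBA) deadlock. As a standalone analogy only (pthread mutexes standing in for the per-stripe DLM locks; none of this is Lustre code), the following reproduces the same shape and hangs forever:

            #include <pthread.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t stripe1 = PTHREAD_MUTEX_INITIALIZER;
            static pthread_mutex_t stripe3 = PTHREAD_MUTEX_INITIALIZER;

            /* Plays client1: already holds the cached stripe3, then wants stripe1. */
            static void *client1(void *arg)
            {
                    (void)arg;
                    pthread_mutex_lock(&stripe3);   /* step 3: holds cached stripe3 */
                    sleep(1);                       /* let client2 grab stripe1 */
                    printf("client1 waiting for stripe1\n");
                    pthread_mutex_lock(&stripe1);   /* step 5: never granted */
                    return NULL;
            }

            /* Plays client2: holds stripe1, then wants stripe3. */
            static void *client2(void *arg)
            {
                    (void)arg;
                    pthread_mutex_lock(&stripe1);   /* step 2: took stripe1 */
                    sleep(1);
                    printf("client2 waiting for stripe3\n");
                    pthread_mutex_lock(&stripe3);   /* step 4: blocked, like the ignored BL AST */
                    return NULL;
            }

            int main(void)
            {
                    pthread_t t1, t2;

                    pthread_create(&t1, NULL, client1, NULL);
                    pthread_create(&t2, NULL, client2, NULL);
                    pthread_join(t1, NULL);         /* never returns: deadlock */
                    pthread_join(t2, NULL);
                    return 0;
            }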

            This patch removes sub lock creation from lov_lock_sub_init.

            If sub locks are not created in lov_lock_sub_init,
            they are created in the enqueue process. This avoids the
            deadlock described above by avoiding step 3.

            We've run our general Lustre IO performance test suite (multiple clients, various IO patterns, various stripe counts) and weren't able to measure any difference between runs with and without the patch.

            We're currently running it through sanity and some other stress testing.

            Going forward, we're wondering if this is expected to be the final version of the patch. Essentially, this fixes the problem with this performance optimization by removing the optimization. As far as I can tell, the optimization seems to be basically allowing better re-use of cl_locks, and as I understand it, caching & re-use of cl_locks is slated for removal in the upcoming CLIO simplification anyway.

            So it seems like it may be OK to leave this optimization out.

            Is this the plan, or is the hope for a patch that actually fixes the performance optimization rather than removes it?


            paf Patrick Farrell (Inactive) added a comment -

            First patch from Alexey L.


            shadow Alexey Lyashkov added a comment -

            I confirm that the call chain cl_lock_alloc -> lov_lock_sub_init -> lov_sublock_alloc -> ... -> cl_lock_hold exists as of /lov_lock.c/1.19/Wed Nov 25 02:10:38 2009//, so it's very close to the original CLIO code.

            But it needs a situation where a second process starts IO while one or more locks for a multi-stripe file are still in cache, so the bug will never be reproduced with Lustre default striping (1 stripe per file).
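            To hit that precondition when reproducing, the shared test file needs more than one stripe. Here is a minimal sketch of creating such a file from C with liblustreapi; the path and the stripe count of 3 are arbitrary choices, and the shell equivalent is roughly "lfs setstripe -c 3 <file>".

            /* Create the shared test file with 3 stripes; link with -llustreapi. */
            #include <stdio.h>
            #include <string.h>
            #include <lustre/lustreapi.h>

            int main(void)
            {
                    const char *path = "/mnt/lustre/shared_append_file";  /* placeholder */
                    int rc;

                    /* stripe_size 0 = filesystem default, stripe_offset -1 = any OST,
                     * stripe_count 3, stripe_pattern 0 = default (RAID0). */
                    rc = llapi_file_create(path, 0, -1, 3, 0);
                    if (rc != 0)
                            fprintf(stderr, "llapi_file_create: %s\n", strerror(-rc));
                    return rc != 0;
            }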


            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 14
