Lustre: [LU-12017] Truncate vs setxattr deadlock with DoM

    Description

      setxattr takes the inode lock and then sends a reint request to the MDS.
      truncate holds an MDS_INODELOCK_DOM lock and then tries to acquire the inode lock.

      PID: 14942 TASK: ffff88007659cf10 CPU: 3 COMMAND: "truncate"
       #0 [ffff88011f397af8] __schedule at ffffffff816b3de4
       #1 [ffff88011f397b88] schedule_preempt_disabled at ffffffff816b5329
       #2 [ffff88011f397b98] __mutex_lock_slowpath at ffffffff816b30d7
       #3 [ffff88011f397bf0] mutex_lock at ffffffff816b24bf
       #4 [ffff88011f397c08] vvp_io_setattr_start at ffffffffc118993d [lustre]
       #5 [ffff88011f397c40] cl_io_start at ffffffffc06e7a25 [obdclass]
       #6 [ffff88011f397c68] cl_io_loop at ffffffffc06e9e01 [obdclass]
       #7 [ffff88011f397cd8] cl_setattr_ost at ffffffffc11847ef [lustre]
       #8 [ffff88011f397d28] ll_setattr_raw at ffffffffc11614d8 [lustre]
       #9 [ffff88011f397df0] ll_setattr at ffffffffc11617d3 [lustre]
      #10 [ffff88011f397e00] notify_change at ffffffff81223bc4
      #11 [ffff88011f397e48] do_truncate at ffffffff81203445
      #12 [ffff88011f397ec0] vfs_truncate at ffffffff8120361c
      #13 [ffff88011f397ef8] do_sys_truncate at ffffffff8120370c
      #14 [ffff88011f397f40] sys_truncate at ffffffff812038de
      
      PID: 15194 TASK: ffff880077f18000 CPU: 1 COMMAND: "setfattr"
       #0 [ffff88011d33b8b8] __schedule at ffffffff816b3de4
       #1 [ffff88011d33b948] schedule at ffffffff816b4409
       #2 [ffff88011d33b958] schedule_timeout at ffffffff816b1ca4
       #3 [ffff88011d33ba00] ptlrpc_set_wait at ffffffffc09070a0 [ptlrpc]
       #4 [ffff88011d33baf0] ptlrpc_queue_wait at ffffffffc09074e3 [ptlrpc]
       #5 [ffff88011d33bb10] mdc_xattr_common at ffffffffc0b52186 [mdc]
       #6 [ffff88011d33bb90] mdc_setxattr at ffffffffc0b522de [mdc]
       #7 [ffff88011d33bbd0] lmv_setxattr at ffffffffc0872524 [lmv]
       #8 [ffff88011d33bc48] ll_xattr_set_common at ffffffffc1175b54 [lustre]
       #9 [ffff88011d33bcc8] ll_xattr_set_common_3_11 at ffffffffc11769ab [lustre]
      #10 [ffff88011d33bcd8] generic_setxattr at ffffffff8122c2d8
      #11 [ffff88011d33bd10] __vfs_setxattr_noperm at ffffffff8122cb45
      #12 [ffff88011d33bd58] vfs_setxattr at ffffffff8122cd45
      #13 [ffff88011d33bd98] setxattr at ffffffff8122ce7e
      #14 [ffff88011d33bef0] sys_setxattr at ffffffff8122d177
      

      The MDS locks cover different inodebits (MDS_INODELOCK_UPDATE|MDS_INODELOCK_XATTR vs.
      MDS_INODELOCK_DOM), but they still block each other when a conflicting lock was enqueued earlier, because Lustre only tries to grant the first lock in the waiting list.

          Activity

            pjones Peter Jones added a comment -

            As long as there is some value in landing the original patch, I would suggest moving the remaining work to a distinct ticket.

            tappro Mikhail Pershin added a comment -

            The problem is not fully fixed. The patch avoids some deadlock situations in which the IBITS of the conflicting locks don't intersect, but the deadlock still exists and may happen in the future when other IBITS combinations occur. The root cause of the deadlock on the client side is not yet fixed.

            I'd like to keep this ticket open with lowered severity and priority, or create a new ticket linked to this one to fix the root cause of the client-side deadlock.

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35937
            Subject: LU-12017 ldlm: DoM truncate deadlock
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 3f8d06297971efbf773df70c97edb1a7f96a34ad

            pjones Peter Jones added a comment -

            Landed for 2.13

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35057/
            Subject: LU-12017 ldlm: DoM truncate deadlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2250e072c37855d611aa64027945981fe2c8f4d7

            spitzcor Cory Spitz added a comment (edited) -

            I've added L2.13 to the Fix Version.

            spitzcor Cory Spitz added a comment -

            pjones, can we target this for 2.13.0?

            tappro Mikhail Pershin added a comment -

            Andriy, currently mdt_dom_discard_data() doesn't wait for the lock to be granted, so that sort of deadlock should already be fixed by LU-11359.

            askulysh Andriy Skulysh added a comment -

            Lock 0xffff88007905eb40 is for mnew in mdt_reint_rename().
            Lock 0xffff880090f23440 is requested by the same thread via mdt_reint_rename()->mdt_dom_discard_data(),
            so the DOM lock can't be granted, and the mnew lock can't be unlocked because the reply still needs to be sent.

            askulysh Andriy Skulysh added a comment -

            A similar deadlock (getattr vs. rename) was triggered:

            granted lock:
                 <struct ldlm_lock 0xffff88007905eb40>
                 0x73f93784e720791e  [0x200000402:0x14c8:0x0] pid 20988
                 MDS_INODELOCK_LOOKUP|MDS_INODELOCK_UPDATE
            waiting locks:
                 <struct ldlm_lock 0xffff8801243d8000>
                 0x73f93784e72079b1 [0x200000402:0x14c8:0x0] pid 10035
                 MDS_INODELOCK_LOOKUP|MDS_INODELOCK_UPDATE|MDS_INODELOCK_PERM
             
                 <struct ldlm_lock 0xffff880090f23440>
                 0x73f93784e72079f0 [0x200000402:0x14c8:0x0] pid 20988
                 MDS_INODELOCK_DOM
            
            gerrit Gerrit Updater added a comment -

            Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/35057
            Subject: LU-12017 test: DoM truncate deadlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2f0a4e2a2073fa4c0fced7f1e155ff7b4372298a

            People

              tappro Mikhail Pershin
              askulysh Andriy Skulysh