[LU-15491] rename deadlock Created: 28/Jan/22  Updated: 15/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

thread 1: mv dir1/10 dir2/10
thread 2: mv dir2/11 dir2/2
where dir1/10 is a hardlink to dir2/2 and dir2/10 is a hardlink to dir2/11,
so thread 1 and thread 2 take locks on the same two objects in reverse order, with the predictable result.

This was reproduced by a non-DNE racer run here: http://testing.linuxhacker.ru:3333/lustre-reports/21289/testresults/racer-long-zfs-centos7_x86_64-centos7_x86_64-retry3/
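A minimal userspace sketch of the colliding pattern (hypothetical paths; a single run is very unlikely to hit the window, racer wins by sheer repetition across many threads):

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* thread 1: source object is dir2/2's inode, target is dir2/11's inode */
static void *mv1(void *arg)
{
        (void)arg;
        rename("dir1/10", "dir2/10");
        return NULL;
}

/* thread 2: source object is dir2/11's inode, target is dir2/2's inode */
static void *mv2(void *arg)
{
        (void)arg;
        rename("dir2/11", "dir2/2");
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        mkdir("dir1", 0755);
        mkdir("dir2", 0755);
        close(creat("dir2/2", 0644));
        close(creat("dir2/11", 0644));
        link("dir2/2", "dir1/10");      /* dir1/10 is a hardlink to dir2/2 */
        link("dir2/11", "dir2/10");     /* dir2/10 is a hardlink to dir2/11 */

        pthread_create(&a, NULL, mv1, NULL);
        pthread_create(&b, NULL, mv2, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

(build with cc -pthread and run inside a Lustre mount)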

The deadlock detector reports the cycle from both sides: pid 6606 holds EX (ibits 0x21) on [0x200000401:0xa0b7:0x0] while waiting for EX (ibits 0x3) on [0x200000401:0x9fd9:0x0], and pid 14160 holds EX (ibits 0x21) on [0x200000401:0x9fd9:0x0] while waiting for EX (ibits 0x3) on [0x200000401:0xa0b7:0x0]:

Deadlock!
ungranted lock
   -- Lock: 0xffff8800a93f5b00/0x539a6d335586e266 (pid: 6606)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0x9fd9:0x0].0
       Req mode: EX, grant mode: --, read: 0, write: 1
       Bits: 0x3
waits for granted lock
   -- Lock: 0xffff8800af61e1c0/0x539a6d335586e235 (pid: 14160)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0x9fd9:0x0].0
       Req mode: EX, grant mode: EX, read: 0, write: 1
       Bits: 0x21
but holder is blocked waiting on another lock
   -- Lock: 0xffff880114890900/0x539a6d335586e258 (pid: 14160)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0xa0b7:0x0].0
       Req mode: EX, grant mode: --, read: 0, write: 1
       Bits: 0x3
that is held by the first thread wanting the first lock in the chain
   -- Lock: 0xffff8800a93f61c0/0x539a6d335586e23c (pid: 6606)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0xa0b7:0x0].0
       Req mode: EX, grant mode: EX, read: 0, write: 1
       Bits: 0x21
Deadlock!
ungranted lock
   -- Lock: 0xffff880114890900/0x539a6d335586e258 (pid: 14160)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0xa0b7:0x0].0
       Req mode: EX, grant mode: --, read: 0, write: 1
       Bits: 0x3
waits for granted lock
   -- Lock: 0xffff8800a93f61c0/0x539a6d335586e23c (pid: 6606)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0xa0b7:0x0].0
       Req mode: EX, grant mode: EX, read: 0, write: 1
       Bits: 0x21
but holder is blocked waiting on another lock
   -- Lock: 0xffff8800a93f5b00/0x539a6d335586e266 (pid: 6606)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0x9fd9:0x0].0
       Req mode: EX, grant mode: --, read: 0, write: 1
       Bits: 0x3
that is held by the first thread wanting the first lock in the chain
   -- Lock: 0xffff8800af61e1c0/0x539a6d335586e235 (pid: 14160)
       Node: local, ns lustre-MDT0000
       Resource: [0x200000401:0x9fd9:0x0].0
       Req mode: EX, grant mode: EX, read: 0, write: 1
       Bits: 0x21

thread info for pid 6606 at 0xffff8800d3c6e000:

  mti_rr = {
    rr_opcode = REINT_RENAME,
    rr_open_handle = 0x0,
    rr_lease_handle = 0x0,
    rr_fid1 = 0xffff880116b15000,
    rr_fid2 = 0xffff880116b15010,
    rr_name = {
      ln_name = 0xffff880116b15060 "10",
      ln_namelen = 2
    },
    rr_tgt_name = {
      ln_name = 0xffff880116b15068 "10",
      ln_namelen = 2
    },
crash> p *(struct ll_fid *)0xffff880116b15000
$3 = {
  id = 8589935617,
  generation = 1,
  f_type = 0
}
crash> p *(struct ll_fid *)0xffff880116b15010
$4 = {
  id = 8589935618,
  generation = 41273,
  f_type = 0
}

thread info for pid 14160 at 0xffff88010f554000:

  mti_rr = {
    rr_opcode = REINT_RENAME,
    rr_open_handle = 0x0,
    rr_lease_handle = 0x0,
    rr_fid1 = 0xffff880116b15c20,
    rr_fid2 = 0xffff880116b15c30,
    rr_name = {
      ln_name = 0xffff880116b15c80 "11",
      ln_namelen = 2
    },
    rr_tgt_name = {
      ln_name = 0xffff880116b15c88 "2",
      ln_namelen = 1
    },
crash> p *(struct ll_fid *)0xffff880116b15c20
$6 = {
  id = 8589935617,
  generation = 1,
  f_type = 0
}
crash> p *(struct ll_fid *)0xffff880116b15c30
$7 = {
  id = 8589935617,
  generation = 1,
  f_type = 0
}
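
To read those crash dumps: struct ll_fid shares its 16-byte layout with struct lu_fid (that overlay is an assumption here, but the sizes line up), so "id" holds the FID sequence and "generation" the object ID. A minimal decode of the pid 6606 values:

#include <stdio.h>

/* 16-byte FID layout, abbreviated from lustre_idl.h */
struct lu_fid {
        unsigned long long f_seq;       /* dumped above as "id" */
        unsigned int f_oid;             /* dumped above as "generation" */
        unsigned int f_ver;             /* dumped above as "f_type" */
};

int main(void)
{
        struct lu_fid fid1 = { 8589935617ULL, 1, 0 };           /* rr_fid1 */
        struct lu_fid fid2 = { 8589935618ULL, 41273, 0 };       /* rr_fid2 */

        /* prints [0x200000401:0x1:0x0] and [0x200000402:0xa139:0x0] */
        printf("[0x%llx:0x%x:0x%x]\n", fid1.f_seq, fid1.f_oid, fid1.f_ver);
        printf("[0x%llx:0x%x:0x%x]\n", fid2.f_seq, fid2.f_oid, fid2.f_ver);
        return 0;
}

Under that reading, rr_fid1/rr_fid2 are the source and target parent directory FIDs; pid 14160's two FIDs decode to the same [0x200000401:0x1:0x0], matching a same-directory rename. The deadlocked resources [0x200000401:0x9fd9:0x0] and [0x200000401:0xa0b7:0x0] are the child objects (mold/mnew) whose locks are taken in the code below.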

 

And indeed in mdt_reint_rename() we see:

                lock_ibits = MDS_INODELOCK_LOOKUP | MDS_INODELOCK_XATTR;
                rc = mdt_rename_source_lock(info, msrcdir, mold, lh_oldp,
                                            lh_rmt, lock_ibits, cos_incompat);
 

i.e., ibits 0x21 (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_XATTR) for mold, the rename source

                lh_newp = &info->mti_lh[MDT_LH_NEW];
                mdt_lock_reg_init(lh_newp, LCK_EX);
                lock_ibits = MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE;
... // skip remote check
                rc = mdt_reint_object_lock(info, mnew, lh_newp, lock_ibits,
                                           cos_incompat); 

i.e., ibits 0x3 (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE) for mnew, the rename target
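
The bit values come from the MDS inodelock flags; a quick sanity check of the two combinations (constant values as in lustre_idl.h):

#include <stdio.h>

/* inodelock bits from lustre/include/uapi/linux/lustre/lustre_idl.h */
#define MDS_INODELOCK_LOOKUP    0x000001        /* protects the name -> FID mapping */
#define MDS_INODELOCK_UPDATE    0x000002        /* protects attributes / directory body */
#define MDS_INODELOCK_XATTR     0x000020        /* protects extended attributes */

int main(void)
{
        /* mold (rename source): matches the granted 0x21 locks above */
        printf("source ibits: %#x\n", MDS_INODELOCK_LOOKUP | MDS_INODELOCK_XATTR);
        /* mnew (rename target): matches the ungranted 0x3 locks above */
        printf("target ibits: %#x\n", MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE);
        return 0;
}

So each thread takes EX with ibits 0x21 on its source first, then EX with ibits 0x3 on its target; since each rename's source object is the other's target, the two threads acquire the same pair of resources in opposite orders, which is exactly the cycle in the report above.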

The vmcore is available in the report (the server one) if you want to marvel at the coincidence.



 Comments   
Comment by Gerrit Updater [ 28/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46352
Subject: LU-15491 mdt: rename lock inversion deadlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 397236142ce8406f8a996f81f06aee0a6af4a696

Comment by Andreas Dilger [ 04/Nov/22 ]

+1 on master: https://testing.whamcloud.com/test_sets/2f4ce97f-d375-4353-8c2c-7a8696c9f84b
Clients each have two threads stuck waiting in rename:

[Fri Nov  4 17:45:37 2022] task:mv              state:D stack:    0 pid:499292 ppid:235624 flags:0x00000080
[Fri Nov  4 17:45:37 2022] Call Trace:
[Fri Nov  4 17:45:37 2022]  schedule+0x35/0xa0
[Fri Nov  4 17:45:37 2022]  __mutex_lock.isra.7+0x310/0x420
[Fri Nov  4 17:45:37 2022]  lock_rename+0x28/0xb0
[Fri Nov  4 17:45:37 2022]  do_renameat2+0x277/0x4e0
[Fri Nov  4 17:45:37 2022]  __x64_sys_rename+0x1c/0x20

[Fri Nov  4 17:45:37 2022] task:mv              state:D stack:    0 pid:499317 ppid:235529 flags:0x00000080
[Fri Nov  4 17:45:37 2022] Call Trace:
[Fri Nov  4 17:45:37 2022]  schedule+0x35/0xa0
[Fri Nov  4 17:45:37 2022]  __mutex_lock.isra.7+0x310/0x420
[Fri Nov  4 17:45:37 2022]  lock_rename+0x28/0xb0
[Fri Nov  4 17:45:37 2022]  do_renameat2+0x277/0x4e0
[Fri Nov  4 17:45:37 2022]  __x64_sys_rename+0x1c/0x20

[Fri Nov  4 17:43:34 2022] task:mv              state:D stack:    0 pid:198990 ppid:184082 flags:0x00000080
[Fri Nov  4 17:43:34 2022] Call Trace:
[Fri Nov  4 17:43:34 2022]  schedule+0x35/0xa0
[Fri Nov  4 17:43:34 2022]  __mutex_lock.isra.7+0x310/0x420
[Fri Nov  4 17:43:34 2022]  lock_rename+0x28/0xb0
[Fri Nov  4 17:43:34 2022]  do_renameat2+0x277/0x4e0
[Fri Nov  4 17:43:34 2022]  __x64_sys_renameat+0x1d/0x20

[Fri Nov  4 17:43:34 2022] task:mv              state:D stack:    0 pid:198998 ppid:184046 flags:0x00004080
[Fri Nov  4 17:43:34 2022] Call Trace:
[Fri Nov  4 17:43:34 2022]  schedule+0x35/0xa0
[Fri Nov  4 17:43:34 2022]  __mutex_lock.isra.7+0x310/0x420
[Fri Nov  4 17:43:34 2022]  lock_rename+0x28/0xb0
[Fri Nov  4 17:43:34 2022]  do_renameat2+0x277/0x4e0
[Fri Nov  4 17:43:34 2022]  __x64_sys_renameat2+0x20/0x30

and one MDS thread stuck in rename:

[Fri Nov  4 17:42:41 2022] Pid: 114790, comm: mdt00_023 4.18.0-372.26.1.el8_lustre.x86_64 #1 SMP Mon Oct 17 07:45:27 UTC 2022
[Fri Nov  4 17:42:41 2022] Call Trace TBD:
[Fri Nov  4 17:42:41 2022] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
[Fri Nov  4 17:42:41 2022] [<0>] ldlm_cli_enqueue_fini+0xa4d/0xf40 [ptlrpc]
[Fri Nov  4 17:42:41 2022] [<0>] ldlm_cli_enqueue+0x61d/0xa70 [ptlrpc]
[Fri Nov  4 17:42:41 2022] [<0>] osp_md_object_lock+0x1cd/0x2c0 [osp]
[Fri Nov  4 17:42:41 2022] [<0>] lod_object_lock+0x132/0x7f0 [lod]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_remote_object_lock_try+0x186/0x3c0 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_remote_object_lock+0x1f/0x30 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_rename_lock+0x26b/0x3a0 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_reint_migrate+0x5bd/0x1430 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_reint_rec+0x117/0x270 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_reint_internal+0x4bc/0x7d0 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] mdt_reint+0x5d/0x110 [mdt]
[Fri Nov  4 17:42:41 2022] [<0>] tgt_request_handle+0xc8c/0x19c0 [ptlrpc]
[Fri Nov  4 17:42:41 2022] [<0>] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[Fri Nov  4 17:42:41 2022] [<0>] ptlrpc_main+0xc0b/0x1560 [ptlrpc]
Comment by Andreas Dilger [ 15/Aug/23 ]

+1 on master racer with same mdt_reint_migrate() + mdt_rename_lock() stack traces on MDS and stuck mv processes on client.
https://testing.whamcloud.com/test_sets/b3a6dd16-2b8b-4ef7-9be6-59a09053aa7a
