[LU-3783] MDS crash, LustreError: 34483:0:(ldlm_flock.c:208:ldlm_flock_deadlock()) ASSERTION( req != lock ) failed
Created: 20/Aug/13  Updated: 26/Aug/13  Resolved: 26/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Bug
Priority: Blocker
Reporter: Andriy Skulysh
Assignee: Keith Mannthey (Inactive)
Resolution: Fixed
Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-1715 flock deadlock detection does not wor... Resolved
Severity: 3
Rank (Obsolete): 9786

 Description   
Aug  7 14:44:01 snx11003n003 kernel: [1044424.563294] LustreError: 34483:0:(ldlm_flock.c:208:ldlm_flock_deadlock()) ASSERTION( req != lock ) failed: 
Aug  7 14:44:01 snx11003n003 kernel: [1044424.574590] LustreError: 34483:0:(ldlm_flock.c:208:ldlm_flock_deadlock()) LBUG
Aug  7 14:44:01 snx11003n003 kernel: [1044424.583062] Pid: 34483, comm: mdt_13
Aug  7 14:44:01 snx11003n003 kernel: [1044424.587358] 
Aug  7 14:44:01 snx11003n003 kernel: [1044424.587360] Call Trace:
Aug  7 14:44:01 snx11003n003 kernel: [1044424.592376]  [<ffffffffa0475865>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.600494]  [<ffffffffa0475e77>] lbug_with_loc+0x47/0xb0 [libcfs]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.607793]  [<ffffffffa0716694>] ldlm_process_flock_lock+0x12c4/0x15f0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.616518]  [<ffffffffa0771e7b>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.624169]  [<ffffffffa06ebcf5>] ldlm_lock_enqueue+0x405/0x8f0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.632023]  [<ffffffffa071278d>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.640139]  [<ffffffffa0c481c6>] mdt_enqueue+0x46/0x130 [mdt]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.646981]  [<ffffffffa0c3da02>] mdt_handle_common+0x932/0x1770 [mdt]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.654604]  [<ffffffffa0c3e915>] mdt_regular_handle+0x15/0x20 [mdt]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.662077]  [<ffffffffa0741b83>] ptlrpc_main+0xf13/0x19e0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.669450]  [<ffffffffa0740c70>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.676765]  [<ffffffff8100c1ca>] child_rip+0xa/0x20
Aug  7 14:44:01 snx11003n003 kernel: [1044424.682681]  [<ffffffffa0740c70>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.690036]  [<ffffffffa0740c70>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Aug  7 14:44:01 snx11003n003 kernel: [1044424.697350]  [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

The export was reconnected, but flock->blocking_export still holds the old value.



 Comments   
Comment by Andriy Skulysh [ 20/Aug/13 ]

patch: http://review.whamcloud.com/#/c/7392/

Abort processing of the flock blockers list on reaching a disconnected export. A deadlock, if there is one, will be found during reprocessing (introduced by LU-1715).
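
The idea described above can be illustrated with a small stand-alone model. This is only a sketch, not the code from the patch: the types toy_export and toy_waiter, the failed flag, and the function toy_flock_deadlock are hypothetical names invented for illustration, and the real ldlm_flock.c structures and locking are not reproduced. The walk follows the chain of blockers and gives up as soon as it reaches an export marked as disconnected, on the assumption that a later reprocess pass will catch any real deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a client export (connection); not a Lustre type. */
struct toy_export {
    bool failed; /* true once the client has disconnected */
};

/* One blocked flock request: (exp, owner) waits for (blocking_exp, blocking_owner). */
struct toy_waiter {
    const struct toy_export *exp;
    unsigned long owner;
    const struct toy_export *blocking_exp;
    unsigned long blocking_owner;
};

/* Linear lookup of the blocked request belonging to (exp, owner), if any. */
const struct toy_waiter *
toy_find_waiter(const struct toy_waiter *tbl, size_t n,
                const struct toy_export *exp, unsigned long owner)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].exp == exp && tbl[i].owner == owner)
            return &tbl[i];
    return NULL;
}

/*
 * Returns 1 if making (req_exp, req_owner) wait for (bl_exp, bl_owner)
 * would close a cycle of waiters, 0 otherwise. The walk is aborted on a
 * disconnected export instead of trusting its possibly stale state.
 */
int
toy_flock_deadlock(const struct toy_waiter *tbl, size_t n,
                   const struct toy_export *req_exp, unsigned long req_owner,
                   const struct toy_export *bl_exp, unsigned long bl_owner)
{
    assert(req_exp != NULL);

    /* A chain through n waiters visits at most n + 1 owners. */
    for (size_t steps = 0; steps <= n; steps++) {
        const struct toy_waiter *w;

        /* Abort on a missing or disconnected export. */
        if (bl_exp == NULL || bl_exp->failed)
            return 0;

        /* The chain leads back to the requester: deadlock. */
        if (bl_exp == req_exp && bl_owner == req_owner)
            return 1;

        /* Is the current blocker itself waiting for someone? */
        w = toy_find_waiter(tbl, n, bl_exp, bl_owner);
        if (w == NULL)
            return 0;

        bl_exp = w->blocking_exp;
        bl_owner = w->blocking_owner;
    }
    return 0;
}
```

In this model, stopping early never reports a false deadlock; it can only miss one, which is why it depends on a later reprocess pass to detect it.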

Comment by Jodi Levi (Inactive) [ 26/Aug/13 ]

Patch landed to Master. Let me know if anything more is needed and I will reopen the ticket.
