[LU-7829] Glimpse lock shouldn't match not granted locks Created: 01/Mar/16  Updated: 13/May/16  Resolved: 23/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A deadlock is possible between ccc_prep_size()->ldlm_lock_match() and
cl_io_lock(): ldlm_lock_match() waits for a matched lock to be granted,
but that lock conflicts with a lock already taken in cl_io_lock() before
ccc_prep_size() was called.

It is better to send an additional lock request than to wait on a matched lock that has not been granted yet; this avoids the deadlock.
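The matching policy proposed above can be illustrated with a minimal C sketch. It assumes a hypothetical, heavily simplified lock cache: `struct ext_lock`, `match_lock()` and the `granted_only` flag are illustrative names, not the Lustre LDLM API and not the code in the merged patch. When the caller is a glimpse, locks still waiting to be granted are skipped, so a NULL result tells the caller to send a new lock request instead of sleeping on a lock it cannot safely wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified lock model; not the Lustre LDLM API. */
enum lock_state { LOCK_WAITING, LOCK_GRANTED };

struct ext_lock {
    uint64_t start;
    uint64_t end;          /* inclusive; UINT64_MAX stands for EOF */
    enum lock_state state;
};

/* Does lock l fully cover the byte range [start, end]? */
static bool covers(const struct ext_lock *l, uint64_t start, uint64_t end)
{
    return l->start <= start && l->end >= end;
}

/*
 * Return the first cached lock that covers [start, end], or NULL.
 * With granted_only set (the glimpse case in this sketch), locks that are
 * still waiting to be granted are skipped, so the caller falls back to a
 * new lock request instead of sleeping on a lock that may never be granted.
 */
static const struct ext_lock *match_lock(const struct ext_lock *locks,
                                         size_t n, uint64_t start,
                                         uint64_t end, bool granted_only)
{
    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        if (!covers(&locks[i], start, end))
            continue;
        if (granted_only && locks[i].state != LOCK_GRANTED)
            continue;
        return &locks[i];
    }
    return NULL;
}
```

With a cache holding an ungranted [0, EOF] lock and a granted [0, 4095] lock, a [0, EOF] match that allows ungranted locks returns the waiting lock, while the granted-only match returns NULL and the caller enqueues its own request.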



 Comments   
Comment by Gerrit Updater [ 01/Mar/16 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/18738
Subject: LU-7829 osc: glimpse lock should match only with granted locks
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 12ca5348a93485a3f2112e82095b83aad02e77ea

Comment by Oleg Drokin [ 04/Mar/16 ]

I am pulling Jinshan into this ticket.

I am not happy about the current matching behavior either, but I think this was discussed a few years ago and there were some compelling reasons to do it too. Possibly they went away with clio simplification, but I want to be sure that is fine now.

Comment by Jinshan Xiong (Inactive) [ 07/Mar/16 ]

Hi Andriy,

Can you please describe the deadlock issue in detail? In my understanding, cl_io_lock() should have finished before ccc_prep_size() is called in the same I/O.

Comment by Andriy Skulysh [ 08/Mar/16 ]

Yes. We have obtained a lock in cl_io_lock() and then call ccc_prep_size(), which matches a not-yet-granted lock from another I/O. That second lock can't be granted because it conflicts with the lock held by the first I/O. It could be obtained for some limited range, but ccc_prep_size() needs [0..EOF].
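The cycle described in this comment can be expressed as a simple check. This is a sketch under stated assumptions: `struct ext_lock`, the two-mode compatibility rule and `wait_would_deadlock()` are hypothetical and much simpler than LDLM's real lock modes and extent handling. It shows why sleeping on an ungranted matched lock is unsafe when that lock overlaps and conflicts with a lock the same I/O already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-mode model: shared locks are compatible with each
 * other, exclusive locks conflict with everything. Not LDLM's mode matrix. */
enum lock_mode { MODE_SHARED, MODE_EXCLUSIVE };

struct ext_lock {
    uint64_t start, end;   /* inclusive byte range; UINT64_MAX marks EOF */
    enum lock_mode mode;
    bool granted;
};

static bool extents_overlap(const struct ext_lock *a, const struct ext_lock *b)
{
    return a->start <= b->end && b->start <= a->end;
}

static bool locks_conflict(const struct ext_lock *a, const struct ext_lock *b)
{
    if (!extents_overlap(a, b))
        return false;
    return a->mode == MODE_EXCLUSIVE || b->mode == MODE_EXCLUSIVE;
}

/*
 * An I/O that already holds `held` and then sleeps until `matched` is
 * granted can never wake up if `matched` is itself blocked by `held`,
 * because `held` is released only after this I/O completes.
 */
static bool wait_would_deadlock(const struct ext_lock *held,
                                const struct ext_lock *matched)
{
    assert(held->granted);  /* the I/O already owns this lock */
    return !matched->granted && locks_conflict(held, matched);
}
```

For example, a shared lock held on [0, 4095] together with an ungranted exclusive [0, EOF] lock from another I/O reports a deadlock, while an ungranted exclusive lock limited to [8192, EOF] does not overlap the held lock and is reported as safe to wait for.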

Comment by Wally Wang (Inactive) [ 12/Mar/16 ]

We have an instance of this:

crash> bt 7141
PID: 7141 TASK: ffff8808360a9800 CPU: 2 COMMAND: "wc"
#0 [ffff8807ccf87668] schedule at ffffffff8141c637
#1 [ffff8807ccf877d0] schedule_timeout at ffffffff8141d0e0
#2 [ffff8807ccf87870] ldlm_lock_match at ffffffffa0474b01 [ptlrpc]
#3 [ffff8807ccf87950] osc_enqueue_base at ffffffffa06e9970 [osc]
#4 [ffff8807ccf879e0] osc_lock_enqueue at ffffffffa0704576 [osc]
#5 [ffff8807ccf87a70] cl_enqueue_try at ffffffffa037c6eb [obdclass]
#6 [ffff8807ccf87ac0] lov_lock_enqueue at ffffffffa0799f02 [lov]
#7 [ffff8807ccf87b60] cl_enqueue_try at ffffffffa037c6eb [obdclass]
#8 [ffff8807ccf87bb0] cl_enqueue_locked at ffffffffa037d0ff [obdclass]
#9 [ffff8807ccf87bf0] cl_lock_request at ffffffffa037dd1e [obdclass]
#10 [ffff8807ccf87c50] cl_glimpse_lock at ffffffffa0869bff [lustre]
#11 [ffff8807ccf87cb0] ccc_prep_size at ffffffffa086bc3e [lustre]
#12 [ffff8807ccf87d10] vvp_io_read_start at ffffffffa087457a [lustre]
#13 [ffff8807ccf87d80] cl_io_start at ffffffffa037f4d2 [obdclass]
#14 [ffff8807ccf87db0] cl_io_loop at ffffffffa0383054 [obdclass]
#15 [ffff8807ccf87de0] ll_file_io_generic at ffffffffa0815f02 [lustre]
#16 [ffff8807ccf87e60] ll_file_aio_read at ffffffffa081669b [lustre]
#17 [ffff8807ccf87eb0] ll_file_read at ffffffffa081700a [lustre]
#18 [ffff8807ccf87f10] vfs_read at ffffffff8115c048
#19 [ffff8807ccf87f40] sys_read at ffffffff8115c205
#20 [ffff8807ccf87f80] system_call_fastpath at ffffffff81426aab
RIP: 00002aaaaad9db90 RSP: 00007fffffff2f60 RFLAGS: 00000206
RAX: 0000000000000000 RBX: ffffffff81426aab RCX: 0000000000000002
RDX: 0000000000004000 RSI: 00007fffffff2fd0 RDI: 0000000000000003
RBP: 0000000000000000 R8: 00007fffffff6a82 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000021ec000
R13: 0000000000000003 R14: 00007fffffff2fd0 R15: 0000000000004000
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
crash>

The matched lock conflicts with the lock this process took in cl_io_lock() before reaching this point. Because of this deadlock, the held lock could not be canceled in response to the blocking AST, and the client was eventually evicted by the server.

The dump of the instance is on ftp.cray.com:/outbound/835240-logs-021116.tar.bz2. The client is Lustre 2.5.

Comment by Gerrit Updater [ 23/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18738/
Subject: LU-7829 osc: glimpse lock should match only with granted locks
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a9b3317360261b9a694a445e4dac1d3fa61664fd

Comment by Joseph Gmitter (Inactive) [ 23/Mar/16 ]

Landed for 2.9.0

Generated at Sat Feb 10 02:12:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.