[LU-4320] MDS_SWAP_LAYOUTS handler does not handle ELC locks from client Created: 27/Nov/13  Updated: 22/Oct/14  Resolved: 17/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Critical
Reporter: Andreas Dilger
Assignee: John Hammond
Resolution: Fixed
Votes: 0
Labels: HSM

Severity: 3
Rank (Obsolete): 11818

 Description   

It appears that the MDS_SWAP_LAYOUTS handler (mdt_swap_layouts()) does not handle the Early Lock Cancellation (ELC) locks that the client packs into the RPC by calling mdc_prep_elc_req() in mdc_ioc_swap_layouts(). This shows up in the debug log as LDLM_BL_CALLBACK (104) RPCs for the two inodes whose layouts are being swapped (IGIF FID [0x24f7ac:0x2d77b0e5:0x0] and volatile file FID [0x2000061c2:0x4:0x0]). The client log below is edited down to just the important bits:

00000002:00000001:1.0:1385592739.077027:0:1247:0:(mdc_request.c:1861:mdc_iocontrol()) Process entered
00000002:00000001:1.0:1385592739.077032:0:1247:0:(mdc_request.c:1804:mdc_ioc_swap_layouts()) Process entered
00000002:00000001:1.0:1385592739.077040:0:1247:0:(mdc_reint.c:82:mdc_resource_get_unused()) Process entered
00010000:00000001:1.0:1385592739.077046:0:1247:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=102506500 : 102506500 : 61c2004)
00000002:00000001:1.0:1385592739.077091:0:1247:0:(mdc_reint.c:105:mdc_resource_get_unused()) Process leaving (rc=0 : 0 : 0)
00000100:00100000:1.0:1385592739.077320:0:1247:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfs:db82027d-ee92-960a-2336-a3af6ea37680:1247:1452037295952792:192.168.20.1@tcp:61
00000100:00000200:1.0:1385592739.077553:0:19664:0:(events.c:68:request_out_callback()) @@@ type 5, status 0  req@ffff88003bfdb400 x1452037295952792/t0(0) o61->myth-MDT0000-mdc-ffff8800beb8e000@192.168.20.1@tcp:12/10 lens 568/224 e 0 to 0 dl 1385592746 ref 3 fl Rpc:/0/ffffffff rc 0/-1
00000100:00100000:1.0:1385592739.078146:0:20936:0:(service.c:2011:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb01_000:LMV_MDC_UUID+6:24193:x1451116687256284:12345-192.168.20.1@tcp:104
00010000:00010000:1.0:1385592739.078484:0:21569:0:(ldlm_lockd.c:1654:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 2/0,0 mode: CR/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x420000000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00010000:00010000:1.0:1385592739.078551:0:21569:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 3/0,0 mode: CR/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x428400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00010000:00010000:1.0:1385592739.078920:0:21569:0:(ldlm_request.c:1186:ldlm_cancel_pack()) ### packing ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 2/0,0 mode: --/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00000100:00100000:0.0:1385592739.079019:0:19668:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_0:db82027d-ee92-960a-2336-a3af6ea37680:19668:1452037295952796:192.168.20.1@tcp:103
00010000:00010000:1.0:1385592739.079041:0:21569:0:(ldlm_lockd.c:1676:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 1/0,0 mode: --/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00000100:00100000:1.0:1385592739.079979:0:20936:0:(service.c:2011:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb01_000:LMV_MDC_UUID+6:24193:x1451116687256288:12345-192.168.20.1@tcp:104
00010000:00010000:1.0:1385592739.080433:0:21567:0:(ldlm_lockd.c:1654:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 2/0,0 mode: CR/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x420000000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1385592739.080518:0:21567:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 3/0,0 mode: CR/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x428400000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00000100:00100000:0.0:1385592739.081065:0:19668:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_0:db82027d-ee92-960a-2336-a3af6ea37680:19668:1452037295952800:192.168.20.1@tcp:103
00010000:00010000:1.0:1385592739.081083:0:21567:0:(ldlm_lockd.c:1676:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 1/0,0 mode: --/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00000100:00000200:1.0:1385592739.081890:0:1247:0:(events.c:120:reply_in_callback()) @@@ unlink  req@ffff88003bfdb400 x1452037295952792/t0(0) o61->myth-MDT0000-mdc-ffff8800beb8e000@192.168.20.1@tcp:12/10 lens 568/224 e 0 to 0 dl 1385592746 ref 2 fl Rpc:R/0/ffffffff rc 0/-1
00000100:00100000:1.0:1385592739.082096:0:1247:0:(client.c:1834:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfs:db82027d-ee92-960a-2336-a3af6ea37680:1247:1452037295952792:192.168.20.1@tcp:61
00000002:00000001:1.0:1385592739.082248:0:1247:0:(mdc_request.c:1977:mdc_iocontrol()) Process leaving via out (rc=18446744073709551615 : -1 : 0xffffffffffffffff)

In this case, the layout swap failed with LU-4293, but I think this bug is independent of that one.
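
For reading a log like the one above, the trailing opc field of each "Sending RPC" / "Handling RPC" line is enough to tell the RPCs apart. The following is a minimal sketch, assuming each such line ends in ":<opcode>" as in the excerpt. The helper names rpc_opcode_from_log_line() and rpc_opcode_name() are hypothetical, and the name table covers only the three opcodes seen here: 61 is the request sent from mdc_ioc_swap_layouts(), 104 is LDLM_BL_CALLBACK as noted in the description, and 103 is assumed to be LDLM_CANCEL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the opcode that ends a "Sending RPC" or
 * "Handling RPC" debug line (the last ':'-separated field), or -1 if the
 * line does not end in a non-negative decimal number.
 */
static long rpc_opcode_from_log_line(const char *line)
{
	const char *colon;
	char *end;
	long opc;

	assert(line != NULL);

	colon = strrchr(line, ':');
	if (colon == NULL || colon[1] == '\0')
		return -1;

	opc = strtol(colon + 1, &end, 10);
	if (end == colon + 1)
		return -1;              /* no digits after the last ':' */

	/* Allow only trailing whitespace or a newline after the number. */
	while (*end == ' ' || *end == '\n' || *end == '\r')
		end++;

	return (*end == '\0' && opc >= 0) ? opc : -1;
}

/* Names for the three opcodes in this excerpt only (103 is an assumption). */
static const char *rpc_opcode_name(long opc)
{
	switch (opc) {
	case 61:
		return "MDS_SWAP_LAYOUTS";
	case 103:
		return "LDLM_CANCEL";
	case 104:
		return "LDLM_BL_CALLBACK";
	default:
		return "unknown";
	}
}
```

Applied to the excerpt, the one opc 61 request from pid 1247 is bracketed by two opc 104 callbacks, one per FID, which is the pattern the description points at.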

It looks like mdt_swap_layouts() needs to call mdt_dlmreq_unpack() as mdt_reint_unpack_*() do, and then if (info->mti_dlm_req != NULL) call ldlm_request_cancel(mdt_info_req(info), info->mti_dlm_req, 0).
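
The control flow suggested above can be sketched as a toy model. This is not Lustre code: every type and function below is a mock stand-in with a mock_ prefix, the lock handles are reduced to a count, and the third argument of the cancel step is assumed to be the index of the first handle to cancel. It only illustrates the order of operations: unpack the optional DLM request, cancel what the client packed, then go on to the swap.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins only; these are NOT the Lustre structures. */
struct mock_dlm_req {
	int lock_count;                   /* ELC handles packed by the client */
};

struct mock_thread_info {
	struct mock_dlm_req *packed;      /* what arrived in the RPC, may be NULL */
	struct mock_dlm_req *mti_dlm_req; /* filled in by the unpack step */
	int cancelled;                    /* handles cancelled so far */
};

/* Plays the role of mdt_dlmreq_unpack(): expose the packed request, if any. */
static int mock_dlmreq_unpack(struct mock_thread_info *info)
{
	info->mti_dlm_req = info->packed;
	return 0;
}

/* Plays the role of ldlm_request_cancel(req, dlm_req, 0). */
static void mock_request_cancel(struct mock_thread_info *info,
				const struct mock_dlm_req *dlm_req, int first)
{
	if (dlm_req->lock_count > first)
		info->cancelled += dlm_req->lock_count - first;
}

/* Plays the role of the handler: cancel ELC locks before doing the swap. */
static int mock_swap_layouts(struct mock_thread_info *info)
{
	int rc;

	assert(info != NULL);

	rc = mock_dlmreq_unpack(info);
	if (rc != 0)
		return rc;

	if (info->mti_dlm_req != NULL)
		mock_request_cancel(info, info->mti_dlm_req, 0);

	/* The actual layout swap would follow here. */
	return 0;
}
```

With a request carrying two handles, the mock handler reports two cancellations; with no DLM request in the RPC it cancels nothing and still proceeds.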



 Comments   
Comment by Bruno Faccini (Inactive) [ 06/Jan/14 ]

Hello Andreas, since I am still "hot" on the layout-swap code after LU-3834/LU-4293, I will try to work on this ticket, following the direction you already indicated. Just for my information, how did you discover this problem? Simply during LU-4293 debugging, while analyzing the client/server debug logs against the associated source code?

Comment by Andreas Dilger [ 08/Jan/14 ]

I just saw this in the debug logs while trying to find out why LU-4293 was failing.

Comment by Jodi Levi (Inactive) [ 19/Feb/14 ]

John,
Could you please have a look at Andreas' comments in the description and see if this is something you could quickly complete?
If you have additional questions, feel free to reach out to Oleg.
Thank you!

Comment by John Hammond [ 20/Feb/14 ]

Hi Jodi,

Sure. It seems to be just that LCK_CR is used rather than LCK_EX when calling mdc_resource_get_unused() from mdc_ioc_swap_layouts(). I'll check this and push a patch in the morning.
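
Why the mode matters can be sketched with the classic DLM compatibility table. This assumes that ELC collection skips unused locks whose granted mode is compatible with the mode passed in, which is a reading of the comment above and has not been checked against the patch. The enum and function names are illustrative, and the enum values are array indices, not Lustre's LCK_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative modes; values are table indices, not Lustre's LCK_* values. */
enum sketch_mode { M_NL, M_CR, M_CW, M_PR, M_PW, M_EX, M_COUNT };

/* Classic six-mode DLM compatibility table: [granted][requested]. */
static const bool sketch_compat[M_COUNT][M_COUNT] = {
	/*          NL     CR     CW     PR     PW     EX   */
	/* NL */ { true,  true,  true,  true,  true,  true  },
	/* CR */ { true,  true,  true,  true,  true,  false },
	/* CW */ { true,  true,  true,  false, false, false },
	/* PR */ { true,  true,  false, true,  false, false },
	/* PW */ { true,  true,  false, false, false, false },
	/* EX */ { true,  false, false, false, false, false },
};

/*
 * Assumed ELC rule: an unused granted lock is collected for early
 * cancellation only if it conflicts with the requested mode.
 */
static bool sketch_elc_would_collect(enum sketch_mode granted,
				     enum sketch_mode requested)
{
	assert(granted < M_COUNT && requested < M_COUNT);
	return !sketch_compat[granted][requested];
}
```

Under that assumption, the CR locks in the log are not collected when the request mode is CR, since CR is compatible with CR, but they are collected when it is EX. That would explain why the server still had to send blocking callbacks for both FIDs.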

Comment by John Hammond [ 20/Feb/14 ]

Please see http://review.whamcloud.com/9329.

Comment by Jodi Levi (Inactive) [ 17/Mar/14 ]

Patch landed to Master. Please reopen this ticket if more work is needed.

Comment by Di Wang [ 29/May/14 ]

Hmm, this needs to be landed to 2.5 as well. Are there other tickets to track it?

Comment by Andreas Dilger [ 30/May/14 ]

Landing patches to maintenance branches is tracked separately after the bug is closed, but we do need to know that it is needed on b2_5.

Generated at Sat Feb 10 01:41:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.