Details
Bug
Resolution: Fixed
Critical
Lustre 2.5.0, Lustre 2.6.0
3
11818
Description
It appears that the MDS_SWAP_LAYOUTS handler (mdt_swap_layouts()) does not handle the Early Lock Cancellation (ELC) locks that the client packs into the RPC by calling mdc_prep_elc_req() in mdc_ioc_swap_layouts(). This can be seen in the debug log, where the client receives LDLM_BL_CALLBACK (104) RPCs for the two inodes whose layouts are being swapped: IGIF FID [0x24f7ac:0x2d77b0e5:0x0] and volatile file FID [0x2000061c2:0x4:0x0]. The client log below is edited to show just the important bits:
00000002:00000001:1.0:1385592739.077027:0:1247:0:(mdc_request.c:1861:mdc_iocontrol()) Process entered
00000002:00000001:1.0:1385592739.077032:0:1247:0:(mdc_request.c:1804:mdc_ioc_swap_layouts()) Process entered
00000002:00000001:1.0:1385592739.077040:0:1247:0:(mdc_reint.c:82:mdc_resource_get_unused()) Process entered
00010000:00000001:1.0:1385592739.077046:0:1247:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=102506500 : 102506500 : 61c2004)
00000002:00000001:1.0:1385592739.077091:0:1247:0:(mdc_reint.c:105:mdc_resource_get_unused()) Process leaving (rc=0 : 0 : 0)
00000100:00100000:1.0:1385592739.077320:0:1247:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfs:db82027d-ee92-960a-2336-a3af6ea37680:1247:1452037295952792:192.168.20.1@tcp:61
00000100:00000200:1.0:1385592739.077553:0:19664:0:(events.c:68:request_out_callback()) @@@ type 5, status 0 req@ffff88003bfdb400 x1452037295952792/t0(0) o61->myth-MDT0000-mdc-ffff8800beb8e000@192.168.20.1@tcp:12/10 lens 568/224 e 0 to 0 dl 1385592746 ref 3 fl Rpc:/0/ffffffff rc 0/-1
00000100:00100000:1.0:1385592739.078146:0:20936:0:(service.c:2011:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb01_000:LMV_MDC_UUID+6:24193:x1451116687256284:12345-192.168.20.1@tcp:104
00010000:00010000:1.0:1385592739.078484:0:21569:0:(ldlm_lockd.c:1654:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 2/0,0 mode: CR/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x420000000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00010000:00010000:1.0:1385592739.078551:0:21569:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 3/0,0 mode: CR/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x428400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00010000:00010000:1.0:1385592739.078920:0:21569:0:(ldlm_request.c:1186:ldlm_cancel_pack()) ### packing ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 2/0,0 mode: --/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00000100:00100000:0.0:1385592739.079019:0:19668:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_0:db82027d-ee92-960a-2336-a3af6ea37680:19668:1452037295952796:192.168.20.1@tcp:103
00010000:00010000:1.0:1385592739.079041:0:21569:0:(ldlm_lockd.c:1676:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff8800c1dea500/0x1b40bffdae6086d6 lrc: 1/0,0 mode: --/CR res: [0x2000061c2:0x4:0x0].0 bits 0x8 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fbef expref: -99 pid: 1247 timeout: 0 lvb_type: 3
00000100:00100000:1.0:1385592739.079979:0:20936:0:(service.c:2011:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb01_000:LMV_MDC_UUID+6:24193:x1451116687256288:12345-192.168.20.1@tcp:104
00010000:00010000:1.0:1385592739.080433:0:21567:0:(ldlm_lockd.c:1654:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 2/0,0 mode: CR/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x420000000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00010000:00010000:1.0:1385592739.080518:0:21567:0:(ldlm_request.c:1127:ldlm_cli_cancel_local()) ### client-side cancel ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 3/0,0 mode: CR/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x428400000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00000100:00100000:0.0:1385592739.081065:0:19668:0:(client.c:1469:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_0:db82027d-ee92-960a-2336-a3af6ea37680:19668:1452037295952800:192.168.20.1@tcp:103
00010000:00010000:1.0:1385592739.081083:0:21567:0:(ldlm_lockd.c:1676:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: myth-MDT0000-mdc-ffff8800beb8e000 lock: ffff880045f39080/0x1b40bffdae6086a5 lrc: 1/0,0 mode: --/CR res: [0x24f7ac:0x2d77b0e5:0x0].0 bits 0x9 rrc: 1 type: IBT flags: 0x4c29400000000 nid: local remote: 0x1ac79fd0e006fb16 expref: -99 pid: 1240 timeout: 0 lvb_type: 0
00000100:00000200:1.0:1385592739.081890:0:1247:0:(events.c:120:reply_in_callback()) @@@ unlink req@ffff88003bfdb400 x1452037295952792/t0(0) o61->myth-MDT0000-mdc-ffff8800beb8e000@192.168.20.1@tcp:12/10 lens 568/224 e 0 to 0 dl 1385592746 ref 2 fl Rpc:R/0/ffffffff rc 0/-1
00000100:00100000:1.0:1385592739.082096:0:1247:0:(client.c:1834:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lfs:db82027d-ee92-960a-2336-a3af6ea37680:1247:1452037295952792:192.168.20.1@tcp:61
00000002:00000001:1.0:1385592739.082248:0:1247:0:(mdc_request.c:1977:mdc_iocontrol()) Process leaving via out (rc=18446744073709551615 : -1 : 0xffffffffffffffff)
In this case, the layout swap failed with LU-4293, but I think this bug is independent of that one.
It looks like mdt_swap_layouts() needs to call mdt_dlmreq_unpack(), as the mdt_reint_unpack_*() functions do, and then, if info->mti_dlm_req != NULL, call ldlm_request_cancel(mdt_info_req(info), info->mti_dlm_req, 0).
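To illustrate why the missing ELC handling shows up as extra blocking callbacks, here is a small standalone model in C. It is only a sketch and not Lustre code: every mock_* type and function is a hypothetical stand-in (mock_request_cancel() loosely plays the role of ldlm_request_cancel(), and the handle_elc flag stands for the proposed unpack-and-cancel step), and the lock handles in the usage note are arbitrary values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT the real Lustre structures. */
#define MOCK_MAX_LOCKS 8

/* Stands in for the ELC payload the client packs into the RPC. */
struct mock_dlm_req {
    size_t   lock_count;
    uint64_t lock_handle[MOCK_MAX_LOCKS];
};

/* Server-side view: which locks are still granted (0 marks a free slot). */
struct mock_server {
    uint64_t granted[MOCK_MAX_LOCKS];
    int      bl_callbacks_sent;   /* blocking callbacks issued to the client */
};

/* Loosely models ldlm_request_cancel(): drop every lock named in the payload. */
static void mock_request_cancel(struct mock_server *srv,
                                const struct mock_dlm_req *dlm)
{
    for (size_t i = 0; i < dlm->lock_count && i < MOCK_MAX_LOCKS; i++)
        for (size_t j = 0; j < MOCK_MAX_LOCKS; j++)
            if (srv->granted[j] == dlm->lock_handle[i])
                srv->granted[j] = 0;
}

/* Models the handler taking conflicting locks: each lock that is still
 * granted costs one blocking callback before it can be revoked. */
static void mock_take_conflicting_locks(struct mock_server *srv)
{
    for (size_t j = 0; j < MOCK_MAX_LOCKS; j++) {
        if (srv->granted[j] != 0) {
            srv->bl_callbacks_sent++;
            srv->granted[j] = 0;
        }
    }
}

/* Models the swap-layouts handler. handle_elc == 0 mimics the reported
 * behaviour; handle_elc != 0 mimics the proposed fix (assumption).
 * Returns the number of blocking callbacks that were needed. */
static int mock_swap_layouts(struct mock_server *srv,
                             const struct mock_dlm_req *dlm_req,
                             int handle_elc)
{
    if (handle_elc && dlm_req != NULL)
        mock_request_cancel(srv, dlm_req);
    mock_take_conflicting_locks(srv);
    return srv->bl_callbacks_sent;
}
```

With two granted locks that are both named in the ELC payload, calling mock_swap_layouts() with handle_elc set to 0 needs two blocking callbacks, which corresponds to the two opc 104 RPCs in the log above. With handle_elc set to 1 it needs none.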