[LU-10985] Attempting to send a mkdir create intent crashes server Created: 02/May/18 Updated: 21/Jan/19 Resolved: 06/May/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.7 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While testing WBC code against unpatched 2.11 servers, it was easy to observe that sending a create intent (a valid intent handled by mdt_intent_reint) crashes the MDT with:

[ 850.056294] LustreError: 2568:0:(layout.c:2398:req_capsule_extend()) ASSERTION( fmt->rf_fields[i].nr >= old->rf_fields[i].nr ) failed:
[ 850.058033] LustreError: 2568:0:(layout.c:2398:req_capsule_extend()) LBUG
[ 850.058796] Pid: 2568, comm: mdt01_002
[ 850.059467] Call Trace:
[ 850.060682] [<ffffffffa01ab7ce>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 850.061433] [<ffffffffa01ab85c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 850.062203] [<ffffffffa05b32a9>] req_capsule_extend+0x159/0x1c0 [ptlrpc]
[ 850.062920] [<ffffffffa0c5d237>] mdt_create_unpack+0x157/0x4b0 [mdt]
[ 850.063630] [<ffffffffa0c5dd78>] mdt_reint_unpack+0xa8/0x210 [mdt]
[ 850.064290] [<ffffffffa0c4824f>] mdt_reint_internal+0x3f/0x990 [mdt]
[ 850.064992] [<ffffffffa0c54bc7>] mdt_intent_reint+0x157/0x420 [mdt]
[ 850.065693] [<ffffffffa0c4b8e2>] mdt_intent_opc+0x442/0xad0 [mdt]
[ 850.066381] [<ffffffffa058fdd0>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
[ 850.067065] [<ffffffffa0c533b6>] mdt_intent_policy+0x1a6/0x360 [mdt]
[ 850.067786] [<ffffffffa053ed63>] ldlm_lock_enqueue+0x363/0xa40 [ptlrpc]
[ 850.068160] [<ffffffffa01bcb05>] ? cfs_hash_rw_unlock+0x15/0x20 [libcfs]
[ 850.068554] [<ffffffffa01bfe96>] ? cfs_hash_add+0xa6/0x180 [libcfs]
[ 850.068958] [<ffffffffa05671a3>] ldlm_handle_enqueue0+0x933/0x1540 [ptlrpc]
[ 850.069354] [<ffffffffa058fe50>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[ 850.070049] [<ffffffffa05edd72>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 850.070493] [<ffffffffa05f424b>] tgt_request_handle+0xb1b/0x15c0 [ptlrpc]
[ 850.070889] [<ffffffffa01b76a7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 850.071272] [<ffffffffa05995b1>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
[ 850.071964] [<ffffffffa059d3ce>] ptlrpc_main+0xabe/0x1fd0 [ptlrpc]
[ 850.072373] [<ffffffff810af904>] ? finish_task_switch+0x44/0x180
[ 850.072758] [<ffffffff81703c00>] ? __schedule+0x240/0x950
[ 850.073150] [<ffffffffa059c910>] ? ptlrpc_main+0x0/0x1fd0 [ptlrpc]
[ 850.073545] [<ffffffff810a2eda>] kthread+0xea/0xf0
[ 850.074636] [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[ 850.074997] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[ 850.075354] [<ffffffff810a2df0>] ? kthread+0x0/0xf0

This actually highlights an even bigger problem with this assertion, I think, since it allows various ill-formed requests to cause crashes too.

Anyway, the specific problem here is the lack of an "RMF_EADATA" component in the pill selected, which is RQF_MDS_REINT_CREATE_ACL. In reality that is only valid for a regular reint RPC; intent RPCs already get their capsules extended as part of ldlm processing (and obviously they are not happy that we are changing the format), so we can skip this step entirely for intents.

The other problem, once we overcome this one, is that mdt_reint_create unconditionally assumes that any request with an ldlm handle in it (determined by info->mti_dlm_req being set) is an ELC cancel.

As such, we really need to rework the current intent-create logic so that it does not crash right away. |
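The failing check and the suggested "skip the extend step for intents" idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the patch that landed: the `sketch_*` names, the two-location layout, the field counts, and the `-1` error return are all hypothetical stand-ins. Only the comparison itself (`fmt->rf_fields[i].nr >= old->rf_fields[i].nr`) is taken from the assertion in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a request format: it records
 * only how many fields each message location (request, reply) has. */
enum { SKETCH_LOC_REQUEST, SKETCH_LOC_REPLY, SKETCH_LOC_NR };

struct sketch_format {
	const char *name;
	size_t nr[SKETCH_LOC_NR];
};

/* Field counts below are invented for illustration only. */
static const struct sketch_format sketch_intent_fmt = { "intent", { 7, 5 } };
static const struct sketch_format sketch_reint_fmt  = { "reint",  { 3, 2 } };
static const struct sketch_format sketch_create_fmt = { "create", { 5, 3 } };

/* Same comparison as the assertion in the log, but reported as a
 * boolean instead of crashing: the new format must have at least as
 * many fields as the old one in every location. */
static bool sketch_format_can_extend(const struct sketch_format *old,
				     const struct sketch_format *fmt)
{
	for (int i = 0; i < SKETCH_LOC_NR; i++)
		if (fmt->nr[i] < old->nr[i])
			return false;
	return true;
}

/* Hypothetical unpack step. For intents the format was already set up
 * earlier, so the extend step is skipped. For regular requests an
 * incompatible format yields an error (-1 here) rather than an LBUG.
 * Returns 0 on success. */
static int sketch_create_unpack(const struct sketch_format **cur,
				const struct sketch_format *create_fmt,
				bool is_intent)
{
	if (is_intent)
		return 0;	/* leave the existing format untouched */

	if (!sketch_format_can_extend(*cur, create_fmt))
		return -1;	/* ill-formed request: reject, do not crash */

	*cur = create_fmt;
	return 0;
}
```

In this model, calling `sketch_create_unpack()` with `is_intent` set leaves the current format as it is, while a regular reint request is switched to the create format. Treating a failed compatibility check as a request error rather than an assertion is one possible way to address the "ill-formed requests cause crashes" concern raised above.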
| Comments |
| Comment by Gerrit Updater [ 02/May/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32237 |
| Comment by Gerrit Updater [ 06/May/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32237/ |
| Comment by Peter Jones [ 06/May/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 23/May/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32521 |
| Comment by Gerrit Updater [ 19/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32521/ |