Lustre / LU-10985

Attempting to send a mkdir create intent crashes the server

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.11.0
    • None
    • 3

    Description

      While testing WBC code against unpatched 2.11 servers, it is easy to observe that sending a create intent (a valid intent handled by mdt_intent_reint) crashes the MDT with:

      [  850.056294] LustreError: 2568:0:(layout.c:2398:req_capsule_extend()) ASSERTION( fmt->rf_fields[i].nr >= old->rf_fields[i].nr ) failed: 
      [  850.058033] LustreError: 2568:0:(layout.c:2398:req_capsule_extend()) LBUG
      [  850.058796] Pid: 2568, comm: mdt01_002
      [  850.059467] 
      Call Trace:
      [  850.060682]  [<ffffffffa01ab7ce>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [  850.061433]  [<ffffffffa01ab85c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [  850.062203]  [<ffffffffa05b32a9>] req_capsule_extend+0x159/0x1c0 [ptlrpc]
      [  850.062920]  [<ffffffffa0c5d237>] mdt_create_unpack+0x157/0x4b0 [mdt]
      [  850.063630]  [<ffffffffa0c5dd78>] mdt_reint_unpack+0xa8/0x210 [mdt]
      [  850.064290]  [<ffffffffa0c4824f>] mdt_reint_internal+0x3f/0x990 [mdt]
      [  850.064992]  [<ffffffffa0c54bc7>] mdt_intent_reint+0x157/0x420 [mdt]
      [  850.065693]  [<ffffffffa0c4b8e2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [  850.066381]  [<ffffffffa058fdd0>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [  850.067065]  [<ffffffffa0c533b6>] mdt_intent_policy+0x1a6/0x360 [mdt]
      [  850.067786]  [<ffffffffa053ed63>] ldlm_lock_enqueue+0x363/0xa40 [ptlrpc]
      [  850.068160]  [<ffffffffa01bcb05>] ? cfs_hash_rw_unlock+0x15/0x20 [libcfs]
      [  850.068554]  [<ffffffffa01bfe96>] ? cfs_hash_add+0xa6/0x180 [libcfs]
      [  850.068958]  [<ffffffffa05671a3>] ldlm_handle_enqueue0+0x933/0x1540 [ptlrpc]
      [  850.069354]  [<ffffffffa058fe50>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
      [  850.070049]  [<ffffffffa05edd72>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [  850.070493]  [<ffffffffa05f424b>] tgt_request_handle+0xb1b/0x15c0 [ptlrpc]
      [  850.070889]  [<ffffffffa01b76a7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [  850.071272]  [<ffffffffa05995b1>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [  850.071964]  [<ffffffffa059d3ce>] ptlrpc_main+0xabe/0x1fd0 [ptlrpc]
      [  850.072373]  [<ffffffff810af904>] ? finish_task_switch+0x44/0x180
      [  850.072758]  [<ffffffff81703c00>] ? __schedule+0x240/0x950
      [  850.073150]  [<ffffffffa059c910>] ? ptlrpc_main+0x0/0x1fd0 [ptlrpc]
      [  850.073545]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [  850.074636]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      [  850.074997]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [  850.075354]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
      

      This actually highlights an even bigger problem with this assertion, I think, since it also allows various ill-formed requests to crash the server.

      Anyway, the specific problem here is the lack of an "RMF_EADATA" component in the selected pill, which is RQF_MDS_REINT_CREATE_ACL. In reality that format is only valid for a regular reint RPC. Intent RPCs already get their capsules extended as part of ldlm processing (and obviously they are not happy when we change the format), so we can skip this step entirely for intents.
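
      To make the failing check concrete, here is a minimal user-space sketch of the field-count comparison behind the assertion. It is a toy model, not the Lustre code: every toy_* name is hypothetical, the real req_capsule_extend works on req_format structures inside ptlrpc, and returning -EPROTO is only one possible way to reject a mismatch instead of hitting an LBUG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the client/server directions of a request format. */
enum { TOY_RCL_CLIENT, TOY_RCL_SERVER, TOY_RCL_NR };

/* Hypothetical, simplified format: only the per-direction field counts. */
struct toy_format {
	const char *name;
	size_t nr[TOY_RCL_NR];
};

/* Hypothetical, simplified capsule ("pill"): only its current format. */
struct toy_capsule {
	const struct toy_format *fmt;
};

/*
 * Switch the capsule to a larger format. A NULL argument is a programming
 * error and may assert, but a format with fewer fields than the current one
 * can be provoked by what a client sends, so it is reported as an error
 * (an assumption of this sketch) rather than asserted.
 */
static int toy_capsule_extend(struct toy_capsule *pill,
			      const struct toy_format *fmt)
{
	int i;

	assert(pill != NULL && pill->fmt != NULL && fmt != NULL);

	for (i = 0; i < TOY_RCL_NR; i++) {
		if (fmt->nr[i] < pill->fmt->nr[i])
			return -EPROTO;	/* would shrink: refuse, keep old format */
	}
	pill->fmt = fmt;
	return 0;
}

/*
 * Unpack step for create. For intents the capsule was already extended
 * earlier (during ldlm processing in the description above), so the
 * extension is skipped; plain reint requests are extended here.
 */
static int toy_create_unpack(struct toy_capsule *pill, int is_intent,
			     const struct toy_format *create_fmt)
{
	if (is_intent)
		return 0;
	return toy_capsule_extend(pill, create_fmt);
}
```

      In this model a mismatched extension leaves the capsule on its old format and hands the caller an error to return to the client, while the intent path never attempts the extension at all.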

      The other problem, once we get past this one, is that mdt_reint_create unconditionally assumes that any request with an ldlm handle in it (determined by info->mti_dlm_req being set) is an ELC cancel request and calls ldlm_request_cancel right away. That is fine for normal reint requests, but it crashes for intent requests because the lock handle provided is not yet granted, or not properly referenced, or some such.
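
      The ELC assumption can be illustrated the same way. The sketch below is a toy model with hypothetical names throughout (toy_info, via_intent, toy_request_cancel); it is not the landed patch, and the via_intent flag is an assumption introduced only to show the distinction the handler would have to make before cancelling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ldlm request carried in the RPC. */
struct toy_dlm_req {
	int lock_count;
};

/* Hypothetical, simplified per-request state. */
struct toy_info {
	const struct toy_dlm_req *dlm_req;	/* plays the role of mti_dlm_req */
	bool via_intent;			/* assumed flag: arrived as an intent */
	int cancelled;				/* locks cancelled so far */
};

/* Stand-in for the early cancel: just counts the locks it would cancel. */
static void toy_request_cancel(struct toy_info *info)
{
	info->cancelled += info->dlm_req->lock_count;
}

/*
 * Create handler. A present dlm request means "early lock cancel" only for
 * a plain reint RPC. In an intent RPC the same slot describes the lock
 * being enqueued, which must not be cancelled here.
 */
static void toy_reint_create(struct toy_info *info)
{
	if (info->dlm_req != NULL && !info->via_intent)
		toy_request_cancel(info);

	/* ... the actual create would follow here ... */
}
```

      The point of the sketch is only that the presence of a dlm request is not enough to decide that it is an ELC cancel; the handler also needs to know how the request arrived.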

      As such, we really need to rework the current intent-create logic so that it does not crash right away.

        Activity

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32521/
          Subject: LU-10985 mdt: properly handle unknown intent requests
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 179bf9a009cd27b0055e23c1478d7b298833ce35

          Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32521
          Subject: LU-10985 mdt: properly handle unknown intent requests
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: ac535c47902875ac6c7ec7312f9f1ef7526614a0

          pjones Peter Jones added a comment -

          Landed for 2.12

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32237/
          Subject: LU-10985 mdt: properly handle unknown intent requests
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6a39600f641cc3e179b0149af5ff17ba44d2319f

          Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/32237
          Subject: LU-10985 mdt: properly handle unknown intent requests
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: edec9a5693ca0c749009ff94c5f75abf2bf00679

          People

            Assignee: Oleg Drokin
            Reporter: Oleg Drokin