
mdt_intent_fixup_resent() cannot find the proper lock in hash

Details


    Description

      If a successful reply to an intent lock request is lost, the MDS will not recover correctly from this situation when the request is resent.

      The cause appears to lie in the following code in ldlm_handle_enqueue0() and mdt_intent_fixup_resent():

       int ldlm_handle_enqueue0(struct ldlm_namespace *ns,
                               struct ptlrpc_request *req,
                               const struct ldlm_request *dlm_req,
                               const struct ldlm_callback_suite *cbs)
      {
      ...
              /* The lock's callback data might be set in the policy function */
              lock = ldlm_lock_create(ns, &dlm_req->lock_desc.l_resource.lr_name,
                                      dlm_req->lock_desc.l_resource.lr_type,
                                      dlm_req->lock_desc.l_req_mode,
                                      cbs, NULL, 0);
      ...
              lock->l_export = class_export_lock_get(req->rq_export, lock);
              if (lock->l_export->exp_lock_hash) {
                      cfs_hash_add(lock->l_export->exp_lock_hash,
                                   &lock->l_remote_handle,
                                   &lock->l_exp_hash);
              }
      ...
              err = ldlm_lock_enqueue(ns, &lock, cookie, &flags);
      ...
      }
      
      static void mdt_intent_fixup_resent(struct mdt_thread_info *info,
                                          struct ldlm_lock *new_lock,
                                          struct ldlm_lock **old_lock,
                                          struct mdt_lock_handle *lh)
      {
              struct ptlrpc_request  *req = mdt_info_req(info);
              struct obd_export      *exp = req->rq_export;
              struct lustre_handle    remote_hdl;
              struct ldlm_request    *dlmreq;
              struct ldlm_lock       *lock;
      
              if (!(lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT))
                      return;
      
              dlmreq = req_capsule_client_get(info->mti_pill, &RMF_DLM_REQ);
              remote_hdl = dlmreq->lock_handle[0];
      
              lock = cfs_hash_lookup(exp->exp_lock_hash, &remote_hdl);
              if (lock) {
                      if (lock != new_lock) {
      ...
      }
      

      On resend, ldlm_handle_enqueue0() adds the new lock into the export hash even though there is already a granted lock with the same remote handle. mdt_intent_fixup_resent() then finds the newly added lock in the hash and ignores it. This leads to an enqueue request on the newly created lock, a deadlock, and client eviction.
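
      One way to avoid the duplicate hash entry would be for ldlm_handle_enqueue0() to check exp_lock_hash for the remote handle before creating and hashing a new lock when MSG_RESENT is set, and to reuse the existing lock if one is found. The sketch below is only an illustration of that idea; find_existing_lock() and the existing_lock label are hypothetical names, not code from a landed patch:

       /* Illustrative sketch only.  On a resent enqueue, check whether a lock
        * with the same client-side (remote) handle is already hashed on the
        * export before creating a second one with the same key. */
       static struct ldlm_lock *find_existing_lock(struct obd_export *exp,
                                                   const struct lustre_handle *remote_hdl)
       {
               if (exp->exp_lock_hash == NULL)
                       return NULL;

               /* exp_lock_hash is keyed by the client-side lock handle, the
                * same key that mdt_intent_fixup_resent() later looks up. */
               return cfs_hash_lookup(exp->exp_lock_hash, remote_hdl);
       }

       int ldlm_handle_enqueue0(struct ldlm_namespace *ns,
                               struct ptlrpc_request *req,
                               const struct ldlm_request *dlm_req,
                               const struct ldlm_callback_suite *cbs)
       {
       ...
               if (lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT) {
                       /* An earlier processing of this enqueue may already have
                        * created, granted and hashed a lock for this remote
                        * handle; reuse it rather than hashing a duplicate. */
                       lock = find_existing_lock(req->rq_export,
                                                 &dlm_req->lock_handle[0]);
                       if (lock != NULL)
                               goto existing_lock;     /* hypothetical label */
               }

               /* Otherwise create and hash a new lock as before. */
               lock = ldlm_lock_create(ns, &dlm_req->lock_desc.l_resource.lr_name,
                                       dlm_req->lock_desc.l_resource.lr_type,
                                       dlm_req->lock_desc.l_req_mode,
                                       cbs, NULL, 0);
       ...
       }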

      Alexey thinks that the problem has existed since we moved away from the original, correct code:

      static void fixup_handle_for_resent_req(struct ptlrpc_request *req, int offset,
                                              struct ldlm_lock *new_lock,
                                              struct ldlm_lock **old_lock,
                                              struct lustre_handle *lockh)
      {
              struct obd_export *exp = req->rq_export;
              struct ldlm_request *dlmreq =
                      lustre_msg_buf(req->rq_reqmsg, offset, sizeof(*dlmreq));
              struct lustre_handle remote_hdl = dlmreq->lock_handle[0];
              struct list_head *iter;
      
              if (!(lustre_msg_get_flags(req->rq_reqmsg) & MSG_RESENT))
                      return;
      
              spin_lock(&exp->exp_ldlm_data.led_lock);
              list_for_each(iter, &exp->exp_ldlm_data.led_held_locks) {
                      struct ldlm_lock *lock;
                      lock = list_entry(iter, struct ldlm_lock, l_export_chain);
                      if (lock == new_lock)
                              continue; <==================== N.B.
                      if (lock->l_remote_handle.cookie == remote_hdl.cookie) {
                              lockh->cookie = lock->l_handle.h_cookie;
                              LDLM_DEBUG(lock, "restoring lock cookie");
                              DEBUG_REQ(D_DLMTRACE, req,"restoring lock cookie "LPX64,
                                        lockh->cookie);
                              if (old_lock)
                                      *old_lock = LDLM_LOCK_GET(lock);
                              spin_unlock(&exp->exp_ldlm_data.led_lock);
                              return;
                      }
              }
      ...
      }
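
      For comparison, the old list walk's "skip the lock created for the resend" behaviour could in principle be recovered by searching the resource's granted queue for another lock carrying the same remote handle, instead of relying on exp_lock_hash alone. The sketch below only illustrates that idea; search_granted_lock() is a hypothetical helper, not code from a landed patch:

       /* Illustration only: look on the resource's granted list for a lock
        * that carries the client's remote handle but is not the lock that was
        * just created for the resent request. */
       static struct ldlm_lock *
       search_granted_lock(struct ldlm_lock *new_lock,
                           const struct lustre_handle *remote_hdl)
       {
               struct ldlm_resource *res = new_lock->l_resource;
               struct ldlm_lock     *lock;

               lock_res(res);
               list_for_each_entry(lock, &res->lr_granted, l_res_link) {
                       if (lock == new_lock)   /* same skip as in the old code */
                               continue;
                       if (lock->l_remote_handle.cookie == remote_hdl->cookie) {
                               LDLM_LOCK_GET(lock);
                               unlock_res(res);
                               return lock;
                       }
               }
               unlock_res(res);
               return NULL;
       }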
      

      Logs for this issue will follow.

          Activity


            apercher Antoine Percher added a comment -

            The fix from LU-5530 needs to be included together with the fix from LU-2827.

            bfaccini Bruno Faccini (Inactive) added a comment -

            I should have commented on this before, sorry.
            In fact there are regression issues with my b2_4 back-port (http://review.whamcloud.com/#/c/10902/) of the LU-2827 changes. I checked that the b2_5 version (http://review.whamcloud.com/#/c/10492/) is ok and that it should land soon now.
            bfaccini Bruno Faccini (Inactive) added a comment - edited

            The merge was done to avoid the same miss as with the original patch, where mdt_intent_layout() had been forgotten because it was not present in the Xyratex source tree at the time.

            BTW, my b2_4 patch/back-port has some problems and needs some re-work, because the MDS bombs with "(ldlm_lock.c:851:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed" when running the LLNL reproducer from LU-4584 or recovery-small/test_53 in auto-tests.
            More to come, the crash-dump is under investigation ...

            morrone Christopher Morrone (Inactive) added a comment -

            We would really, really prefer that you guys not merge together multiple patches when backporting. That makes sanity checking and rebasing quite a bit more complicated for us.

            bfaccini Bruno Faccini (Inactive) added a comment -

            The merged b2_4 backport of both the #5978 and #10378 master changes for this ticket is at http://review.whamcloud.com/10902.
            pjones Peter Jones added a comment -

            Landed for 2.6
            jlevi Jodi Levi (Inactive) added a comment - edited

            http://review.whamcloud.com/#/c/5978/
            http://review.whamcloud.com/#/c/10378/

            simmonsja James A Simmons added a comment -

            I tested master with the patches as well and got the same results.
            simmonsja James A Simmons added a comment - edited

            With the patches the evictions are gone, which is good. Now it's just locking up.

            bfaccini Bruno Faccini (Inactive) added a comment -

            I am afraid I am not able to conclude what is going wrong from this MDS Lustre debug log alone. Are there still similar stacks being dumped in dmesg/syslog?

            Also, I did not find any eviction trace in this log.

            simmonsja James A Simmons added a comment -

            An IOR single-shared-file run is producing the messages below. I have also uploaded a log dump to

            ftp.whamcloud.com/uploads/LU-2827/lustre-log-LU-2827-29-05-2014

            May 29 18:47:39 atlas-tds-mds1 kernel: [12046.175064] Lustre: 17065:0:(service.c:1349:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages
            May 29 18:47:45 atlas-tds-mds1 kernel: [12052.188755] Lustre: atlastds-MDT0000: Client 23d4a774-8c3b-4edc-3b98-24d112966233 (at 14@gni2) reconnecting
            May 29 18:47:45 atlas-tds-mds1 kernel: [12052.199379] Lustre: Skipped 1 previous similar message
            May 29 19:00:15 atlas-tds-mds1 kernel: [12801.706037] Lustre: 17829:0:(service.c:1349:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
            May 29 19:00:15 atlas-tds-mds1 kernel: [12801.706039] req@ffff8807e74adc00 x1468567563329976/t0(0) o101->73ecb9e8-d229-752a-3cfc-2036f509fb5b@50@gni2:0/0 lens 512/3512 e 0 to 0 dl 1401404420 ref 2 fl Interpret:/0/0 rc 0/0
            May 29 19:00:15 atlas-tds-mds1 kernel: [12802.233175] Lustre: 17062:0:(service.c:1349:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
            May 29 19:00:15 atlas-tds-mds1 kernel: [12802.233176] req@ffff8807e76d6400 x1468567565457596/t0(0) o101->e8d3ca06-b910-df85-78e9-5653469b4bb8@81@gni2:0/0 lens 512/3512 e 0 to 0 dl 1401404420 ref 2 fl Interpret:/0/0 rc 0/0
            May 29 19:00:15 atlas-tds-mds1 kernel: [12802.263339] Lustre: 17062:0:(service.c:1349:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
            May 29 19:00:21 atlas-tds-mds1 kernel: [12808.519392] Lustre: atlastds-MDT0000: Client 568959a5-c11b-5ded-e1c9-1c90892a82ca (at 49@gni2) reconnecting


            People

              yujian Jian Yu
              panda Andrew Perepechko
              Votes: 0
              Watchers: 24
