Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Environment: RHEL 6 with Bull kernel based on 2.6.32-279.5.2
    • Severity: 3
    • 9223372036854775807

    Description

      An MDS has several times encountered a deadlock in which a process appears to have acquired the superblock dqptr_sem semaphore and left without releasing it.

      When looking at a dump taken during this deadlock, we can see this:

      • Most of the processes are in jbd2_journal_start() -> start_this_handle() -> schedule(), waiting for the current transaction to finish.
      • The process that initiated the journal_stop is found in the following state (a simplified sketch of the relevant jbd2 logic is given after the two traces below):
      PID: 1479   TASK: ffff880cefe9e7f0  CPU: 5   COMMAND: "mdt_279"
       #0 [ffff8810093538a0] schedule at ffffffff814965a5
       #1 [ffff881009353968] jbd2_log_wait_commit at ffffffffa00b9e55 [jbd2]
       #2 [ffff8810093539f8] jbd2_journal_stop at ffffffffa00b1b6b [jbd2]
       #3 [ffff881009353a58] __ldiskfs_journal_stop at ffffffffa05c7808 [ldiskfs]
       #4 [ffff881009353a88] osd_trans_stop at ffffffffa0e86b35 [osd_ldiskfs]
       #5 [ffff881009353ab8] mdd_trans_stop at ffffffffa0d8b4aa [mdd]
       #6 [ffff881009353ac8] mdd_attr_set at ffffffffa0d6aa5f [mdd]
       #7 [ffff881009353ba8] cml_attr_set at ffffffffa0ec3a86 [cmm]
       #8 [ffff881009353bd8] mdt_attr_set at ffffffffa0dfe418 [mdt]
       #9 [ffff881009353c28] mdt_reint_setattr at ffffffffa0dfea65 [mdt]
      #10 [ffff881009353cb8] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
      #11 [ffff881009353cd8] mdt_reint_internal at ffffffffa0deeed4 [mdt]
      #12 [ffff881009353d28] mdt_reint at ffffffffa0def2b4 [mdt]
      #13 [ffff881009353d48] mdt_handle_common at ffffffffa0de3762 [mdt]
      #14 [ffff881009353d98] mdt_regular_handle at ffffffffa0de4655 [mdt]
      #15 [ffff881009353da8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
      #16 [ffff881009353f48] kernel_thread at ffffffff8100412a
      
      • This process is itself waiting for the current transaction to be committed. The commit cannot complete while another process still holds an open handle on that transaction; that process is in the following state:
        PID: 6539   TASK: ffff880869e910c0  CPU: 14  COMMAND: "mdt_47"
         #0 [ffff880834bbb490] schedule at ffffffff814965a5
         #1 [ffff880834bbb558] rwsem_down_failed_common at ffffffff81498ba5
         #2 [ffff880834bbb5b8] rwsem_down_write_failed at ffffffff81498d23
         #3 [ffff880834bbb5f8] call_rwsem_down_write_failed at ffffffff812689c3
         #4 [ffff880834bbb658] dquot_initialize at ffffffff811c6cfb
         #5 [ffff880834bbb6c8] ldiskfs_dquot_initialize at ffffffffa05c7c44 [ldiskfs]
         #6 [ffff880834bbb6f8] osd_oi_iam_refresh at ffffffffa0e8ef3f [osd_ldiskfs]
         #7 [ffff880834bbb758] osd_oi_insert at ffffffffa0e8f532 [osd_ldiskfs]
         #8 [ffff880834bbb7d8] __osd_oi_insert at ffffffffa0e86e31 [osd_ldiskfs]
         #9 [ffff880834bbb828] osd_object_ea_create at ffffffffa0e87a5c [osd_ldiskfs]
        #10 [ffff880834bbb888] mdd_object_create_internal at ffffffffa0d61ed0 [mdd]
        #11 [ffff880834bbb8e8] mdd_create at ffffffffa0d831de [mdd]
        #12 [ffff880834bbba28] cml_create at ffffffffa0ec4407 [cmm]
        #13 [ffff880834bbba78] mdt_reint_open at ffffffffa0e0fd7f [mdt]
        #14 [ffff880834bbbb58] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
        #15 [ffff880834bbbb78] mdt_reint_internal at ffffffffa0deeed4 [mdt]
        #16 [ffff880834bbbbc8] mdt_intent_reint at ffffffffa0def53d [mdt]
        #17 [ffff880834bbbc18] mdt_intent_policy at ffffffffa0dedc09 [mdt]
        #18 [ffff880834bbbc58] ldlm_lock_enqueue at ffffffffa07d93c1 [ptlrpc]
        #19 [ffff880834bbbcb8] ldlm_handle_enqueue0 at ffffffffa07ff3cd [ptlrpc]
        #20 [ffff880834bbbd28] mdt_enqueue at ffffffffa0dee586 [mdt]
        #21 [ffff880834bbbd48] mdt_handle_common at ffffffffa0de3762 [mdt]
        #22 [ffff880834bbbd98] mdt_regular_handle at ffffffffa0de4655 [mdt]
        #23 [ffff880834bbbda8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
        #24 [ffff880834bbbf48] kernel_thread at ffffffff8100412a
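
      For context, here is a compressed sketch of why the first thread ends up in jbd2_log_wait_commit(). This is a simplified rendering of the 2.6.32-era logic in fs/jbd2/transaction.c, not the exact RHEL 6/Bull source; error handling, credit accounting and batching heuristics are omitted:

      /* Simplified sketch of jbd2_journal_stop(), 2.6.32-era fs/jbd2/transaction.c */
      int jbd2_journal_stop(handle_t *handle)
      {
              transaction_t *transaction = handle->h_transaction;
              journal_t *journal = transaction->t_journal;
              tid_t tid = transaction->t_tid;
              int err = 0;

              /* Drop this handle's reference on the running transaction; the
               * commit thread waits until the number of open handles reaches
               * zero before it can actually commit. */
              if (atomic_dec_and_test(&transaction->t_updates))
                      wake_up(&journal->j_wait_updates);

              if (handle->h_sync || time_after_eq(jiffies, transaction->t_expires)) {
                      /* Ask the journal commit thread to commit this transaction ... */
                      jbd2_log_start_commit(journal, tid);

                      /* ... and, for synchronous handles, wait until the commit
                       * completes.  This is where PID 1479 (mdt_279) is blocked in
                       * the first trace: the commit presumably cannot finish while
                       * PID 6539 (mdt_47) still holds an open handle on the same
                       * transaction. */
                      if (handle->h_sync)
                              err = jbd2_log_wait_commit(journal, tid);
              }

              jbd2_free_handle(handle);
              return err;
      }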
        

      In this second trace, we are waiting in dquot_initialize() for the per-superblock dqptr_sem semaphore to be released.
      Unfortunately, I could not find any process in a code path where this semaphore is currently held, and (as expected) every code block that acquires the semaphore releases it before exiting the block.
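
      For completeness, the acquire/release pattern in question has roughly the following shape. This is a condensed sketch of the 2.6.32-era dquot_initialize() from fs/quota/dquot.c (not the exact RHEL 6/Bull source), showing that the dqptr_sem write lock is taken and dropped within the same function:

      /* Condensed sketch of dquot_initialize(), 2.6.32-era fs/quota/dquot.c.
       * dqptr_sem protects inode->i_dquot[] while the pointers are set up. */
      int dquot_initialize(struct inode *inode, int type)
      {
              struct super_block *sb = inode->i_sb;
              unsigned int id = 0;
              int cnt;

              if (IS_NOQUOTA(inode))
                      return 0;

              down_write(&sb_dqopt(sb)->dqptr_sem);   /* <-- where PID 6539 is blocked */
              if (IS_NOQUOTA(inode))                  /* recheck under the lock */
                      goto out_err;

              for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
                      if (type != -1 && cnt != type)
                              continue;
                      if (cnt == USRQUOTA)
                              id = inode->i_uid;
                      else if (cnt == GRPQUOTA)
                              id = inode->i_gid;
                      if (!inode->i_dquot[cnt])
                              inode->i_dquot[cnt] = dqget(sb, id, cnt);
              }
      out_err:
              up_write(&sb_dqopt(sb)->dqptr_sem);     /* always released before returning */
              return 0;
      }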

      For reference, the semaphore is seen as follows:

      crash> struct rw_semaphore ffff88086b5c1180
      struct rw_semaphore {
        count = -4294967296,   # == 0xffffffff00000000
        wait_lock = {
          raw_lock = {
            slock = 2653658667
          }
        }, 
        wait_list = {
          next = 0xffff880834bbb5c0, 
          prev = 0xffff880834bbb5c0
        }
      }
      

      Given the definition and comments for rw_semaphore in include/linux/rwsem-spinlock.h below:

      /*
       * the rw-semaphore definition
       * - if activity is 0 then there are no active readers or writers
       * - if activity is +ve then that is the number of active readers
       * - if activity is -1 then there is one active writer
       * - if wait_list is not empty, then there are processes waiting for the semaphore
       */
      struct rw_semaphore {
      	__s32			activity;
      	spinlock_t		wait_lock;
      	struct list_head	wait_list;
      #ifdef CONFIG_DEBUG_LOCK_ALLOC
      	struct lockdep_map dep_map;
      #endif
      };
      

      this would mean that activity is -1 (one active writer).

      What did I miss there?
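
      For comparison, here is a minimal, runnable sketch that decodes the same count value under the xadd-based rwsem layout (arch/x86/include/asm/rwsem.h). This is an assumption worth checking: the trace above goes through call_rwsem_down_write_failed()/rwsem_down_failed_common(), which belong to the xadd implementation in lib/rwsem.c rather than to rwsem-spinlock.c, so the 64-bit count may need to be read with these bias constants instead of the activity field quoted above.

      /* decode_rwsem_count.c - interpret the dumped rw_semaphore count under the
       * x86_64 xadd-based layout (bias constants as in 2.6.32-era asm/rwsem.h).
       * Build with: gcc -o decode_rwsem_count decode_rwsem_count.c */
      #include <stdio.h>
      #include <stdint.h>

      #define RWSEM_ACTIVE_BIAS        0x0000000000000001L
      #define RWSEM_ACTIVE_MASK        0x00000000ffffffffL
      #define RWSEM_WAITING_BIAS       (-RWSEM_ACTIVE_MASK - 1)   /* 0xffffffff00000000 */
      #define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

      int main(void)
      {
              int64_t count  = -4294967296LL;   /* value seen in the crash dump */
              int32_t active = (int32_t)(count & RWSEM_ACTIVE_MASK);  /* low 32 bits */
              int64_t bias   = count - active;  /* remaining high part, if any */

              printf("count   = %#018llx\n", (unsigned long long)count);
              printf("active  = %d (readers/writers currently holding the rwsem)\n", active);
              printf("waiters = %s\n", bias <= RWSEM_WAITING_BIAS ? "yes" : "no");
              return 0;
      }

      Under that layout the dumped value is exactly RWSEM_WAITING_BIAS: a waiter is queued (consistent with the single entry on wait_list) while the active count is zero, which is a different reading from the rwsem-spinlock.h interpretation above. Whether it applies here depends on how this kernel was built, so it is offered only as a possible answer to the question.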

      Attachments

        Activity

          [LU-7681] Deadlock on MDS around dqptr_sem
          pjones Peter Jones added a comment -

          Thanks


          spiechurski Sebastien Piechurski added a comment -

          I have not heard about this problem for quite a while, so I think it was solved by moving away from 2.1.

          Please close.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Hello Seb, and sorry for being late on this.

          I have spent more time analyzing the crash-dump you provided. BTW, this looks like a different occurrence/crash-dump from the one we already worked on together, which means the same problem has re-occurred ...

          Also, can you give me a hint on how to use the lustre[-ldiskfs]-core RPMs you provided (and their embedded sets of patches) in order to rebuild the full and exact source tree used for this Lustre version?

          As a first thought, even though I have still not identified the thread that currently owns dqptr_sem and the blocked situation looks a bit different, I wonder whether my patch for LU-4271 could also help avoid this deadlock. Would it be possible to give it a try?


          spiechurski Sebastien Piechurski added a comment -

          The transfer finally succeeded with file LU-7681-bundle3.tar.xz.


          spiechurski Sebastien Piechurski added a comment -

          Yes, the transfer failed, I had not noticed.
          A new file is currently transferring, but I don't have a lot of bandwidth. It will probably finish only tonight.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Seb,
          Can you check? The bundle transfer appears to have finished, but it is incomplete.


          spiechurski Sebastien Piechurski added a comment -

          A bundle with all the debuginfo packages and sources is currently uploading to ftp.whamcloud.com.


          bfaccini Bruno Faccini (Inactive) added a comment -

          BTW, later Lustre versions (2.4 and above) use a kernel patch to avoid dqptr_sem usage, so it is very likely that this problem no longer exists since then.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Seb, is the crash-dump for this problem still available? If so, can you upload it along with the kernel-[common-]debuginfo and lustre-debuginfo RPMs?


          bfaccini Bruno Faccini (Inactive) added a comment -

          Suspecting a deadlock problem around dqptr_sem in a quite old kernel ...


          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: spiechurski Sebastien Piechurski
            Votes: 0
            Watchers: 4
