LU-7681: Deadlock on MDS around dqptr_sem


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Environment: RHEL 6 with Bull kernel based on 2.6.32-279.5.2
    • Severity: 3

    Description

      An MDS has hit the following deadlock several times: a process seems to have acquired the superblock's dqptr_sem semaphore and left without releasing it.

      When looking at a dump taken during this deadlock, we can see this:

      • most of the processes are in jbd2_journal_start() -> start_this_handle() -> schedule(), waiting for the current transaction to finish (see the handle-lifecycle sketch after the two traces below).
      • the process initiating the journal_stop is found in this state:
      PID: 1479   TASK: ffff880cefe9e7f0  CPU: 5   COMMAND: "mdt_279"
       #0 [ffff8810093538a0] schedule at ffffffff814965a5
       #1 [ffff881009353968] jbd2_log_wait_commit at ffffffffa00b9e55 [jbd2]
       #2 [ffff8810093539f8] jbd2_journal_stop at ffffffffa00b1b6b [jbd2]
       #3 [ffff881009353a58] __ldiskfs_journal_stop at ffffffffa05c7808 [ldiskfs]
       #4 [ffff881009353a88] osd_trans_stop at ffffffffa0e86b35 [osd_ldiskfs]
       #5 [ffff881009353ab8] mdd_trans_stop at ffffffffa0d8b4aa [mdd]
       #6 [ffff881009353ac8] mdd_attr_set at ffffffffa0d6aa5f [mdd]
       #7 [ffff881009353ba8] cml_attr_set at ffffffffa0ec3a86 [cmm]
       #8 [ffff881009353bd8] mdt_attr_set at ffffffffa0dfe418 [mdt]
       #9 [ffff881009353c28] mdt_reint_setattr at ffffffffa0dfea65 [mdt]
      #10 [ffff881009353cb8] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
      #11 [ffff881009353cd8] mdt_reint_internal at ffffffffa0deeed4 [mdt]
      #12 [ffff881009353d28] mdt_reint at ffffffffa0def2b4 [mdt]
      #13 [ffff881009353d48] mdt_handle_common at ffffffffa0de3762 [mdt]
      #14 [ffff881009353d98] mdt_regular_handle at ffffffffa0de4655 [mdt]
      #15 [ffff881009353da8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
      #16 [ffff881009353f48] kernel_thread at ffffffff8100412a
      
      • This process is itself waiting for the current transaction to commit, which cannot happen until the following process releases its handle on that transaction. That process is in this state:
        PID: 6539   TASK: ffff880869e910c0  CPU: 14  COMMAND: "mdt_47"
         #0 [ffff880834bbb490] schedule at ffffffff814965a5
         #1 [ffff880834bbb558] rwsem_down_failed_common at ffffffff81498ba5
         #2 [ffff880834bbb5b8] rwsem_down_write_failed at ffffffff81498d23
         #3 [ffff880834bbb5f8] call_rwsem_down_write_failed at ffffffff812689c3
         #4 [ffff880834bbb658] dquot_initialize at ffffffff811c6cfb
         #5 [ffff880834bbb6c8] ldiskfs_dquot_initialize at ffffffffa05c7c44 [ldiskfs]
         #6 [ffff880834bbb6f8] osd_oi_iam_refresh at ffffffffa0e8ef3f [osd_ldiskfs]
         #7 [ffff880834bbb758] osd_oi_insert at ffffffffa0e8f532 [osd_ldiskfs]
         #8 [ffff880834bbb7d8] __osd_oi_insert at ffffffffa0e86e31 [osd_ldiskfs]
         #9 [ffff880834bbb828] osd_object_ea_create at ffffffffa0e87a5c [osd_ldiskfs]
        #10 [ffff880834bbb888] mdd_object_create_internal at ffffffffa0d61ed0 [mdd]
        #11 [ffff880834bbb8e8] mdd_create at ffffffffa0d831de [mdd]
        #12 [ffff880834bbba28] cml_create at ffffffffa0ec4407 [cmm]
        #13 [ffff880834bbba78] mdt_reint_open at ffffffffa0e0fd7f [mdt]
        #14 [ffff880834bbbb58] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
        #15 [ffff880834bbbb78] mdt_reint_internal at ffffffffa0deeed4 [mdt]
        #16 [ffff880834bbbbc8] mdt_intent_reint at ffffffffa0def53d [mdt]
        #17 [ffff880834bbbc18] mdt_intent_policy at ffffffffa0dedc09 [mdt]
        #18 [ffff880834bbbc58] ldlm_lock_enqueue at ffffffffa07d93c1 [ptlrpc]
        #19 [ffff880834bbbcb8] ldlm_handle_enqueue0 at ffffffffa07ff3cd [ptlrpc]
        #20 [ffff880834bbbd28] mdt_enqueue at ffffffffa0dee586 [mdt]
        #21 [ffff880834bbbd48] mdt_handle_common at ffffffffa0de3762 [mdt]
        #22 [ffff880834bbbd98] mdt_regular_handle at ffffffffa0de4655 [mdt]
        #23 [ffff880834bbbda8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
        #24 [ffff880834bbbf48] kernel_thread at ffffffff8100412a
        

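      To make the interlock between the two traces explicit, here is a minimal sketch of the jbd2 handle lifecycle. This is the generic jbd2 pattern, not the actual osd_ldiskfs transaction code, and update_under_handle() is a hypothetical helper used only for illustration:

      #include <linux/jbd2.h>
      #include <linux/err.h>

      /*
       * A metadata update runs inside a handle on the running transaction;
       * the transaction cannot commit while any handle is still open.  A
       * thread that goes to sleep with its handle open (as mdt_47 does,
       * inside dquot_initialize()) therefore stalls the commit that mdt_279
       * waits for in jbd2_journal_stop(), which in turn keeps every other
       * thread asleep in jbd2_journal_start().
       */
      static int update_under_handle(journal_t *journal)
      {
              handle_t *handle;

              /* Join (or start) the running transaction, reserving credits. */
              handle = jbd2_journal_start(journal, 8);
              if (IS_ERR(handle))
                      return PTR_ERR(handle);

              /* ... metadata updates happen here, with the handle held open ... */

              /* Close the handle; only once all handles are closed can the
               * transaction commit, and a synchronous caller may then wait
               * for that commit (jbd2_log_wait_commit in the first trace). */
              return jbd2_journal_stop(handle);
      }

      With this pattern in mind, the chain is: mdt_47 sleeps on dqptr_sem while its handle is open, the running transaction can therefore never commit, mdt_279 waits for that commit in jbd2_journal_stop(), and every other thread queues up in jbd2_journal_start().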
      We are at this point waiting in dquot_initialize() for the superblock dqptr_sem semaphore to be released.
      Unfortunately, I could not find any process executing in a code path that holds this semaphore, and (as expected) every code block that acquires the semaphore releases it before leaving the block.
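      The acquire/release pattern around dqptr_sem is indeed straightforward. Below is a simplified sketch of dquot_initialize() following the shape of stock 2.6.32 fs/quota/dquot.c (assumption: the Bull kernel keeps the same structure); dquot_initialize_sketch() is just an illustrative name and the body is trimmed to the locking:

      #include <linux/fs.h>
      #include <linux/quota.h>
      #include <linux/quotaops.h>

      /* Simplified shape of dquot_initialize(), trimmed to the dqptr_sem
       * handling.  Every exit path releases the semaphore, which is why code
       * inspection does not explain a writer that never lets go. */
      static void dquot_initialize_sketch(struct inode *inode)
      {
              struct super_block *sb = inode->i_sb;
              int cnt;

              if (IS_NOQUOTA(inode))
                      return;

              down_write(&sb_dqopt(sb)->dqptr_sem);   /* <- mdt_47 sleeps here */
              if (IS_NOQUOTA(inode))                  /* re-check under the lock */
                      goto out;

              for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
                      unsigned int id = (cnt == USRQUOTA) ? inode->i_uid : inode->i_gid;

                      if (!inode->i_dquot[cnt])
                              inode->i_dquot[cnt] = dqget(sb, id, cnt);
              }
      out:
              up_write(&sb_dqopt(sb)->dqptr_sem);     /* always released on exit */
      }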

      For reference, the semaphore is seen as follows:

      crash> struct rw_semaphore ffff88086b5c1180
      struct rw_semaphore {
        count = -4294967296,   # == 0xffffffff00000000
        wait_lock = {
          raw_lock = {
            slock = 2653658667
          }
        }, 
        wait_list = {
          next = 0xffff880834bbb5c0, 
          prev = 0xffff880834bbb5c0
        }
      }
      
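      Note that both wait_list pointers (0xffff880834bbb5c0) fall inside the stack range of mdt_47 shown in the second trace, so the single queued waiter appears to be mdt_47 itself. Assuming the stock 2.6.32 layout of struct rwsem_waiter (a list_head first, then the task pointer), this could be double-checked with:

      crash> struct rwsem_waiter 0xffff880834bbb5c0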

      Given the definition and comments for rw_semaphore in include/linux/rwsem-spinlock.h below:

      /*
       * the rw-semaphore definition
       * - if activity is 0 then there are no active readers or writers
       * - if activity is +ve then that is the number of active readers
       * - if activity is -1 then there is one active writer
       * - if wait_list is not empty, then there are processes waiting for the semaphore
       */
      struct rw_semaphore {
      	__s32			activity;
      	spinlock_t		wait_lock;
      	struct list_head	wait_list;
      #ifdef CONFIG_DEBUG_LOCK_ALLOC
      	struct lockdep_map dep_map;
      #endif
      };
      

      this would mean that activity is -1 (one active writer).
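      As a side note, splitting the 64-bit value printed by crash into its two 32-bit halves gives the following (a minimal user-space C sketch, nothing more than arithmetic on the dumped number):

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
              int64_t raw  = -4294967296LL;                 /* 0xffffffff00000000 */
              int32_t low  = (int32_t)(raw & 0xffffffffLL); /* low  half: 0  */
              int32_t high = (int32_t)(raw >> 32);          /* high half: -1 */

              printf("raw = %#llx\n", (unsigned long long)raw);
              printf("low  32 bits = %d\n", low);
              printf("high 32 bits = %d\n", high);
              return 0;
      }

      On little-endian x86_64 the first 32-bit field of a structure sits in the low half of that word (0 here); the -1 is in the high half.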

      What did I miss there?

          People

            Assignee: Bruno Faccini (bfaccini, Inactive)
            Reporter: Sebastien Piechurski (spiechurski)
