Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.1.6
- Environment: RHEL 6 with Bull kernel based on 2.6.32-279.5.2
- Severity: 3
Description
An MDS has repeatedly encountered a deadlock in which a process appears to have acquired the superblock dqptr_sem semaphore and left without releasing it.
Looking at a dump taken during this deadlock, we can see the following:
- most of the processes are in jbd2_journal_start() -> start_this_handle() -> schedule(), waiting for the current transaction to finish (the jbd2 handle lifecycle behind this dependency is sketched after the backtraces below).
- the process initiating the journal_stop is in this state:
PID: 1479   TASK: ffff880cefe9e7f0  CPU: 5   COMMAND: "mdt_279"
 #0 [ffff8810093538a0] schedule at ffffffff814965a5
 #1 [ffff881009353968] jbd2_log_wait_commit at ffffffffa00b9e55 [jbd2]
 #2 [ffff8810093539f8] jbd2_journal_stop at ffffffffa00b1b6b [jbd2]
 #3 [ffff881009353a58] __ldiskfs_journal_stop at ffffffffa05c7808 [ldiskfs]
 #4 [ffff881009353a88] osd_trans_stop at ffffffffa0e86b35 [osd_ldiskfs]
 #5 [ffff881009353ab8] mdd_trans_stop at ffffffffa0d8b4aa [mdd]
 #6 [ffff881009353ac8] mdd_attr_set at ffffffffa0d6aa5f [mdd]
 #7 [ffff881009353ba8] cml_attr_set at ffffffffa0ec3a86 [cmm]
 #8 [ffff881009353bd8] mdt_attr_set at ffffffffa0dfe418 [mdt]
 #9 [ffff881009353c28] mdt_reint_setattr at ffffffffa0dfea65 [mdt]
#10 [ffff881009353cb8] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
#11 [ffff881009353cd8] mdt_reint_internal at ffffffffa0deeed4 [mdt]
#12 [ffff881009353d28] mdt_reint at ffffffffa0def2b4 [mdt]
#13 [ffff881009353d48] mdt_handle_common at ffffffffa0de3762 [mdt]
#14 [ffff881009353d98] mdt_regular_handle at ffffffffa0de4655 [mdt]
#15 [ffff881009353da8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
#16 [ffff881009353f48] kernel_thread at ffffffff8100412a
- This process is itself waiting for the current transaction to be committed, which in turn depends on another process that is still part of that transaction. That process is in the following state:
PID: 6539   TASK: ffff880869e910c0  CPU: 14  COMMAND: "mdt_47"
 #0 [ffff880834bbb490] schedule at ffffffff814965a5
 #1 [ffff880834bbb558] rwsem_down_failed_common at ffffffff81498ba5
 #2 [ffff880834bbb5b8] rwsem_down_write_failed at ffffffff81498d23
 #3 [ffff880834bbb5f8] call_rwsem_down_write_failed at ffffffff812689c3
 #4 [ffff880834bbb658] dquot_initialize at ffffffff811c6cfb
 #5 [ffff880834bbb6c8] ldiskfs_dquot_initialize at ffffffffa05c7c44 [ldiskfs]
 #6 [ffff880834bbb6f8] osd_oi_iam_refresh at ffffffffa0e8ef3f [osd_ldiskfs]
 #7 [ffff880834bbb758] osd_oi_insert at ffffffffa0e8f532 [osd_ldiskfs]
 #8 [ffff880834bbb7d8] __osd_oi_insert at ffffffffa0e86e31 [osd_ldiskfs]
 #9 [ffff880834bbb828] osd_object_ea_create at ffffffffa0e87a5c [osd_ldiskfs]
#10 [ffff880834bbb888] mdd_object_create_internal at ffffffffa0d61ed0 [mdd]
#11 [ffff880834bbb8e8] mdd_create at ffffffffa0d831de [mdd]
#12 [ffff880834bbba28] cml_create at ffffffffa0ec4407 [cmm]
#13 [ffff880834bbba78] mdt_reint_open at ffffffffa0e0fd7f [mdt]
#14 [ffff880834bbbb58] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
#15 [ffff880834bbbb78] mdt_reint_internal at ffffffffa0deeed4 [mdt]
#16 [ffff880834bbbbc8] mdt_intent_reint at ffffffffa0def53d [mdt]
#17 [ffff880834bbbc18] mdt_intent_policy at ffffffffa0dedc09 [mdt]
#18 [ffff880834bbbc58] ldlm_lock_enqueue at ffffffffa07d93c1 [ptlrpc]
#19 [ffff880834bbbcb8] ldlm_handle_enqueue0 at ffffffffa07ff3cd [ptlrpc]
#20 [ffff880834bbbd28] mdt_enqueue at ffffffffa0dee586 [mdt]
#21 [ffff880834bbbd48] mdt_handle_common at ffffffffa0de3762 [mdt]
#22 [ffff880834bbbd98] mdt_regular_handle at ffffffffa0de4655 [mdt]
#23 [ffff880834bbbda8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
#24 [ffff880834bbbf48] kernel_thread at ffffffff8100412a
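For context, here is a minimal sketch of the generic jbd2 handle lifecycle behind this dependency. It is illustrative only: the function name sketch_jbd2_update and the credit count are made up, and this is not the actual osd/mdd code.

/*
 * Minimal sketch (not the real osd/mdd code) of the jbd2 handle lifecycle:
 * a transaction cannot commit while any handle opened against it is live.
 */
#include <linux/jbd2.h>
#include <linux/err.h>

static int sketch_jbd2_update(journal_t *journal)
{
	/* Open a handle on the running transaction (what osd_trans_start()
	 * ultimately does through ldiskfs).  From here until
	 * jbd2_journal_stop(), the commit of this transaction waits for us. */
	handle_t *handle = jbd2_journal_start(journal, 8 /* credits, illustrative */);

	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/* ... metadata updates; if the thread blocks here, e.g. on
	 * dqptr_sem, the handle stays live and the commit stalls ... */

	/* Close the handle; only once every handle on the transaction has
	 * been stopped can the commit thread finish it. */
	return jbd2_journal_stop(handle);
}

In the first backtrace, jbd2_journal_stop() has chosen to wait for the commit via jbd2_log_wait_commit(); that commit cannot complete while the handle opened by mdt_47 (still part of the same transaction, per the above) is live, and every new caller of jbd2_journal_start() parks in start_this_handle() behind it.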
At this point we are waiting in dquot_initialize() for the superblock's dqptr_sem semaphore to be released.
Unfortunately, I could not find any process in a code path where this semaphore is held, and (as expected) every code block that acquires the semaphore releases it before leaving the block.
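For reference, here is a minimal sketch of where that write acquisition lives in the 2.6.32-era quota code, assuming the RHEL6 kernel follows mainline fs/quota/dquot.c here; the function body below is simplified and illustrative, not the real code.

/*
 * Minimal sketch of the 2.6.32-era locking in fs/quota/dquot.c, assuming the
 * RHEL6 kernel follows mainline.  Simplified; not the real function body.
 */
#include <linux/fs.h>
#include <linux/quotaops.h>
#include <linux/rwsem.h>

static void sketch_dquot_initialize(struct inode *inode)
{
	struct super_block *sb = inode->i_sb;

	/* This is the down_write() the mdt_47 thread is blocked in:
	 * dqptr_sem protects the inode's dquot pointers while they are
	 * being attached. */
	down_write(&sb_dqopt(sb)->dqptr_sem);

	/* ... look up and attach the inode's dquots ... */

	/* Released before returning: every acquire in the source is paired
	 * with a release, which matches the observation above. */
	up_write(&sb_dqopt(sb)->dqptr_sem);
}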
For reference, the semaphore appears as follows in the dump:
crash> struct rw_semaphore ffff88086b5c1180
struct rw_semaphore {
  count = -4294967296,        # == 0xffffffff00000000
  wait_lock = {
    raw_lock = {
      slock = 2653658667
    }
  },
  wait_list = {
    next = 0xffff880834bbb5c0,
    prev = 0xffff880834bbb5c0
  }
}
Given the definition and comments for rw_semaphore in include/linux/rwsem-spinlock.h below:
/*
 * the rw-semaphore definition
 * - if activity is 0 then there are no active readers or writers
 * - if activity is +ve then that is the number of active readers
 * - if activity is -1 then there is one active writer
 * - if wait_list is not empty, then there are processes waiting for the semaphore
 */
struct rw_semaphore {
	__s32			activity;
	spinlock_t		wait_lock;
	struct list_head	wait_list;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map	dep_map;
#endif
};
this would mean that activity is -1 (one active writer).
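For concreteness, a quick user-space check of the arithmetic behind this reading (a sketch only; which 32-bit half of the dumped 64-bit count would actually line up with a 32-bit activity field depends on the rw_semaphore layout this kernel was built with):

#include <stdio.h>
#include <stdint.h>

/* Stand-alone check of the value dumped by crash; not kernel code. */
int main(void)
{
	int64_t count = -4294967296LL;                     /* value from the dump */

	printf("count        = %#018llx\n",
	       (unsigned long long)count);                 /* 0xffffffff00000000 */
	printf("high 32 bits = %d\n",
	       (int32_t)((uint64_t)count >> 32));          /* -1 */
	printf("low 32 bits  = %d\n",
	       (int32_t)(count & 0xffffffff));             /* 0 */
	return 0;
}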
What did I miss there?