[LU-7681] Deadlock on MDS around dqptr_sem Created: 18/Jan/16  Updated: 07/Jun/17  Resolved: 07/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sebastien Piechurski Assignee: Bruno Faccini (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: p4b
Environment:

RHEL 6 with Bull kernel based on 2.6.32-279.5.2


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

An MDS has encountered, several times, a deadlock in which a process appears to have acquired the superblock dqptr_sem semaphore and left without releasing it.

When looking at a dump taken during this deadlock, we can see this:

  • most of the processes are in jbd2_journal_start() -> start_this_handle() -> schedule(), waiting for the current transaction to be finished.
  • the process initiating the journal_stop is found in this state:
PID: 1479   TASK: ffff880cefe9e7f0  CPU: 5   COMMAND: "mdt_279"
 #0 [ffff8810093538a0] schedule at ffffffff814965a5
 #1 [ffff881009353968] jbd2_log_wait_commit at ffffffffa00b9e55 [jbd2]
 #2 [ffff8810093539f8] jbd2_journal_stop at ffffffffa00b1b6b [jbd2]
 #3 [ffff881009353a58] __ldiskfs_journal_stop at ffffffffa05c7808 [ldiskfs]
 #4 [ffff881009353a88] osd_trans_stop at ffffffffa0e86b35 [osd_ldiskfs]
 #5 [ffff881009353ab8] mdd_trans_stop at ffffffffa0d8b4aa [mdd]
 #6 [ffff881009353ac8] mdd_attr_set at ffffffffa0d6aa5f [mdd]
 #7 [ffff881009353ba8] cml_attr_set at ffffffffa0ec3a86 [cmm]
 #8 [ffff881009353bd8] mdt_attr_set at ffffffffa0dfe418 [mdt]
 #9 [ffff881009353c28] mdt_reint_setattr at ffffffffa0dfea65 [mdt]
#10 [ffff881009353cb8] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
#11 [ffff881009353cd8] mdt_reint_internal at ffffffffa0deeed4 [mdt]
#12 [ffff881009353d28] mdt_reint at ffffffffa0def2b4 [mdt]
#13 [ffff881009353d48] mdt_handle_common at ffffffffa0de3762 [mdt]
#14 [ffff881009353d98] mdt_regular_handle at ffffffffa0de4655 [mdt]
#15 [ffff881009353da8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
#16 [ffff881009353f48] kernel_thread at ffffffff8100412a
  • This process is itself waiting for another process to commit the current transaction. That process is in the following state:
    PID: 6539   TASK: ffff880869e910c0  CPU: 14  COMMAND: "mdt_47"
     #0 [ffff880834bbb490] schedule at ffffffff814965a5
     #1 [ffff880834bbb558] rwsem_down_failed_common at ffffffff81498ba5
     #2 [ffff880834bbb5b8] rwsem_down_write_failed at ffffffff81498d23
     #3 [ffff880834bbb5f8] call_rwsem_down_write_failed at ffffffff812689c3
     #4 [ffff880834bbb658] dquot_initialize at ffffffff811c6cfb
     #5 [ffff880834bbb6c8] ldiskfs_dquot_initialize at ffffffffa05c7c44 [ldiskfs]
     #6 [ffff880834bbb6f8] osd_oi_iam_refresh at ffffffffa0e8ef3f [osd_ldiskfs]
     #7 [ffff880834bbb758] osd_oi_insert at ffffffffa0e8f532 [osd_ldiskfs]
     #8 [ffff880834bbb7d8] __osd_oi_insert at ffffffffa0e86e31 [osd_ldiskfs]
     #9 [ffff880834bbb828] osd_object_ea_create at ffffffffa0e87a5c [osd_ldiskfs]
    #10 [ffff880834bbb888] mdd_object_create_internal at ffffffffa0d61ed0 [mdd]
    #11 [ffff880834bbb8e8] mdd_create at ffffffffa0d831de [mdd]
    #12 [ffff880834bbba28] cml_create at ffffffffa0ec4407 [cmm]
    #13 [ffff880834bbba78] mdt_reint_open at ffffffffa0e0fd7f [mdt]
    #14 [ffff880834bbbb58] mdt_reint_rec at ffffffffa0df7cb1 [mdt]
    #15 [ffff880834bbbb78] mdt_reint_internal at ffffffffa0deeed4 [mdt]
    #16 [ffff880834bbbbc8] mdt_intent_reint at ffffffffa0def53d [mdt]
    #17 [ffff880834bbbc18] mdt_intent_policy at ffffffffa0dedc09 [mdt]
    #18 [ffff880834bbbc58] ldlm_lock_enqueue at ffffffffa07d93c1 [ptlrpc]
    #19 [ffff880834bbbcb8] ldlm_handle_enqueue0 at ffffffffa07ff3cd [ptlrpc]
    #20 [ffff880834bbbd28] mdt_enqueue at ffffffffa0dee586 [mdt]
    #21 [ffff880834bbbd48] mdt_handle_common at ffffffffa0de3762 [mdt]
    #22 [ffff880834bbbd98] mdt_regular_handle at ffffffffa0de4655 [mdt]
    #23 [ffff880834bbbda8] ptlrpc_main at ffffffffa082e4e6 [ptlrpc]
    #24 [ffff880834bbbf48] kernel_thread at ffffffff8100412a
    

At this point we are waiting in dquot_initialize() for the superblock dqptr_sem semaphore to be released.
Unfortunately, I could not find any process in a code path where this semaphore is held, and (as expected) every code block that acquires the semaphore releases it before exiting the block.
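
For context, the locking pattern that was audited is sketched below. This is a simplified, non-verbatim illustration of the 2.6.32-era quota code (exact signatures, quota-type loops and error paths differ); it only shows how dquot_initialize() is expected to pair its down_write() on dqptr_sem with an up_write() before returning, which matches the rwsem_down_write_failed frame in the mdt_47 stack above.

#include <linux/fs.h>
#include <linux/quotaops.h>
#include <linux/rwsem.h>

/* Simplified sketch, NOT the verbatim 2.6.32 source: every path in the
 * quota code that takes dqptr_sem for write is expected to release it
 * before returning. */
static int dquot_initialize_sketch(struct inode *inode, int type)
{
        struct super_block *sb = inode->i_sb;

        /* Writer side: this is where mdt_47 blocks in the stack above
         * (dquot_initialize -> rwsem_down_write_failed). */
        down_write(&sb_dqopt(sb)->dqptr_sem);

        /* ... attach the dquot structures to the inode ... */

        /* Always released before leaving the function. */
        up_write(&sb_dqopt(sb)->dqptr_sem);
        return 0;
}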

For reference, the semaphore is seen as follows:

crash> struct rw_semaphore ffff88086b5c1180
struct rw_semaphore {
  count = -4294967296,   # == 0xffffffff00000000
  wait_lock = {
    raw_lock = {
      slock = 2653658667
    }
  }, 
  wait_list = {
    next = 0xffff880834bbb5c0, 
    prev = 0xffff880834bbb5c0
  }
}
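
As a quick sanity check of the hex annotation above, the following stand-alone user-space snippet (purely illustrative, not kernel code) splits the raw 64-bit count into its 32-bit halves:

#include <stdio.h>
#include <stdint.h>

/* -4294967296 as a 64-bit two's-complement value is 0xffffffff00000000:
 * the upper 32 bits are all ones (-1 as a signed 32-bit integer) and the
 * lower 32 bits are zero. */
int main(void)
{
        int64_t count = -4294967296LL;

        printf("count        = %lld (0x%016llx)\n",
               (long long)count, (unsigned long long)count);
        printf("high 32 bits = 0x%08x (%d as __s32)\n",
               (uint32_t)((uint64_t)count >> 32),
               (int32_t)((uint64_t)count >> 32));
        printf("low 32 bits  = 0x%08x\n", (uint32_t)count);
        return 0;
}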

Given the definition and comments for rw_semaphore in include/linux/rwsem-spinlock.h below:

/*
 * the rw-semaphore definition
 * - if activity is 0 then there are no active readers or writers
 * - if activity is +ve then that is the number of active readers
 * - if activity is -1 then there is one active writer
 * - if wait_list is not empty, then there are processes waiting for the semaphore
 */
struct rw_semaphore {
	__s32			activity;
	spinlock_t		wait_lock;
	struct list_head	wait_list;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
};

this would mean that activity is -1 (one active writer).
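
For reference, here is a condensed, non-verbatim sketch of how the write-lock path drives activity in the spinlock-based flavor (lib/rwsem-spinlock.c). It only applies if the kernel was built with that variant; irq handling, waiter flags and lockdep details are omitted.

void __down_write_sketch(struct rw_semaphore *sem)
{
        spin_lock_irq(&sem->wait_lock);

        if (sem->activity == 0 && list_empty(&sem->wait_list)) {
                sem->activity = -1;     /* granted: one active writer */
                spin_unlock_irq(&sem->wait_lock);
                return;
        }

        /* Otherwise queue ourselves on sem->wait_list and sleep until the
         * current holder releases the semaphore and wakes us up (elided). */
        spin_unlock_irq(&sem->wait_lock);
}

void __up_write_sketch(struct rw_semaphore *sem)
{
        spin_lock_irq(&sem->wait_lock);
        sem->activity = 0;              /* writer gone */
        /* Wake the first waiter queued on sem->wait_list, if any. */
        spin_unlock_irq(&sem->wait_lock);
}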

What did I miss there?



 Comments   
Comment by Bruno Faccini (Inactive) [ 18/Jan/16 ]

Assigning to me since I have already started working with Seb on this problem from the Bull office.

Comment by Bruno Faccini (Inactive) [ 18/Jan/16 ]

Suspecting a deadlock problem around dqptr_sem in a quite old kernel ...

Comment by Bruno Faccini (Inactive) [ 19/Jan/16 ]

Seb, is the crash-dump for this problem still available? If so, can you upload it along with the kernel-[common-]debuginfo and lustre-debuginfo RPMs?

Comment by Bruno Faccini (Inactive) [ 19/Jan/16 ]

BTW, later Lustre versions (2.4 and above) use a kernel patch to avoid dqptr_sem usage, so it is very likely that this problem no longer exists.

Comment by Sebastien Piechurski [ 20/Jan/16 ]

A bundle with all the debuginfo packages and sources is currently uploading on ftp.whamcloud.com.

Comment by Bruno Faccini (Inactive) [ 21/Jan/16 ]

Seb,
Can you check? The bundle transfer appears to have finished, but the file is incomplete.

Comment by Sebastien Piechurski [ 21/Jan/16 ]

Yes, the transfer failed, I had not noticed.
A new file is currently transferring, but I don't have a lot of bandwidth. It will probably only finish tonight.

Comment by Sebastien Piechurski [ 23/Jan/16 ]

The transfer finally succeeded with file LU-7681-bundle3.tar.xz.

Comment by Bruno Faccini (Inactive) [ 17/Feb/16 ]

Hello Seb, and sorry to be late on this.

I have spent more time analyzing the crash-dump you provided. BTW, this looks like a different occurrence/crash-dump from the one we already worked on together, which means the same problem has re-occurred ...

Also, can you give me some hints on how to use the lustre[-ldiskfs]-core RPMs you have provided (and their embedded sets of patches) in order to reconstruct the full and exact source tree used for this Lustre version?

Then, as a first thought, and even though I still have not identified the thread that currently owns dqptr_sem and the blocked situation looks a bit different, I wonder whether my patch for LU-4271 might also help avoid this deadlock. Would it be possible to give it a try?

Comment by Sebastien Piechurski [ 07/Jun/17 ]

I have not heard about this problem for quite a while, so I think it was solved by moving away from 2.1.

Please close.

Comment by Peter Jones [ 07/Jun/17 ]

Thanks
