[LU-7119] possible circular locking dependency detected Created: 09/Sep/15  Updated: 16/Apr/20  Resolved: 16/Apr/20

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Ashish Purkar (Inactive) Assignee: Alex Zhuravlev
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Single Node Setup, Scientific Linux (2.6.32-431.17.1.el6.x86_64-Lustre), 2.7.59-16-gb8d51c4


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

#sh lustre/tests/llmount.sh

Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000

llog_process_th/3053 is trying to acquire lock:

 (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffffa0b10fee>] osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]

but task is already holding lock:

 (&mo->oo_sem){++++..}, at: [<ffffffffa0af07df>] osd_object_write_lock+0x9f/0x13 0 [osd_ldiskfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&mo->oo_sem){++++..}:
       [<ffffffff810b92c6>] __lock_acquire+0xef6/0x1570
       [<ffffffff810b99ec>] lock_acquire+0xac/0x120
       [<ffffffff810a45c5>] down_write_nested+0x65/0xb0
       [<ffffffffa0af07df>] osd_object_write_lock+0x9f/0x130 [osd_ldiskfs]
       [<ffffffffa03d816a>] llog_osd_create+0xba/0x920 [obdclass]
       [<ffffffffa03c28b1>] llog_create+0x81/0x1e0 [obdclass]
       [<ffffffffa03c386b>] llog_open_create+0x2fb/0x3e0 [obdclass]
       [<ffffffffa0b891e8>] mgs_find_or_make_fsdb+0x418/0x1370 [mgs]
       [<ffffffffa0b7ad38>] mgs_target_reg+0x978/0xe50 [mgs]
       [<ffffffffa06a0074>] tgt_request_handle+0xa74/0x12b0 [ptlrpc]
       [<ffffffffa064a8ba>] ptlrpc_main+0xdaa/0x18b0 [ptlrpc]
       [<ffffffff8109e836>] kthread+0x96/0xa0
       [<ffffffff8100c30a>] child_rip+0xa/0x20

-> #1 (jbd2_handle){+.+...}:
       [<ffffffff810b92c6>] __lock_acquire+0xef6/0x1570
       [<ffffffff810b99ec>] lock_acquire+0xac/0x120
       [<ffffffffa0092272>] start_this_handle+0x402/0x4d0 [jbd2]
       [<ffffffffa0092553>] jbd2_journal_start+0xd3/0x110 [jbd2]
       [<ffffffffa0ab4ee6>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
       [<ffffffffa0ab5323>] ldiskfs_dquot_initialize+0x63/0xd0 [ldiskfs]
       [<ffffffff811ade16>] vfs_dq_init+0x56/0x60
       [<ffffffff811b02b0>] vfs_create+0xd0/0x110
       [<ffffffff811b398e>] do_filp_open+0xa8e/0xd30
       [<ffffffff8119d429>] do_sys_open+0x69/0x140
       [<ffffffff8119d540>] sys_open+0x20/0x30
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#13){+.+.+.}:
       [<ffffffff810b98be>] __lock_acquire+0x14ee/0x1570
       [<ffffffff810b99ec>] lock_acquire+0xac/0x120
       [<ffffffff81551f0c>] __mutex_lock_common+0x4c/0x400
       [<ffffffff8155242a>] mutex_lock_nested+0x4a/0x60
       [<ffffffffa0b10fee>] osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]
       [<ffffffffa0b12b96>] osd_obj_spec_insert+0x76/0x130 [osd_ldiskfs]
       [<ffffffffa0b01c06>] osd_oi_insert+0x66/0x4d0 [osd_ldiskfs]
       [<ffffffffa0afebeb>] osd_object_ea_create+0x62b/0xcc0 [osd_ldiskfs]
       [<ffffffffa04174bf>] dt_find_or_create+0x40f/0x930 [obdclass]
       [<ffffffffa01ae152>] fld_index_init+0x142/0xacc [fld]
       [<ffffffffa01aaccb>] fld_server_init+0xab/0x3d0 [fld]
       [<ffffffffa0c25031>] mdt_device_alloc+0x511/0x1260 [mdt]
       [<ffffffffa03f95ff>] obd_setup+0x1bf/0x290 [obdclass]
       [<ffffffffa03f992e>] class_setup+0x25e/0x940 [obdclass]
       [<ffffffffa0400211>] class_process_config+0x1151/0x26e0 [obdclass]
       [<ffffffffa04031c6>] class_config_llog_handler+0xc16/0x1fb0 [obdclass]
       [<ffffffffa03c6332>] llog_process_thread+0x932/0xfc0 [obdclass]
       [<ffffffffa03c6f08>] llog_process_thread_daemonize+0x48/0x70 [obdclass]
       [<ffffffff8109e836>] kthread+0x96/0xa0
       [<ffffffff8100c30a>] child_rip+0xa/0x20

other info that might help us debug this:

2 locks held by llog_process_th/3053:

 #0:  (jbd2_handle){+.+...}, at: [<ffffffffa0092218>] start_this_handle+0x3a8/0x
4d0 [jbd2]
 #1:  (&mo->oo_sem){++++..}, at: [<ffffffffa0af07df>] osd_object_write_lock+0x9f
/0x130 [osd_ldiskfs]

stack backtrace:

Pid: 3053, comm: llog_process_th Tainted: G        W  --------------- H  2.6.32
#4
Call Trace:
 [<ffffffff810b6943>] ? print_circular_bug+0xf3/0x100
 [<ffffffff810b98be>] ? __lock_acquire+0x14ee/0x1570
 [<ffffffff810b99ec>] ? lock_acquire+0xac/0x120
 [<ffffffffa0b10fee>] ? osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]
 [<ffffffffa0b112bb>] ? osd_obj_add_entry+0x40b/0x510 [osd_ldiskfs]
 [<ffffffff81551f0c>] ? __mutex_lock_common+0x4c/0x400
 [<ffffffffa0b10fee>] ? osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]
 [<ffffffffa0b10fee>] ? osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]
 [<ffffffffa0ab4e58>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [<ffffffff8155242a>] ? mutex_lock_nested+0x4a/0x60
 [<ffffffffa0b10fee>] ? osd_obj_add_entry+0x13e/0x510 [osd_ldiskfs]
 [<ffffffffa0b12b96>] ? osd_obj_spec_insert+0x76/0x130 [osd_ldiskfs]
 [<ffffffffa0b09540>] ? iam_lfix_ipd_alloc+0x0/0x20 [osd_ldiskfs]
 [<ffffffffa0b01c06>] ? osd_oi_insert+0x66/0x4d0 [osd_ldiskfs]
 [<ffffffff811c6892>] ? generic_setxattr+0xa2/0xb0
 [<ffffffffa0aea956>] ? osd_trans_exec_op+0x36/0x350 [osd_ldiskfs]
 [<ffffffffa0afebeb>] ? osd_object_ea_create+0x62b/0xcc0 [osd_ldiskfs]
 [<ffffffffa04174bf>] ? dt_find_or_create+0x40f/0x930 [obdclass]
 [<ffffffffa01ae152>] ? fld_index_init+0x142/0xacc [fld]
 [<ffffffffa01a84fa>] ? fld_cache_init+0x14a/0x340 [fld]
 [<ffffffffa01aaccb>] ? fld_server_init+0xab/0x3d0 [fld]
 [<ffffffffa0c25031>] ? mdt_device_alloc+0x511/0x1260 [mdt]
 [<ffffffffa03f95ff>] ? obd_setup+0x1bf/0x290 [obdclass]
 [<ffffffffa03f992e>] ? class_setup+0x25e/0x940 [obdclass]
 [<ffffffffa0400211>] ? class_process_config+0x1151/0x26e0 [obdclass]
 [<ffffffff811823f2>] ? cache_alloc_debugcheck_after+0x152/0x290
 [<ffffffff81537dc0>] ? kmemleak_alloc+0x20/0xd0
 [<ffffffff81185b6a>] ? __kmalloc+0x29a/0x310
 [<ffffffffa040303f>] ? class_config_llog_handler+0xa8f/0x1fb0 [obdclass]
 [<ffffffffa04031c6>] ? class_config_llog_handler+0xc16/0x1fb0 [obdclass]
 [<ffffffff81551cfe>] ? mutex_unlock+0xe/0x10
 [<ffffffffa03c6332>] ? llog_process_thread+0x932/0xfc0 [obdclass]
 [<ffffffffa03c6f08>] ? llog_process_thread_daemonize+0x48/0x70 [obdclass]
 [<ffffffffa03c6ec0>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
 [<ffffffff8109e836>] ? kthread+0x96/0xa0
 [<ffffffff8100c30a>] ? child_rip+0xa/0x20
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffff8109e7a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c300>] ? child_rip+0x0/0x20
Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space
Lustre: lustre-MDT0000: new disk, initializing
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400
-0x0000000240000400):0:mdt
LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: lustre-OST0000: new disk, initializing
Lustre: srv-lustre-OST0000: No data found on store. Initialize space
Lustre: Skipped 1 previous similar message
LDISKFS-fs (loop2): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (loop2): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: Mounted lustre-client
Lustre: DEBUG MARKER: Using TIMEOUT=20

Generated at Sat Feb 10 02:06:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.