[LU-12509] lockdep warning in ofd_precreate_objects() Created: 05/Jul/19  Updated: 09/Aug/19  Resolved: 09/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Improvement Priority: Minor
Reporter: Dongyang Li Assignee: Dongyang Li
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12269 Support RHEL 8.0 Resolved
Rank (Obsolete): 9223372036854775807

 Description   

[ 141.023079] ============================================
[ 141.023750] WARNING: possible recursive locking detected
[ 141.024443] 4.18.0-debug #8 Tainted: G O --------- - -
[ 141.025226] --------------------------------------------
[ 141.025967] ll_ost00_002/7298 is trying to acquire lock:
[ 141.030372] 000000001989e622 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.031190]
[ 141.031190] but task is already holding lock:
[ 141.032872] 0000000086a83aaf (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.033676]
[ 141.033676] other info that might help us debug this:
[ 141.034304] Possible unsafe locking scenario:
[ 141.034304]
[ 141.034875] CPU0
[ 141.035121] ----
[ 141.035366] lock(&mo->oo_sem);
[ 141.035678] lock(&mo->oo_sem);
[ 141.035993]
[ 141.035993] *** DEADLOCK ***
[ 141.035993]
[ 141.037086] May be due to missing lock nesting notation
[ 141.037086]
[ 141.037727] 3 locks held by ll_ost00_002/7298:
[ 141.038186] #0: 000000003540612a (&m->ofd_lastid_rwsem){.+.+}, at: ofd_create_hdl+0x1a9/0x2270 [ofd]
[ 141.039081] #1: 00000000ee55375f (&oseq->os_create_lock){..}, at: ofd_create_hdl+0x364/0x2270 [ofd]
[ 141.039969] #2: 0000000086a83aaf (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.041147]
[ 141.041147] stack backtrace:
[ 141.041568] CPU: 0 PID: 7298 Comm: ll_ost00_002 Tainted: G O --------- - - 4.18.0-debug #8
[ 141.042465] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 141.043004] Call Trace:
[ 141.043256] dump_stack+0x106/0x175
[ 141.043600] validate_chain.isra.26.cold.44+0x224/0x2da
[ 141.044103] __lock_acquire+0x3df/0xa70
[ 141.044535] lock_acquire+0x13a/0x370
[ 141.044908] ? osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.045372] down_write_nested+0x6f/0x120
[ 141.045770] ? osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.046236] osd_write_lock+0x8a/0xb0 [osd_zfs]
[ 141.046694] ofd_precreate_objects+0x1764/0x2480 [ofd]
[ 141.047201] ofd_create_hdl+0xbec/0x2270 [ofd]
[ 141.047722] tgt_handle_request0+0xdf/0x890 [ptlrpc]
[ 141.048274] tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
[ 141.048789] ? libcfs_nid2str_r+0x12e/0x160 [lnet]
[ 141.049332] ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
[ 141.049898] ? __wake_up+0x17/0x20
[ 141.050299] ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[ 141.050794] ? ptlrpc_register_service+0x14c0/0x14c0 [ptlrpc]
[ 141.051353] kthread+0x190/0x1c0
[ 141.051668] ? kthread_create_worker+0x90/0x90
[ 141.052102] ret_from_fork+0x3a/0x50

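This is a false positive: the two oo_sem locks belong to different objects but share one lock class. We should use a different lock subclass for the nested acquisition to silence the lockdep warning.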
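
For illustration only (this is not the patch, and not kernel code): a minimal user-space sketch of why a subclass helps. It assumes a simplified model in which held locks are tracked by (class, subclass) rather than by lock instance, which is how the first report above can fire for two distinct objects. The names struct task_model and model_acquire() are hypothetical; in the kernel the subclass is supplied through calls such as down_write_nested(), which appears in the trace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the "possible recursive locking" check. Locks are
 * identified by (class name, subclass), not by address, so two
 * different objects of the same class look identical to the checker.
 */
#define MAX_HELD 8

struct held_lock {
	const char *class_name;	/* e.g. "&mo->oo_sem" */
	unsigned int subclass;	/* 0 for an un-annotated acquisition */
};

struct task_model {
	struct held_lock held[MAX_HELD];
	size_t depth;
};

/*
 * Returns true if the acquisition is accepted. Returns false if the
 * task already holds a lock with the same class and subclass (the case
 * this model would warn about) or if the held-lock table is full.
 */
static bool model_acquire(struct task_model *t, const char *class_name,
			  unsigned int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (strcmp(t->held[i].class_name, class_name) == 0 &&
		    t->held[i].subclass == subclass)
			return false;
	}
	if (t->depth == MAX_HELD)
		return false;

	t->held[t->depth].class_name = class_name;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

In this model, taking "&mo->oo_sem" twice with subclass 0 is rejected, while taking it a second time with subclass 1 is accepted, which mirrors the intent of the proposed change.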



 Comments   
Comment by Gerrit Updater [ 05/Jul/19 ]

Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/35420
Subject: LU-12509 ofd: ofd_precreate_objects lockdep warning
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: eb02aa76b5a91d5e1f0e0fb5e472ca6cf14e9a75

Comment by Li Xi [ 15/Jul/19 ]

With the patch applied, I still got a warning, this time a circular dependency:

[  282.413781] ======================================================
[  282.414728] WARNING: possible circular locking dependency detected
[  282.415585] 4.18.0-debug #8 Tainted: G           O     --------- -  -
[  282.416258] ------------------------------------------------------
[  282.416893] ll_ost00_002/11461 is trying to acquire lock:
[  282.417510] 000000002a45f69d (&mo->oo_sem/1){+.+.}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.418461] 
[  282.418461] but task is already holding lock:
[  282.419094] 000000004fe6dea8 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.420475] 
[  282.420475] which lock already depends on the new lock.
[  282.420475] 
[  282.421953] 
[  282.421953] the existing dependency chain (in reverse order) is:
[  282.423119] 
[  282.423119] -> #1 (&mo->oo_sem){++++}:
[  282.423898]        __lock_acquire+0x3df/0xa70
[  282.424291]        lock_acquire+0x13a/0x370
[  282.424820]        down_write_nested+0x6f/0x120
[  282.425497]        osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.426504]        __local_file_create+0xb37/0x2360 [obdclass]
[  282.427635]        local_file_find_or_create+0x17a/0x190 [obdclass]
[  282.429418]        lquota_disk_dir_find_create+0x18a/0x840 [lquota]
[  282.432498]        qmt_device_prepare+0xd0/0x280 [lquota]
[  282.433505]        mdt_quota_init+0x18ba/0x1c60 [mdt]
[  282.434509]        mdt_init0+0x1239/0x1490 [mdt]
[  282.435114]        mdt_device_alloc+0x115/0x170 [mdt]
[  282.435944]        obd_setup+0x150/0x300 [obdclass]
[  282.436867]        class_setup+0x4cd/0xa40 [obdclass]
[  282.437786]        class_process_config+0x194c/0x2ea0 [obdclass]
[  282.438892]        class_config_llog_handler+0xafa/0x1bd0 [obdclass]
[  282.440104]        llog_process_thread+0xc82/0x1f80 [obdclass]
[  282.441045]        llog_process_thread_daemonize+0xe4/0x140 [obdclass]
[  282.441787]        kthread+0x190/0x1c0
[  282.442178]        ret_from_fork+0x3a/0x50
[  282.442667] 
[  282.442667] -> #0 (&mo->oo_sem/1){+.+.}:
[  282.443508]        validate_chain.isra.26+0x9a5/0xce0
[  282.444331]        __lock_acquire+0x3df/0xa70
[  282.445106]        lock_acquire+0x13a/0x370
[  282.445775]        down_write_nested+0x6f/0x120
[  282.446536]        osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.447515]        ofd_precreate_objects+0x1767/0x2480 [ofd]
[  282.448677]        ofd_create_hdl+0x952/0x2150 [ofd]
[  282.449838]        tgt_handle_request0+0xdf/0x890 [ptlrpc]
[  282.450870]        tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
[  282.452084]        ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
[  282.453411]        ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[  282.454434]        kthread+0x190/0x1c0
[  282.455099]        ret_from_fork+0x3a/0x50
[  282.455816] 
[  282.455816] other info that might help us debug this:
[  282.455816] 
[  282.457096]  Possible unsafe locking scenario:
[  282.457096] 
[  282.458055]        CPU0                    CPU1
[  282.458836]        ----                    ----
[  282.459634]   lock(&mo->oo_sem);
[  282.460271]                                lock(&mo->oo_sem/1);
[  282.461337]                                lock(&mo->oo_sem);
[  282.462291]   lock(&mo->oo_sem/1);
[  282.462910] 
[  282.462910]  *** DEADLOCK ***
[  282.462910] 
[  282.463974] 3 locks held by ll_ost00_002/11461:
[  282.464726]  #0: 00000000e1797232 (&m->ofd_lastid_rwsem){.+.+}, at: ofd_create_hdl+0x1a9/0x2150 [ofd]
[  282.466266]  #1: 00000000f9816a9f (&oseq->os_create_lock){+.+.}, at: ofd_create_hdl+0x364/0x2150 [ofd]
[  282.467782]  #2: 000000004fe6dea8 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.469186] 
[  282.469186] stack backtrace:
[  282.470030] CPU: 2 PID: 11461 Comm: ll_ost00_002 Kdump: loaded Tainted: G           O     --------- -  - 4.18.0-debug #8
[  282.471895] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  282.473009] Call Trace:
[  282.473563]  dump_stack+0x106/0x175
[  282.474224]  print_circular_bug.isra.25.cold.36+0x238/0x252
[  282.475345]  check_prev_add.constprop.28+0x607/0x6f0
[  282.476472]  ? __lock_acquire+0x410/0xa70
[  282.477316]  validate_chain.isra.26+0x9a5/0xce0
[  282.478188]  __lock_acquire+0x3df/0xa70
[  282.478922]  lock_acquire+0x13a/0x370
[  282.479711]  ? osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.480625]  down_write_nested+0x6f/0x120
[  282.481385]  ? osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.482245]  osd_write_lock+0x8a/0xb0 [osd_zfs]
[  282.483065]  ofd_precreate_objects+0x1767/0x2480 [ofd]
[  282.484082]  ofd_create_hdl+0x952/0x2150 [ofd]
[  282.485115]  tgt_handle_request0+0xdf/0x890 [ptlrpc]
[  282.486177]  tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
[  282.487285]  ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
[  282.488390]  ? __wake_up+0x17/0x20
[  282.489200]  ptlrpc_main+0xd7f/0x1470 [ptlrpc]
[  282.490286]  ? ptlrpc_register_service+0x14c0/0x14c0 [ptlrpc]
[  282.491460]  kthread+0x190/0x1c0
[  282.492007]  ? kthread_create_worker+0x90/0x90
[  282.492805]  ret_from_fork+0x3a/0x50
[  288.692309] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 192.168.200.130@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.

http://testing.linuxhacker.ru:3333/lustre-reports/1076/testresults/runtests-zfs-rhel8.0_x86_64-rhel8.0_x86_64/oleg30-server-console.txt
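
As a reading aid for the report above (a sketch under assumptions, not lockdep's actual implementation): the warning can be modelled as a cycle in a graph of lock classes. Under that reading, an earlier code path recorded "oo_sem taken while holding oo_sem/1", and ofd_precreate_objects() now takes oo_sem/1 while holding oo_sem, which closes the loop. The names dep_graph, add_dependency() and the CLASS_* constants are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical class ids for this sketch. */
enum {
	CLASS_OO_SEM = 0,	/* &mo->oo_sem   (subclass 0) */
	CLASS_OO_SEM_1 = 1,	/* &mo->oo_sem/1 (subclass 1) */
	NCLASS = 4
};

struct dep_graph {
	/* edge[a][b] is true if class b was acquired while holding a */
	bool edge[NCLASS][NCLASS];
};

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NCLASS; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquired was taken while holding held". Returns false, and
 * records nothing, if the new edge would close a cycle (the situation
 * reported as a possible circular locking dependency).
 */
static bool add_dependency(struct dep_graph *g, int held, int acquired)
{
	bool seen[NCLASS] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Recording oo_sem/1 -> oo_sem first and then attempting oo_sem -> oo_sem/1 makes add_dependency() return false. In this model the warning goes away only if every path agrees on which subclass is taken first.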

Comment by Andreas Dilger [ 19/Jul/19 ]

Alex, does this relate to your patch https://review.whamcloud.com/31293 "LU-10048 ofd: take local locks within transaction"?

Comment by Dongyang Li [ 19/Jul/19 ]

Looks like 31293 changed the dt_write_lock order, so if we have 31293 landed, we don't need this patch anymore.

Comment by Alex Zhuravlev [ 22/Jul/19 ]

Which exact kernel options do you use to trigger the warning?
I'm using linux-4.18.0-32.el8 with the following options:
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
but can't reproduce the warning.

Comment by Dongyang Li [ 22/Jul/19 ]

This is actually found by Oleg, see here:
http://testing.linuxhacker.ru:3333/lustre-reports/dev/7/testresults/racer-zfs-rhel8.0_x86_64-rhel8.0_x86_64/oleg103-server-console.txt

and here:
http://testing.linuxhacker.ru:3333/lustre-reports/dev/7/

Comment by Alex Zhuravlev [ 22/Jul/19 ]

Hmm, I took Oleg's .config for rhel8, but still can't get the warning. I'll keep trying.

Comment by Gerrit Updater [ 09/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35420/
Subject: LU-12509 ofd: ofd_precreate_objects lockdep warning
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 697dcf4e87f2dbebe57f3ccb9c0b0962b89cf1b4

Comment by Peter Jones [ 09/Aug/19 ]

Landed for 2.13
