
LU-12509: lockdep warning in ofd_precreate_objects()

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0

    Description

      [ 141.023079] ============================================
      [ 141.023750] WARNING: possible recursive locking detected
      [ 141.024443] 4.18.0-debug #8 Tainted: G O --------- - -
      [ 141.025226] --------------------------------------------
      [ 141.025967] ll_ost00_002/7298 is trying to acquire lock:
      [ 141.030372] 000000001989e622 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.031190]
      [ 141.031190] but task is already holding lock:
      [ 141.032872] 0000000086a83aaf (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.033676]
      [ 141.033676] other info that might help us debug this:
      [ 141.034304] Possible unsafe locking scenario:
      [ 141.034304]
      [ 141.034875] CPU0
      [ 141.035121] ----
      [ 141.035366] lock(&mo->oo_sem);
      [ 141.035678] lock(&mo->oo_sem);
      [ 141.035993]
      [ 141.035993] *** DEADLOCK ***
      [ 141.035993]
      [ 141.037086] May be due to missing lock nesting notation
      [ 141.037086]
      [ 141.037727] 3 locks held by ll_ost00_002/7298:
      [ 141.038186] #0: 000000003540612a (&m->ofd_lastid_rwsem){.+.+}, at: ofd_create_hdl+0x1a9/0x2270 [ofd]
      [ 141.039081] #1: 00000000ee55375f (&oseq->os_create_lock){+.+.}, at: ofd_create_hdl+0x364/0x2270 [ofd]
      [ 141.039969] #2: 0000000086a83aaf (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.041147]
      [ 141.041147] stack backtrace:
      [ 141.041568] CPU: 0 PID: 7298 Comm: ll_ost00_002 Tainted: G O --------- - - 4.18.0-debug #8
      [ 141.042465] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 141.043004] Call Trace:
      [ 141.043256] dump_stack+0x106/0x175
      [ 141.043600] validate_chain.isra.26.cold.44+0x224/0x2da
      [ 141.044103] __lock_acquire+0x3df/0xa70
      [ 141.044535] lock_acquire+0x13a/0x370
      [ 141.044908] ? osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.045372] down_write_nested+0x6f/0x120
      [ 141.045770] ? osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.046236] osd_write_lock+0x8a/0xb0 [osd_zfs]
      [ 141.046694] ofd_precreate_objects+0x1764/0x2480 [ofd]
      [ 141.047201] ofd_create_hdl+0xbec/0x2270 [ofd]
      [ 141.047722] tgt_handle_request0+0xdf/0x890 [ptlrpc]
      [ 141.048274] tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
      [ 141.048789] ? libcfs_nid2str_r+0x12e/0x160 [lnet]
      [ 141.049332] ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
      [ 141.049898] ? __wake_up+0x17/0x20
      [ 141.050299] ptlrpc_main+0xd7f/0x1470 [ptlrpc]
      [ 141.050794] ? ptlrpc_register_service+0x14c0/0x14c0 [ptlrpc]
      [ 141.051353] kthread+0x190/0x1c0
      [ 141.051668] ? kthread_create_worker+0x90/0x90
      [ 141.052102] ret_from_fork+0x3a/0x50

      We should use a different lockdep subclass to silence this false-positive warning.
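
      As a reference for the "different subclass" approach, here is a minimal sketch of the stock kernel lockdep nesting annotation; the structure and function names are illustrative only and are not the actual ofd/osd-zfs code:

      /*
       * Two objects of the same type are write-locked in a fixed
       * parent->child order.  Both rw_semaphores share one lockdep class,
       * so a plain down_write() on the second one makes lockdep report
       * "possible recursive locking detected".  Annotating the inner lock
       * with a distinct subclass via down_write_nested() tells lockdep
       * that the nesting is intentional.
       */
      #include <linux/rwsem.h>
      #include <linux/lockdep.h>

      struct demo_object {
              struct rw_semaphore do_sem;     /* stands in for oo_sem */
      };

      /* Both objects must already have been set up with init_rwsem(). */
      static void demo_lock_pair(struct demo_object *parent,
                                 struct demo_object *child)
      {
              down_write(&parent->do_sem);
              /* same lock class; subclass 1 avoids the false self-deadlock */
              down_write_nested(&child->do_sem, SINGLE_DEPTH_NESTING);

              /* ... modify both objects ... */

              up_write(&child->do_sem);
              up_write(&parent->do_sem);
      }

      In lockdep output an annotated subclass shows up with a /1 suffix, e.g. the &mo->oo_sem/1 class seen in the later trace on this ticket.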

          Activity

            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.13.0
            Resolution New: Fixed
            Status Original: Open New: Resolved
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35420/
            Subject: LU-12509 ofd: ofd_precreate_objects lockdep warning
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 697dcf4e87f2dbebe57f3ccb9c0b0962b89cf1b4

            bzzz Alex Zhuravlev added a comment -

            hmm, took Oleg's .config for rhel8, but still can't get the warning.. keep trying.

            dongyang Dongyang Li added a comment -

            This is actually found by Oleg, see here: http://testing.linuxhacker.ru:3333/lustre-reports/dev/7/testresults/racer-zfs-rhel8.0_x86_64-rhel8.0_x86_64/oleg103-server-console.txt and here: http://testing.linuxhacker.ru:3333/lustre-reports/dev/7/

            bzzz Alex Zhuravlev added a comment -

            what exact kernel options do you use to cause the warning?
            I'm using linux-4.18.0-32.el8 with the following options:
            CONFIG_LOCKDEP_SUPPORT=y
            CONFIG_LOCKDEP=y
            CONFIG_DEBUG_LOCKDEP=y
            but can't reproduce the warning.
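
            A possibly relevant detail for reproducing this, offered as an assumption rather than something stated in the ticket: the options above only enable lockdep bookkeeping, while the "possible recursive locking detected" and "possible circular locking dependency detected" reports come from the lockdep validator, which is gated by CONFIG_PROVE_LOCKING. A debug .config fragment expected to produce these reports would presumably look like:

            # Illustrative debug-kernel .config fragment (assumed, not copied from the
            # reproducer's actual config)
            CONFIG_LOCKDEP_SUPPORT=y
            CONFIG_LOCKDEP=y
            CONFIG_DEBUG_LOCKDEP=y
            CONFIG_DEBUG_LOCK_ALLOC=y
            CONFIG_PROVE_LOCKING=y   # enables the recursive/circular locking reports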
            dongyang Dongyang Li added a comment -

            Looks like 31293 changed the dt_write_lock order, so if we have 31293 landed, we don't need this patch anymore.


            adilger Andreas Dilger added a comment -

            Alex, does this relate to your patch https://review.whamcloud.com/31293 "LU-10048 ofd: take local locks within transaction"?
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-12269
            lixi_wc Li Xi added a comment - - edited

            With the patch, still got an error:

            [  282.413781] ======================================================
            [  282.414728] WARNING: possible circular locking dependency detected
            [  282.415585] 4.18.0-debug #8 Tainted: G           O     --------- -  -
            [  282.416258] ------------------------------------------------------
            [  282.416893] ll_ost00_002/11461 is trying to acquire lock:
            [  282.417510] 000000002a45f69d (&mo->oo_sem/1){+.+.}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.418461] 
            [  282.418461] but task is already holding lock:
            [  282.419094] 000000004fe6dea8 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.420475] 
            [  282.420475] which lock already depends on the new lock.
            [  282.420475] 
            [  282.421953] 
            [  282.421953] the existing dependency chain (in reverse order) is:
            [  282.423119] 
            [  282.423119] -> #1 (&mo->oo_sem){++++}:
            [  282.423898]        __lock_acquire+0x3df/0xa70
            [  282.424291]        lock_acquire+0x13a/0x370
            [  282.424820]        down_write_nested+0x6f/0x120
            [  282.425497]        osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.426504]        __local_file_create+0xb37/0x2360 [obdclass]
            [  282.427635]        local_file_find_or_create+0x17a/0x190 [obdclass]
            [  282.429418]        lquota_disk_dir_find_create+0x18a/0x840 [lquota]
            [  282.432498]        qmt_device_prepare+0xd0/0x280 [lquota]
            [  282.433505]        mdt_quota_init+0x18ba/0x1c60 [mdt]
            [  282.434509]        mdt_init0+0x1239/0x1490 [mdt]
            [  282.435114]        mdt_device_alloc+0x115/0x170 [mdt]
            [  282.435944]        obd_setup+0x150/0x300 [obdclass]
            [  282.436867]        class_setup+0x4cd/0xa40 [obdclass]
            [  282.437786]        class_process_config+0x194c/0x2ea0 [obdclass]
            [  282.438892]        class_config_llog_handler+0xafa/0x1bd0 [obdclass]
            [  282.440104]        llog_process_thread+0xc82/0x1f80 [obdclass]
            [  282.441045]        llog_process_thread_daemonize+0xe4/0x140 [obdclass]
            [  282.441787]        kthread+0x190/0x1c0
            [  282.442178]        ret_from_fork+0x3a/0x50
            [  282.442667] 
            [  282.442667] -> #0 (&mo->oo_sem/1){+.+.}:
            [  282.443508]        validate_chain.isra.26+0x9a5/0xce0
            [  282.444331]        __lock_acquire+0x3df/0xa70
            [  282.445106]        lock_acquire+0x13a/0x370
            [  282.445775]        down_write_nested+0x6f/0x120
            [  282.446536]        osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.447515]        ofd_precreate_objects+0x1767/0x2480 [ofd]
            [  282.448677]        ofd_create_hdl+0x952/0x2150 [ofd]
            [  282.449838]        tgt_handle_request0+0xdf/0x890 [ptlrpc]
            [  282.450870]        tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
            [  282.452084]        ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
            [  282.453411]        ptlrpc_main+0xd7f/0x1470 [ptlrpc]
            [  282.454434]        kthread+0x190/0x1c0
            [  282.455099]        ret_from_fork+0x3a/0x50
            [  282.455816] 
            [  282.455816] other info that might help us debug this:
            [  282.455816] 
            [  282.457096]  Possible unsafe locking scenario:
            [  282.457096] 
            [  282.458055]        CPU0                    CPU1
            [  282.458836]        ----                    ----
            [  282.459634]   lock(&mo->oo_sem);
            [  282.460271]                                lock(&mo->oo_sem/1);
            [  282.461337]                                lock(&mo->oo_sem);
            [  282.462291]   lock(&mo->oo_sem/1);
            [  282.462910] 
            [  282.462910]  *** DEADLOCK ***
            [  282.462910] 
            [  282.463974] 3 locks held by ll_ost00_002/11461:
            [  282.464726]  #0: 00000000e1797232 (&m->ofd_lastid_rwsem){.+.+}, at: ofd_create_hdl+0x1a9/0x2150 [ofd]
            [  282.466266]  #1: 00000000f9816a9f (&oseq->os_create_lock){+.+.}, at: ofd_create_hdl+0x364/0x2150 [ofd]
            [  282.467782]  #2: 000000004fe6dea8 (&mo->oo_sem){++++}, at: osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.469186] 
            [  282.469186] stack backtrace:
            [  282.470030] CPU: 2 PID: 11461 Comm: ll_ost00_002 Kdump: loaded Tainted: G           O     --------- -  - 4.18.0-debug #8
            [  282.471895] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
            [  282.473009] Call Trace:
            [  282.473563]  dump_stack+0x106/0x175
            [  282.474224]  print_circular_bug.isra.25.cold.36+0x238/0x252
            [  282.475345]  check_prev_add.constprop.28+0x607/0x6f0
            [  282.476472]  ? __lock_acquire+0x410/0xa70
            [  282.477316]  validate_chain.isra.26+0x9a5/0xce0
            [  282.478188]  __lock_acquire+0x3df/0xa70
            [  282.478922]  lock_acquire+0x13a/0x370
            [  282.479711]  ? osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.480625]  down_write_nested+0x6f/0x120
            [  282.481385]  ? osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.482245]  osd_write_lock+0x8a/0xb0 [osd_zfs]
            [  282.483065]  ofd_precreate_objects+0x1767/0x2480 [ofd]
            [  282.484082]  ofd_create_hdl+0x952/0x2150 [ofd]
            [  282.485115]  tgt_handle_request0+0xdf/0x890 [ptlrpc]
            [  282.486177]  tgt_request_handle+0x3ca/0x1aa0 [ptlrpc]
            [  282.487285]  ptlrpc_server_handle_request+0x634/0x1180 [ptlrpc]
            [  282.488390]  ? __wake_up+0x17/0x20
            [  282.489200]  ptlrpc_main+0xd7f/0x1470 [ptlrpc]
            [  282.490286]  ? ptlrpc_register_service+0x14c0/0x14c0 [ptlrpc]
            [  282.491460]  kthread+0x190/0x1c0
            [  282.492007]  ? kthread_create_worker+0x90/0x90
            [  282.492805]  ret_from_fork+0x3a/0x50
            [  288.692309] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 192.168.200.130@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
            

            http://testing.linuxhacker.ru:3333/lustre-reports/1076/testresults/runtests-zfs-rhel8.0_x86_64-rhel8.0_x86_64/oleg30-server-console.txt


            People

              Assignee: dongyang Dongyang Li
              Reporter: dongyang Dongyang Li
              Votes: 0
              Watchers: 6
