
[LU-6409] sleeping while atomic in nodemap_destroy

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: None
    • Severity: 3

    Description

      nodemap_destroy appears to take mutexes while being called under a spinlock/rwlock (a minimal sketch of the pattern follows the trace below).

      Here's the relevant error message:

      [ 6003.169249] Lustre: server umount lustre-OST0001 complete
      [ 6007.190173] BUG: sleeping function called from invalid context at kernel/mutex.c:576
      [ 6007.190482] in_atomic(): 1, irqs_disabled(): 0, pid: 20641, name: rmmod
      [ 6007.190659] 1 lock held by rmmod/20641:
      [ 6007.190805]  #0:  (&new_bkts[i]->hsb_lock.rw){......}, at: [<ffffffffa015151d>] cfs_hash_rw_lock+0xd/0x20 [libcfs]
      [ 6007.191177] CPU: 3 PID: 20641 Comm: rmmod Tainted: GF       W  O--------------   3.10.0-debug #5
      [ 6007.191480] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 6007.191648]  ffff880083aac540 000000008428f159 ffff8800967e7d50 ffffffff816ccb68
      [ 6007.191954]  ffff8800967e7d68 ffffffff810a8fd9 ffff8800a8d2b0c0 ffff8800967e7de8
      [ 6007.192275]  ffffffff816d0dd5 0000000000000000 ffff8800967e7de0 0000000000000246
      [ 6007.192582] Call Trace:
      [ 6007.192716]  [<ffffffff816ccb68>] dump_stack+0x19/0x1b
      [ 6007.192879]  [<ffffffff810a8fd9>] __might_sleep+0xe9/0x110
      [ 6007.193063]  [<ffffffff816d0dd5>] mutex_lock_nested+0x55/0x490
      [ 6007.193281]  [<ffffffffa057cfa4>] ? nodemap_destroy+0x74/0x1f0 [ptlrpc]
      [ 6007.193491]  [<ffffffffa05812ab>] nm_member_reclassify_nodemap+0x1b/0x40 [ptlrpc]
      [ 6007.193799]  [<ffffffffa057cfbc>] nodemap_destroy+0x8c/0x1f0 [ptlrpc]
      [ 6007.193993]  [<ffffffffa057d14d>] nodemap_putref+0x2d/0xa0 [ptlrpc]
      [ 6007.194201]  [<ffffffffa057e1f5>] nodemap_hs_put_locked+0x15/0x20 [ptlrpc]
      [ 6007.194392]  [<ffffffffa0151f25>] cfs_hash_bd_del_locked+0x75/0xf0 [libcfs]
      [ 6007.194578]  [<ffffffffa015579c>] cfs_hash_putref+0x1cc/0x500 [libcfs]
      [ 6007.194771]  [<ffffffffa057e22a>] nodemap_cleanup_all+0x2a/0x30 [ptlrpc]
      [ 6007.194967]  [<ffffffffa057e23e>] nodemap_mod_exit+0xe/0x20 [ptlrpc]
      [ 6007.195177]  [<ffffffffa0583bbd>] ptlrpc_exit+0x9/0x44c [ptlrpc]
      [ 6007.195354]  [<ffffffff810e769b>] SyS_delete_module+0x16b/0x2d0
      [ 6007.195526]  [<ffffffff816df369>] system_call_fastpath+0x16/0x1b
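
      For context, here is a minimal sketch of the anti-pattern the splat reports (hypothetical names, not the actual Lustre code): mutex_lock() may sleep, so taking it while a spinlock is held runs a sleeping function in atomic context, which CONFIG_DEBUG_ATOMIC_SLEEP turns into the "BUG: sleeping function called from invalid context" message.

      #include <linux/spinlock.h>
      #include <linux/mutex.h>

      static DEFINE_SPINLOCK(table_lock);   /* stands in for the cfs_hash bucket lock */
      static DEFINE_MUTEX(member_mutex);    /* stands in for the nodemap member mutex */

      static void put_entry_locked(void)
      {
              /* called with table_lock held, i.e. in atomic context */
              mutex_lock(&member_mutex);     /* BAD: may sleep under a spinlock */
              /* ... reclassify/destroy members ... */
              mutex_unlock(&member_mutex);
      }

      static void cleanup_all(void)
      {
              spin_lock(&table_lock);
              put_entry_locked();            /* triggers the __might_sleep() splat */
              spin_unlock(&table_lock);
      }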
      

    Activity
            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14275/
            Subject: LU-6409 nodemap: don't destroy nodemaps in cfs_hash_for_each
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6299d78713ef2b6f69f6dd046256aa91a85147a8

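            The patch subject points at the usual remedy for this class of bug: don't run sleeping destructors from inside the locked hash iteration. A hedged sketch of that pattern, using a plain kernel list as a stand-in for cfs_hash (all names here are illustrative, not the actual fix):

            #include <linux/list.h>
            #include <linux/spinlock.h>

            struct nodemap_stub {                      /* hypothetical stand-in */
                    struct list_head nm_list;
            };

            static void destroy_stub(struct nodemap_stub *nm); /* may sleep */

            static void cleanup_all_deferred(spinlock_t *lock, struct list_head *table)
            {
                    LIST_HEAD(zombies);
                    struct nodemap_stub *nm, *tmp;

                    spin_lock(lock);
                    list_splice_init(table, &zombies); /* unhash under the lock... */
                    spin_unlock(lock);

                    /* ...but run the sleeping destructor only after dropping it */
                    list_for_each_entry_safe(nm, tmp, &zombies, nm_list) {
                            list_del(&nm->nm_list);
                            destroy_stub(nm);
                    }
            }
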
            green Oleg Drokin added a comment -

            ah, I see it's likely a bug in http://review.whamcloud.com/#/c/11813/8

            green Oleg Drokin added a comment -

            I tried the three patches from LU-5092 plus the one patch from this ticket and hit a deadlock when running sanity 133g:

            [15921.648643] LustreError: 7937:0:(lproc_fid.c:182:lprocfs_server_fid_width_seq_write()) Skipped 17 previous similar messages
            [16080.884140] INFO: task badarea_io:7981 blocked for more than 120 seconds.
            [16080.884666]       Tainted: P           ---------------    2.6.32-rhe6.6-debug #1
            [16080.885381] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [16080.886109] badarea_io    D 0000000000000002  2792  7981   7003 0x00000000
            [16080.886582]  ffff880046b5fd78 0000000000000086 0000000000000000 ffffffff8114e4a0
            [16080.887360]  0000000000000000 ffffffff8104e658 00000e7b1ce2fe1b ffff880046b5fd58
            [16080.888121]  ffffffffa1395a20 00000001003b97ab ffff88005f2708f8 ffff880046b5ffd8
            [16080.888904] Call Trace:
            [16080.889234]  [<ffffffff8114e4a0>] ? handle_pte_fault+0x90/0xa50
            [16080.889663]  [<ffffffff8104e658>] ? __change_page_attr_set_clr+0x808/0xcc0
            [16080.890154]  [<ffffffffa1395a20>] ? lprocfs_single_release+0x0/0x10 [obdclass]
            [16080.890876]  [<ffffffff8151ff50>] __mutex_lock_slowpath+0x120/0x2e0
            [16080.891364]  [<ffffffffa166d8d0>] ? nodemap_del_nodemap_write+0x0/0xc0 [ptlrpc]
            [16080.892098]  [<ffffffff81520141>] mutex_lock+0x31/0x50
            [16080.892598]  [<ffffffffa16699bc>] nodemap_del+0x3c/0x1c0 [ptlrpc]
            [16080.893118]  [<ffffffffa166d8d0>] ? nodemap_del_nodemap_write+0x0/0xc0 [ptlrpc]
            [16080.896192]  [<ffffffffa166d975>] nodemap_del_nodemap_write+0xa5/0xc0 [ptlrpc]
            [16080.896810]  [<ffffffff811fb8c5>] proc_reg_write+0x85/0xc0
            [16080.897094]  [<ffffffff8118f4a8>] vfs_write+0xb8/0x1a0
            [16080.897388]  [<ffffffff8118fde1>] sys_write+0x51/0x90
            [16080.897719]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
            
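            The hung task above is consistent with a lock-ordering (ABBA) inversion: two paths take the same pair of locks in opposite orders. A hedged illustration with standalone names (not the actual Lustre locks):

            #include <linux/mutex.h>

            static DEFINE_MUTEX(lock_a);        /* e.g. a proc-write side mutex */
            static DEFINE_MUTEX(lock_b);        /* e.g. a nodemap-internal mutex */

            static void thread_one(void)
            {
                    mutex_lock(&lock_a);
                    mutex_lock(&lock_b);        /* order: A then B */
                    mutex_unlock(&lock_b);
                    mutex_unlock(&lock_a);
            }

            static void thread_two(void)
            {
                    mutex_lock(&lock_b);
                    mutex_lock(&lock_a);        /* order: B then A; if thread_one()
                                                 * already holds A and is waiting on B,
                                                 * both threads block forever */
                    mutex_unlock(&lock_a);
                    mutex_unlock(&lock_b);
            }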

            kit.westneat Kit Westneat (Inactive) added a comment -

            Ok, I reported the potential lock inversion as LU-6700.

            I updated the nodemap patches and no longer get the warning messages.

            green Oleg Drokin added a comment -

            Sure, feel free to report the other ones you see too.


            kit.westneat Kit Westneat (Inactive) added a comment -

            I finally got the debug kernel working and reproduced the warning you saw. Should I report any other non-nodemap warnings I find? There is a lock inversion warning with s_dquot.dqio_mutex and i_data_sem that I couldn't find any Jira tickets about, but I'm only getting it during mkfs so far.

            green Oleg Drokin added a comment -

            You need to use rhel7 for this; there's a new kernel option you need to enable:

            CONFIG_DEBUG_ATOMIC_SLEEP=y
            

            Turn this on and run something like sanity, you'll get a bunch of warnings. Also apply patch from LU-6447 (I know it's imperfect) to quiet down unrelated warnings.

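            As a hedged aside, a throwaway module like the one below (hypothetical, not part of Lustre) can confirm the option is active in a rebuilt kernel: loading it should immediately produce the same "BUG: sleeping function called from invalid context" splat. Don't load it on a production machine.

            #include <linux/module.h>
            #include <linux/spinlock.h>
            #include <linux/delay.h>

            static DEFINE_SPINLOCK(test_lock);

            static int __init sleep_test_init(void)
            {
                    spin_lock(&test_lock);
                    msleep(1);               /* sleeping call under a spinlock -> splat */
                    spin_unlock(&test_lock);
                    return 0;
            }

            static void __exit sleep_test_exit(void)
            {
            }

            module_init(sleep_test_init);
            module_exit(sleep_test_exit);
            MODULE_LICENSE("GPL");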

            kit.westneat Kit Westneat (Inactive) added a comment -

            Ah ok, I see what's going on, I'll update the patches I have in gerrit.

            Can you describe what tests/config options you are running to get these warnings? I am not getting them on my build and I don't want to waste your time if I can catch them before I submit.

            green Oleg Drokin added a comment - - edited

            Hm, so I have several patches from you:
            93ccc4a LU-5092 nodemap: add structure to hold nodemap config
            e4d2197 LU-5092 nodemap: convert member hash to a list
            3f0bb87 LU-6409 nodemap: don't destroy nodemaps in cfs_hash_for_each

            And this is not enough.

            I still get this in the logs:

            BUG: sleeping function called from invalid context at kernel/mutex.c:576
            May 30 22:19:09 centos6-0 kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 9196, name: mdt00_002
            May 30 22:19:09 centos6-0 kernel: 2 locks held by mdt00_002/9196:
            May 30 22:19:09 centos6-0 kernel: #0:  (active_config_lock){......}, at: [<ffffffffa05cb147>] nodemap_lock_active_ranges+0x17/0x30 [ptlrpc]
            May 30 22:19:09 centos6-0 kernel: #1:  (&config->nmc_range_tree_lock){......}, at: [<ffffffffa05cb157>] nodemap_lock_active_ranges+0x27/0x30 [ptlrpc]
            PID: 9196 Comm: mdt00_002 Tainted: GF       W  O--------------   3.10.0-debug #
            Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            ffff880019366680 00000000b6a08141 ffff88003ac9b9a0 ffffffff816ccb68
            ffff88003ac9b9b8 ffffffff810a8fd9 ffff8800aa500e00 ffff88003ac9ba40
            ffffffff816d0dd5 0000000000000000 0000000000000002 0000000000000000
            Call Trace:
            [<ffffffff816ccb68>] dump_stack+0x19/0x1b
            [<ffffffff810a8fd9>] __might_sleep+0xe9/0x110
            [<ffffffff816d0dd5>] mutex_lock_nested+0x55/0x490
            [<ffffffffa05cb157>] ? nodemap_lock_active_ranges+0x27/0x30 [ptlrpc]
            [<ffffffffa05cfbe8>] nm_member_add+0x78/0x180 [ptlrpc]
            [<ffffffffa05cc80d>] nodemap_add_member+0x2d/0x50 [ptlrpc]
            [<ffffffffa0b147f3>] mdt_obd_connect+0x3b3/0x760 [mdt]
            [<ffffffffa01c4a87>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
            [<ffffffffa051ded3>] target_handle_connect+0x12c3/0x2be0 [ptlrpc]
            [<ffffffff81363263>] ? number.isra.2+0x323/0x360
            [<ffffffffa02dbcbd>] ? keys_fill+0x1d/0x130 [obdclass]
            [<ffffffffa02dbcd2>] ? keys_fill+0x32/0x130 [obdclass]
            [<ffffffffa05b445f>] tgt_request_handle+0x3ef/0x1070 [ptlrpc]
            [<ffffffffa0565c01>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
            [<ffffffffa0563768>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
            [<ffffffffa0569b70>] ptlrpc_main+0xae0/0x1ee0 [ptlrpc]
            [<ffffffffa0569090>] ? ptlrpc_register_service+0xf30/0xf30 [ptlrpc]
            [<ffffffff8109c00a>] kthread+0xea/0xf0
            [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140
            [<ffffffff816df2bc>] ret_from_fork+0x7c/0xb0
            [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140

            seemingly from every target.

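            The remaining splat shows nm_member_add() taking a mutex while active_config_lock and nmc_range_tree_lock are held. When the sleeping work cannot simply be hoisted out of the locked section, one common alternative (a sketch under assumed names, not the eventual Lustre fix) is to defer it to process context via a workqueue:

            #include <linux/workqueue.h>
            #include <linux/slab.h>

            struct member_work {                  /* hypothetical deferred-add job */
                    struct work_struct mw_work;
                    /* ... whatever state the deferred add needs ... */
            };

            static void member_add_worker(struct work_struct *work)
            {
                    struct member_work *mw =
                            container_of(work, struct member_work, mw_work);

                    /* process context: taking mutexes here is safe */
                    kfree(mw);
            }

            static int member_add_atomic(void)
            {
                    struct member_work *mw = kzalloc(sizeof(*mw), GFP_ATOMIC);

                    if (!mw)
                            return -ENOMEM;
                    INIT_WORK(&mw->mw_work, member_add_worker);
                    schedule_work(&mw->mw_work);  /* callable under a spinlock */
                    return 0;
            }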

            People

              Assignee: wc-triage WC Triage
              Reporter: green Oleg Drokin
