Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: Lustre 2.1.1
- 3
- 4607
Description
On a node with SELinux enabled, statahead may deadlock and quickly drive the node to a hang.
Here are some traces: a user issues a du that starts statahead.
crash> bt 60718
PID: 60718  TASK: ffff8817e6b337d0  CPU: 3   COMMAND: "du"
 #0 [ffff8817df1db938] schedule at ffffffff8147dddc
 #1 [ffff8817df1dba00] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
 #2 [ffff8817df1dba10] do_statahead_enter at ffffffffa0877794 [lustre]
 #3 [ffff8817df1dbb20] ll_lookup_it at ffffffffa0862c75 [lustre]
 #4 [ffff8817df1dbc30] ll_lookup_nd at ffffffffa0862e58 [lustre]
 #5 [ffff8817df1dbc90] do_lookup at ffffffff8116bb0b
 #6 [ffff8817df1dbcf0] __link_path_walk at ffffffff8116c109
 #7 [ffff8817df1dbd70] path_walk at ffffffff8116ca8a
 #8 [ffff8817df1dbdb0] do_path_lookup at ffffffff8116cc5b
 #9 [ffff8817df1dbde0] user_path_at at ffffffff8116cde7
#10 [ffff8817df1dbeb0] vfs_fstatat at ffffffff811630ac
#11 [ffff8817df1dbef0] sys_newfstatat at ffffffff81163114
#12 [ffff8817df1dbf80] system_call_fastpath at ffffffff81003172
    RIP: 0000003b564d7eda  RSP: 00007fff49f48258  RFLAGS: 00010206
    RAX: 0000000000000106  RBX: ffffffff81003172  RCX: 0000000000000000
    RDX: 00000000015d12b0  RSI: 00000000015d1340  RDI: 0000000000000005
    RBP: 00000000015d12b0   R8: 0000000000000100   R9: 0000000000000000
    R10: 0000000000000100  R11: 0000000000000246  R12: 00000000015c6930
    R13: 00000000015d1240  R14: 0000000000000000  R15: 00000000015c6930
    ORIG_RAX: 0000000000000106  CS: 0033  SS: 002b

PID: 60781  TASK: ffff88082f71b7d0  CPU: 7   COMMAND: "ll_sa_60718"
 #0 [ffff88082f713d10] schedule at ffffffff8147dddc
 #1 [ffff88082f713dd8] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
 #2 [ffff88082f713de8] ll_statahead_thread at ffffffffa087b52b [lustre]
 #3 [ffff88082f713f48] kernel_thread at ffffffff810041aa

PID: 60802  TASK: ffff8807a2576040  CPU: 23  COMMAND: "ll_sa_60718"
 #0 [ffff8807f89ff580] schedule at ffffffff8147dddc
 #1 [ffff8807f89ff648] schedule_timeout at ffffffff8147eab0
 #2 [ffff8807f89ff6f8] cfs_waitq_timedwait at ffffffffa03ed731 [libcfs]
 #3 [ffff8807f89ff708] ptlrpc_set_wait at ffffffffa05c3788 [ptlrpc]
 #4 [ffff8807f89ff7e8] ptlrpc_queue_wait at ffffffffa05c3ce6 [ptlrpc]
 #5 [ffff8807f89ff838] mdc_xattr_common at ffffffffa074795c [mdc]
 #6 [ffff8807f89ff8d8] mdc_getxattr at ffffffffa0747ea7 [mdc]
 #7 [ffff8807f89ff928] lmv_getxattr at ffffffffa08e3a1b [lmv]
 #8 [ffff8807f89ff9b8] ll_getxattr_common at ffffffffa086a956 [lustre]
 #9 [ffff8807f89ffad8] ll_getxattr at ffffffffa086bd62 [lustre]
#10 [ffff8807f89ffb88] inode_doinit_with_dentry at ffffffff8120b2e0
#11 [ffff8807f89ffbf8] selinux_d_instantiate at ffffffff8120b83c
#12 [ffff8807f89ffc08] security_d_instantiate at ffffffff811fe57b
#13 [ffff8807f89ffc18] ll_d_add at ffffffffa0860a15 [lustre]
#14 [ffff8807f89ffc78] ll_lookup_it_finish at ffffffffa0861dff [lustre]
#15 [ffff8807f89ffd38] do_statahead_interpret at ffffffffa08771d2 [lustre]
#16 [ffff8807f89ffde8] ll_statahead_thread at ffffffffa087b2d2 [lustre]
#17 [ffff8807f89fff48] kernel_thread at ffffffff810041aa
Note that the statahead thread is now waiting for an RPC to complete and has scheduled out on CPU 23.
My understanding is that at this point it holds dcache_lock, which was acquired in ll_lookup_it_finish just before calling ll_d_add:
spin_lock(&dcache_lock);
ll_d_add(*de, NULL);
spin_unlock(&dcache_lock);
I haven't found a place where this lock might have been released.
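To make the chain explicit, here is a rough sketch of what I believe happens inside that critical section (simplified pseudocode, not verbatim 2.1.1 source; the call order comes from the PID 60802 backtrace above, and "security.selinux" is the xattr that SELinux's inode_doinit_with_dentry fetches):

/* ll_lookup_it_finish(), reached from do_statahead_interpret() */
spin_lock(&dcache_lock);
ll_d_add(*de, NULL);
        /* -> security_d_instantiate()             SELinux hook      */
        /*    -> inode_doinit_with_dentry()                          */
        /*       -> ll_getxattr("security.selinux")                  */
        /*          -> mdc_getxattr()                                */
        /*             -> ptlrpc_queue_wait()      sleeps on an RPC  */
        /*                while dcache_lock is still held            */
spin_unlock(&dcache_lock);      /* not reached until the RPC completes */

In other words, the SELinux d_instantiate hook turns ll_d_add into a blocking metadata RPC inside a spinlock critical section.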
Meanwhile, a new thread is scheduled on the same CPU and starts spinning on ll_lookup_lock (which is acquired just before dcache_lock in many code paths):
crash> bt 49060
PID: 49060  TASK: ffff88080736d100  CPU: 23  COMMAND: "ldlm_bl_70"
 #0 [ffff88189c547e90] crash_nmi_callback at ffffffff8101fd06
 #1 [ffff88189c547ea0] notifier_call_chain at ffffffff814837f5
 #2 [ffff88189c547ee0] atomic_notifier_call_chain at ffffffff8148385a
 #3 [ffff88189c547ef0] notify_die at ffffffff8108026e
 #4 [ffff88189c547f20] do_nmi at ffffffff81481443
 #5 [ffff88189c547f50] nmi at ffffffff81480d50
    [exception RIP: _spin_lock+33]
    RIP: ffffffff81480631  RSP: ffff8806e185bc00  RFLAGS: 00000293
    RAX: 00000000000041ce  RBX: ffff8818205c6238  RCX: 0000000000000000
    RDX: 00000000000041c0  RSI: 0000000000000000  RDI: ffffffffa089d100
    RBP: ffff8806e185bc00   R8: 000000020817392b   R9: 000000000000037e
    R10: df80000000000000  R11: a5236b391d08037e  R12: ffff88185e51d240
    R13: 0000000000000003  R14: ffff8818205c6100  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8806e185bc00] _spin_lock at ffffffff81480631
 #7 [ffff8806e185bc08] ll_md_blocking_ast at ffffffffa086049f [lustre]
 #8 [ffff8806e185bc98] ldlm_cancel_callback at ffffffffa0585d33 [ptlrpc]
 #9 [ffff8806e185bcb8] ldlm_cli_cancel_local at ffffffffa059feb1 [ptlrpc]
#10 [ffff8806e185bd18] ldlm_cli_cancel at ffffffffa05a42e8 [ptlrpc]
#11 [ffff8806e185bd98] ll_md_blocking_ast at ffffffffa0860085 [lustre]
#12 [ffff8806e185be28] ldlm_handle_bl_callback at ffffffffa05a76ed [ptlrpc]
#13 [ffff8806e185be88] ldlm_bl_thread_main at ffffffffa05a7b41 [ptlrpc]
#14 [ffff8806e185bf48] kernel_thread at ffffffff810041aa
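If the ordering is as described, the blockage propagates like this (my reconstruction from the traces, not verbatim source; the ordering sketch is how I read the llite code):

/* Common llite ordering: ll_lookup_lock is taken before dcache_lock */
spin_lock(&ll_lookup_lock);     /* outer lock */
spin_lock(&dcache_lock);        /* inner lock */
...
spin_unlock(&dcache_lock);
spin_unlock(&ll_lookup_lock);

ll_sa_60718      sleeps in ptlrpc_queue_wait() while holding dcache_lock
any path above   takes ll_lookup_lock, then spins on dcache_lock
ldlm_bl_70 etc.  spin on ll_lookup_lock in ll_md_blocking_ast()

Since the holder never releases while it sleeps, and spinners never yield the CPU, every ldlm_bl thread that reaches ll_lookup_lock pins its CPU.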
The system quickly hangs, with all runnable threads spinning on locks:
crash> ps | egrep "^>"
>  9226      1  31  ffff88187beaf080  RU   0.0  199500   5332  snmpd
>  9286      1  15  ffff88087b050040  RU   0.0   40948   1308  nrpe
> 36612      1  10  ffff88187b8ad850  RU   0.0   64940   1096  sshd
> 49038      2   4  ffff88187b783100  RU   0.0       0      0  [ldlm_bl_48]
> 49039      2  13  ffff88070cea6080  RU   0.0       0      0  [ldlm_bl_49]
> 49042      2   3  ffff880a70d5f810  RU   0.0       0      0  [ldlm_bl_52]
> 49043      2  17  ffff8820612d2040  RU   0.0       0      0  [ldlm_bl_53]
> 49044      2   1  ffff881861201810  RU   0.0       0      0  [ldlm_bl_54]
> 49045      2  11  ffff880ac92cc040  RU   0.0       0      0  [ldlm_bl_55]
> 49046      2  21  ffff881f4cf670c0  RU   0.0       0      0  [ldlm_bl_56]
> 49047      2   9  ffff88074ef5d810  RU   0.0       0      0  [ldlm_bl_57]
> 49048      2  25  ffff880a55860850  RU   0.0       0      0  [ldlm_bl_58]
> 49049      2   0  ffff8818225f1850  RU   0.0       0      0  [ldlm_bl_59]
> 49050      2  29  ffff88074ef5d0c0  RU   0.0       0      0  [ldlm_bl_60]
> 49051      2  27  ffff880a6e2a0850  RU   0.0       0      0  [ldlm_bl_61]
> 49052      2   2  ffff881747c447d0  RU   0.0       0      0  [ldlm_bl_62]
> 49053      2   5  ffff8817e68fb100  RU   0.0       0      0  [ldlm_bl_63]
> 49054      2   7  ffff880a6f8e47d0  RU   0.0       0      0  [ldlm_bl_64]
> 49055      2  12  ffff881f3bf72080  RU   0.0       0      0  [ldlm_bl_65]
> 49056      2   8  ffff88080736d850  RU   0.0       0      0  [ldlm_bl_66]
> 49057      2  30  ffff880fb0e447d0  RU   0.0       0      0  [ldlm_bl_67]
> 49058      2  26  ffff8817e8dcb040  RU   0.0       0      0  [ldlm_bl_68]
> 49059      2  14  ffff8817d831b790  RU   0.0       0      0  [ldlm_bl_69]
> 49060      2  23  ffff88080736d100  RU   0.0       0      0  [ldlm_bl_70]
> 49061      2  22  ffff881037704100  RU   0.0       0      0  [ldlm_bl_71]
> 49062      2  19  ffff8817d79687d0  RU   0.0       0      0  [ldlm_bl_72]
> 49066      2  20  ffff88074db99040  RU   0.0       0      0  [ldlm_bl_76]
> 49067      2  28  ffff88103779a0c0  RU   0.0       0      0  [ldlm_bl_77]
> 49068      2  24  ffff8817de5b5810  RU   0.0       0      0  [ldlm_bl_78]
> 60897  60840   6  ffff8807423e5790  RU   0.0    2412    148  expr
> 77040      2  16  ffff881051753810  RU   0.0       0      0  [ldlm_bl_03]
> 77139      2  18  ffff88207bf2e850  RU   0.0       0      0  [ldlm_bl_21]
Thanks