[LU-1363] SELinux and statahead hang Created: 03/May/12  Updated: 22/Dec/12  Resolved: 04/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: Lustre 2.3.0, Lustre 2.1.4

Type: Bug Priority: Major
Reporter: Alexandre Louvet Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: paj

Severity: 3
Rank (Obsolete): 4607

 Description   

On a node with SELinux enabled, statahead may deadlock and quickly drive the node to a hang.

Here are some traces. A user issues a du, which starts statahead:

crash> bt 60718
PID: 60718  TASK: ffff8817e6b337d0  CPU: 3   COMMAND: "du"
 #0 [ffff8817df1db938] schedule at ffffffff8147dddc
 #1 [ffff8817df1dba00] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
 #2 [ffff8817df1dba10] do_statahead_enter at ffffffffa0877794 [lustre]
 #3 [ffff8817df1dbb20] ll_lookup_it at ffffffffa0862c75 [lustre]
 #4 [ffff8817df1dbc30] ll_lookup_nd at ffffffffa0862e58 [lustre]
 #5 [ffff8817df1dbc90] do_lookup at ffffffff8116bb0b
 #6 [ffff8817df1dbcf0] __link_path_walk at ffffffff8116c109
 #7 [ffff8817df1dbd70] path_walk at ffffffff8116ca8a
 #8 [ffff8817df1dbdb0] do_path_lookup at ffffffff8116cc5b
 #9 [ffff8817df1dbde0] user_path_at at ffffffff8116cde7
#10 [ffff8817df1dbeb0] vfs_fstatat at ffffffff811630ac
#11 [ffff8817df1dbef0] sys_newfstatat at ffffffff81163114
#12 [ffff8817df1dbf80] system_call_fastpath at ffffffff81003172
    RIP: 0000003b564d7eda  RSP: 00007fff49f48258  RFLAGS: 00010206
    RAX: 0000000000000106  RBX: ffffffff81003172  RCX: 0000000000000000
    RDX: 00000000015d12b0  RSI: 00000000015d1340  RDI: 0000000000000005
    RBP: 00000000015d12b0   R8: 0000000000000100   R9: 0000000000000000
    R10: 0000000000000100  R11: 0000000000000246  R12: 00000000015c6930
    R13: 00000000015d1240  R14: 0000000000000000  R15: 00000000015c6930
    ORIG_RAX: 0000000000000106  CS: 0033  SS: 002b

PID: 60781  TASK: ffff88082f71b7d0  CPU: 7   COMMAND: "ll_sa_60718"
 #0 [ffff88082f713d10] schedule at ffffffff8147dddc
 #1 [ffff88082f713dd8] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
 #2 [ffff88082f713de8] ll_statahead_thread at ffffffffa087b52b [lustre]
 #3 [ffff88082f713f48] kernel_thread at ffffffff810041aa

PID: 60802  TASK: ffff8807a2576040  CPU: 23  COMMAND: "ll_sa_60718"
 #0 [ffff8807f89ff580] schedule at ffffffff8147dddc
 #1 [ffff8807f89ff648] schedule_timeout at ffffffff8147eab0
 #2 [ffff8807f89ff6f8] cfs_waitq_timedwait at ffffffffa03ed731 [libcfs]
 #3 [ffff8807f89ff708] ptlrpc_set_wait at ffffffffa05c3788 [ptlrpc]
 #4 [ffff8807f89ff7e8] ptlrpc_queue_wait at ffffffffa05c3ce6 [ptlrpc]
 #5 [ffff8807f89ff838] mdc_xattr_common at ffffffffa074795c [mdc]
 #6 [ffff8807f89ff8d8] mdc_getxattr at ffffffffa0747ea7 [mdc]
 #7 [ffff8807f89ff928] lmv_getxattr at ffffffffa08e3a1b [lmv]
 #8 [ffff8807f89ff9b8] ll_getxattr_common at ffffffffa086a956 [lustre]
 #9 [ffff8807f89ffad8] ll_getxattr at ffffffffa086bd62 [lustre]
#10 [ffff8807f89ffb88] inode_doinit_with_dentry at ffffffff8120b2e0
#11 [ffff8807f89ffbf8] selinux_d_instantiate at ffffffff8120b83c
#12 [ffff8807f89ffc08] security_d_instantiate at ffffffff811fe57b
#13 [ffff8807f89ffc18] ll_d_add at ffffffffa0860a15 [lustre]
#14 [ffff8807f89ffc78] ll_lookup_it_finish at ffffffffa0861dff [lustre]
#15 [ffff8807f89ffd38] do_statahead_interpret at ffffffffa08771d2 [lustre]
#16 [ffff8807f89ffde8] ll_statahead_thread at ffffffffa087b2d2 [lustre]
#17 [ffff8807f89fff48] kernel_thread at ffffffff810041aa

Note that statahead is now waiting for an RPC to complete and has scheduled away on CPU 23.
My understanding is that, at this point, it still holds dcache_lock, which was acquired in ll_lookup_it_finish just before calling ll_d_add:

                spin_lock(&dcache_lock);
                ll_d_add(*de, NULL);
                spin_unlock(&dcache_lock);

I haven't found a place where this lock might have been released.
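
To make the chain concrete, here is a minimal sketch of why that is fatal. It assumes the 2.6.32-era kernel and Lustre 2.1 code paths shown in the backtraces; the function body and the ll_d_add prototype are illustrative, not the actual llite source.

                #include <linux/dcache.h>    /* dcache_lock on 2.6.32-era kernels */

                /* Lustre-internal helper from the snippet above; prototype assumed. */
                void ll_d_add(struct dentry *de, struct inode *inode);

                /* Illustrative condensation of ll_lookup_it_finish() -> ll_d_add(),
                 * not actual source. */
                static void statahead_instantiate_sketch(struct dentry *de)
                {
                        spin_lock(&dcache_lock);
                        /* With SELinux enabled, ll_d_add() reaches:
                         *   security_d_instantiate() -> selinux_d_instantiate()
                         *     -> ll_getxattr() -> mdc_getxattr() -> ptlrpc_queue_wait()
                         * ptlrpc_queue_wait() sleeps until the MDS replies, so the
                         * thread schedules away while dcache_lock is still held.
                         */
                        ll_d_add(de, NULL);
                        spin_unlock(&dcache_lock);
                }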

In the meantime, another thread is scheduled on the same CPU and starts spinning on ll_lookup_lock (which is acquired just before dcache_lock in many code paths); a condensed sketch follows the trace below:

crash> bt 49060
PID: 49060  TASK: ffff88080736d100  CPU: 23  COMMAND: "ldlm_bl_70"
 #0 [ffff88189c547e90] crash_nmi_callback at ffffffff8101fd06
 #1 [ffff88189c547ea0] notifier_call_chain at ffffffff814837f5
 #2 [ffff88189c547ee0] atomic_notifier_call_chain at ffffffff8148385a
 #3 [ffff88189c547ef0] notify_die at ffffffff8108026e
 #4 [ffff88189c547f20] do_nmi at ffffffff81481443
 #5 [ffff88189c547f50] nmi at ffffffff81480d50
    [exception RIP: _spin_lock+33]
    RIP: ffffffff81480631  RSP: ffff8806e185bc00  RFLAGS: 00000293
    RAX: 00000000000041ce  RBX: ffff8818205c6238  RCX: 0000000000000000
    RDX: 00000000000041c0  RSI: 0000000000000000  RDI: ffffffffa089d100
    RBP: ffff8806e185bc00   R8: 000000020817392b   R9: 000000000000037e
    R10: df80000000000000  R11: a5236b391d08037e  R12: ffff88185e51d240
    R13: 0000000000000003  R14: ffff8818205c6100  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8806e185bc00] _spin_lock at ffffffff81480631
 #7 [ffff8806e185bc08] ll_md_blocking_ast at ffffffffa086049f [lustre]
 #8 [ffff8806e185bc98] ldlm_cancel_callback at ffffffffa0585d33 [ptlrpc]
 #9 [ffff8806e185bcb8] ldlm_cli_cancel_local at ffffffffa059feb1 [ptlrpc]
#10 [ffff8806e185bd18] ldlm_cli_cancel at ffffffffa05a42e8 [ptlrpc]
#11 [ffff8806e185bd98] ll_md_blocking_ast at ffffffffa0860085 [lustre]
#12 [ffff8806e185be28] ldlm_handle_bl_callback at ffffffffa05a76ed [ptlrpc]
#13 [ffff8806e185be88] ldlm_bl_thread_main at ffffffffa05a7b41 [ptlrpc]
#14 [ffff8806e185bf48] kernel_thread at ffffffff810041aa
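
Condensing that trace, the ldlm_bl side looks roughly like the sketch below. This is hypothetical and assumes, as described above, that ll_lookup_lock is the llite-global spinlock involved and that the sleeping statahead thread is its current holder.

                #include <linux/spinlock.h>

                extern spinlock_t ll_lookup_lock;   /* llite-internal lock named above */

                /* Illustrative only: the blocking AST needs ll_lookup_lock to walk
                 * and invalidate dentries for the cancelled MDS lock, but the holder
                 * (the statahead thread) is asleep waiting for its getxattr RPC, so
                 * the acquisition below never returns and the thread spins on its CPU. */
                static void ll_md_blocking_ast_sketch(void)
                {
                        spin_lock(&ll_lookup_lock);
                        /* ... unhash/invalidate aliases of the inode losing its lock ... */
                        spin_unlock(&ll_lookup_lock);
                }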

The system quickly hangs, with all threads spinning on the lock:

crash> ps | egrep "^>"
>  9226      1  31  ffff88187beaf080  RU   0.0  199500   5332  snmpd
>  9286      1  15  ffff88087b050040  RU   0.0   40948   1308  nrpe
> 36612      1  10  ffff88187b8ad850  RU   0.0   64940   1096  sshd
> 49038      2   4  ffff88187b783100  RU   0.0       0      0  [ldlm_bl_48]
> 49039      2  13  ffff88070cea6080  RU   0.0       0      0  [ldlm_bl_49]
> 49042      2   3  ffff880a70d5f810  RU   0.0       0      0  [ldlm_bl_52]
> 49043      2  17  ffff8820612d2040  RU   0.0       0      0  [ldlm_bl_53]
> 49044      2   1  ffff881861201810  RU   0.0       0      0  [ldlm_bl_54]
> 49045      2  11  ffff880ac92cc040  RU   0.0       0      0  [ldlm_bl_55]
> 49046      2  21  ffff881f4cf670c0  RU   0.0       0      0  [ldlm_bl_56]
> 49047      2   9  ffff88074ef5d810  RU   0.0       0      0  [ldlm_bl_57]
> 49048      2  25  ffff880a55860850  RU   0.0       0      0  [ldlm_bl_58]
> 49049      2   0  ffff8818225f1850  RU   0.0       0      0  [ldlm_bl_59]
> 49050      2  29  ffff88074ef5d0c0  RU   0.0       0      0  [ldlm_bl_60]
> 49051      2  27  ffff880a6e2a0850  RU   0.0       0      0  [ldlm_bl_61]
> 49052      2   2  ffff881747c447d0  RU   0.0       0      0  [ldlm_bl_62]
> 49053      2   5  ffff8817e68fb100  RU   0.0       0      0  [ldlm_bl_63]
> 49054      2   7  ffff880a6f8e47d0  RU   0.0       0      0  [ldlm_bl_64]
> 49055      2  12  ffff881f3bf72080  RU   0.0       0      0  [ldlm_bl_65]
> 49056      2   8  ffff88080736d850  RU   0.0       0      0  [ldlm_bl_66]
> 49057      2  30  ffff880fb0e447d0  RU   0.0       0      0  [ldlm_bl_67]
> 49058      2  26  ffff8817e8dcb040  RU   0.0       0      0  [ldlm_bl_68]
> 49059      2  14  ffff8817d831b790  RU   0.0       0      0  [ldlm_bl_69]
> 49060      2  23  ffff88080736d100  RU   0.0       0      0  [ldlm_bl_70]
> 49061      2  22  ffff881037704100  RU   0.0       0      0  [ldlm_bl_71]
> 49062      2  19  ffff8817d79687d0  RU   0.0       0      0  [ldlm_bl_72]
> 49066      2  20  ffff88074db99040  RU   0.0       0      0  [ldlm_bl_76]
> 49067      2  28  ffff88103779a0c0  RU   0.0       0      0  [ldlm_bl_77]
> 49068      2  24  ffff8817de5b5810  RU   0.0       0      0  [ldlm_bl_78]
> 60897  60840   6  ffff8807423e5790  RU   0.0    2412    148  expr
> 77040      2  16  ffff881051753810  RU   0.0       0      0  [ldlm_bl_03]
> 77139      2  18  ffff88207bf2e850  RU   0.0       0      0  [ldlm_bl_21]

Thanks



 Comments   
Comment by Peter Jones [ 03/May/12 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 03/May/12 ]

security_d_instantiate() can allocate memory and/or yield the CPU, so it should not be called while holding spinlocks.

Patch is tracked at http://review.whamcloud.com/2641
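
For illustration, a rough sketch of the pattern that comment implies (assumptions only; see the gerrit change above for the actual fix): do the dcache bookkeeping under the spinlock, and call security_d_instantiate() only after the lock has been dropped.

                #include <linux/dcache.h>
                #include <linux/security.h>   /* security_d_instantiate() */

                /* Illustrative only, not the landed patch: split the instantiate step
                 * so that nothing that can allocate or sleep runs under dcache_lock. */
                static void ll_d_add_sketch(struct dentry *de, struct inode *inode)
                {
                        spin_lock(&dcache_lock);
                        /* ... hash the dentry and link it onto the inode's alias list ... */
                        spin_unlock(&dcache_lock);

                        /* Under SELinux this may allocate and issue a getxattr RPC;
                         * it is safe to sleep here now that the lock is dropped. */
                        security_d_instantiate(de, inode);
                }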

Comment by Peter Jones [ 04/Jun/12 ]

Landed for 2.3

Comment by Bob Glossman (Inactive) [ 12/Nov/12 ]

back port to b2_1: http://review.whamcloud.com/4517
