Lustre / LU-1363

SELinux and statahead hang

    Description

      On a node with SELinux enabled, statahead may deadlock, which quickly drives the node to a complete hang.

      Here are some traces. A user issues a du, which starts statahead:

      crash> bt 60718
      PID: 60718  TASK: ffff8817e6b337d0  CPU: 3   COMMAND: "du"
       #0 [ffff8817df1db938] schedule at ffffffff8147dddc
       #1 [ffff8817df1dba00] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
       #2 [ffff8817df1dba10] do_statahead_enter at ffffffffa0877794 [lustre]
       #3 [ffff8817df1dbb20] ll_lookup_it at ffffffffa0862c75 [lustre]
       #4 [ffff8817df1dbc30] ll_lookup_nd at ffffffffa0862e58 [lustre]
       #5 [ffff8817df1dbc90] do_lookup at ffffffff8116bb0b
       #6 [ffff8817df1dbcf0] __link_path_walk at ffffffff8116c109
       #7 [ffff8817df1dbd70] path_walk at ffffffff8116ca8a
       #8 [ffff8817df1dbdb0] do_path_lookup at ffffffff8116cc5b
       #9 [ffff8817df1dbde0] user_path_at at ffffffff8116cde7
      #10 [ffff8817df1dbeb0] vfs_fstatat at ffffffff811630ac
      #11 [ffff8817df1dbef0] sys_newfstatat at ffffffff81163114
      #12 [ffff8817df1dbf80] system_call_fastpath at ffffffff81003172
          RIP: 0000003b564d7eda  RSP: 00007fff49f48258  RFLAGS: 00010206
          RAX: 0000000000000106  RBX: ffffffff81003172  RCX: 0000000000000000
          RDX: 00000000015d12b0  RSI: 00000000015d1340  RDI: 0000000000000005
          RBP: 00000000015d12b0   R8: 0000000000000100   R9: 0000000000000000
          R10: 0000000000000100  R11: 0000000000000246  R12: 00000000015c6930
          R13: 00000000015d1240  R14: 0000000000000000  R15: 00000000015c6930
          ORIG_RAX: 0000000000000106  CS: 0033  SS: 002b
      
      PID: 60781  TASK: ffff88082f71b7d0  CPU: 7   COMMAND: "ll_sa_60718"
       #0 [ffff88082f713d10] schedule at ffffffff8147dddc
       #1 [ffff88082f713dd8] cfs_waitq_wait at ffffffffa03ed75e [libcfs]
       #2 [ffff88082f713de8] ll_statahead_thread at ffffffffa087b52b [lustre]
       #3 [ffff88082f713f48] kernel_thread at ffffffff810041aa
      
      PID: 60802  TASK: ffff8807a2576040  CPU: 23  COMMAND: "ll_sa_60718"
       #0 [ffff8807f89ff580] schedule at ffffffff8147dddc
       #1 [ffff8807f89ff648] schedule_timeout at ffffffff8147eab0
       #2 [ffff8807f89ff6f8] cfs_waitq_timedwait at ffffffffa03ed731 [libcfs]
       #3 [ffff8807f89ff708] ptlrpc_set_wait at ffffffffa05c3788 [ptlrpc]
       #4 [ffff8807f89ff7e8] ptlrpc_queue_wait at ffffffffa05c3ce6 [ptlrpc]
       #5 [ffff8807f89ff838] mdc_xattr_common at ffffffffa074795c [mdc]
       #6 [ffff8807f89ff8d8] mdc_getxattr at ffffffffa0747ea7 [mdc]
       #7 [ffff8807f89ff928] lmv_getxattr at ffffffffa08e3a1b [lmv]
       #8 [ffff8807f89ff9b8] ll_getxattr_common at ffffffffa086a956 [lustre]
       #9 [ffff8807f89ffad8] ll_getxattr at ffffffffa086bd62 [lustre]
      #10 [ffff8807f89ffb88] inode_doinit_with_dentry at ffffffff8120b2e0
      #11 [ffff8807f89ffbf8] selinux_d_instantiate at ffffffff8120b83c
      #12 [ffff8807f89ffc08] security_d_instantiate at ffffffff811fe57b
      #13 [ffff8807f89ffc18] ll_d_add at ffffffffa0860a15 [lustre]
      #14 [ffff8807f89ffc78] ll_lookup_it_finish at ffffffffa0861dff [lustre]
      #15 [ffff8807f89ffd38] do_statahead_interpret at ffffffffa08771d2 [lustre]
      #16 [ffff8807f89ffde8] ll_statahead_thread at ffffffffa087b2d2 [lustre]
      #17 [ffff8807f89fff48] kernel_thread at ffffffff810041aa
      

      Note that the statahead thread is now waiting for an RPC to complete and has called schedule() on CPU 23.
      My understanding is that, at this point, it still holds dcache_lock, which was acquired in ll_lookup_it_finish just before the call to ll_d_add:

                      spin_lock(&dcache_lock);
                      ll_d_add(*de, NULL);
                      spin_unlock(&dcache_lock);
      

      I have not found any place where this lock could have been released before the thread went to sleep.
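
      To make this concrete, here is the call chain implied by the ll_sa_60718 backtrace above, written out as a pseudo-code sketch (reconstructed from the stack frames, not taken from the Lustre sources; only the function names come from the trace):

          /* pseudo-code sketch, reconstructed from the ll_sa_60718 backtrace */
          ll_lookup_it_finish()
              spin_lock(&dcache_lock)                 /* spinlock taken          */
              ll_d_add()
                  security_d_instantiate()            /* SELinux hook            */
                      selinux_d_instantiate()
                          inode_doinit_with_dentry()  /* needs security xattr    */
                              ll_getxattr()
                                  lmv_getxattr()
                                      mdc_getxattr()
                                          ptlrpc_queue_wait()  /* synchronous RPC:
                                                                  sleeps with
                                                                  dcache_lock held */
              spin_unlock(&dcache_lock)               /* never reached           */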

      Meanwhile, a new thread is scheduled on the same CPU and starts spinning on ll_lookup_lock (which is acquired just before dcache_lock in many code paths):

      crash> bt 49060
      PID: 49060  TASK: ffff88080736d100  CPU: 23  COMMAND: "ldlm_bl_70"
       #0 [ffff88189c547e90] crash_nmi_callback at ffffffff8101fd06
       #1 [ffff88189c547ea0] notifier_call_chain at ffffffff814837f5
       #2 [ffff88189c547ee0] atomic_notifier_call_chain at ffffffff8148385a
       #3 [ffff88189c547ef0] notify_die at ffffffff8108026e
       #4 [ffff88189c547f20] do_nmi at ffffffff81481443
       #5 [ffff88189c547f50] nmi at ffffffff81480d50
          [exception RIP: _spin_lock+33]
          RIP: ffffffff81480631  RSP: ffff8806e185bc00  RFLAGS: 00000293
          RAX: 00000000000041ce  RBX: ffff8818205c6238  RCX: 0000000000000000
          RDX: 00000000000041c0  RSI: 0000000000000000  RDI: ffffffffa089d100
          RBP: ffff8806e185bc00   R8: 000000020817392b   R9: 000000000000037e
          R10: df80000000000000  R11: a5236b391d08037e  R12: ffff88185e51d240
          R13: 0000000000000003  R14: ffff8818205c6100  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #6 [ffff8806e185bc00] _spin_lock at ffffffff81480631
       #7 [ffff8806e185bc08] ll_md_blocking_ast at ffffffffa086049f [lustre]
       #8 [ffff8806e185bc98] ldlm_cancel_callback at ffffffffa0585d33 [ptlrpc]
       #9 [ffff8806e185bcb8] ldlm_cli_cancel_local at ffffffffa059feb1 [ptlrpc]
      #10 [ffff8806e185bd18] ldlm_cli_cancel at ffffffffa05a42e8 [ptlrpc]
      #11 [ffff8806e185bd98] ll_md_blocking_ast at ffffffffa0860085 [lustre]
      #12 [ffff8806e185be28] ldlm_handle_bl_callback at ffffffffa05a76ed [ptlrpc]
      #13 [ffff8806e185be88] ldlm_bl_thread_main at ffffffffa05a7b41 [ptlrpc]
      #14 [ffff8806e185bf48] kernel_thread at ffffffff810041aa
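
      My reading of the two CPU-23 stacks (an interpretation of the traces, not verified against the sources) is the classic sleep-under-spinlock pattern; the intermediate thread holding ll_lookup_lock is presumed, since ldlm_bl_70 would not spin on it otherwise:

          ll_sa_60718:       spin_lock(&dcache_lock)     /* in ll_lookup_it_finish   */
                             ...sleeps in ptlrpc_queue_wait, dcache_lock still held...

          some thread:       spin_lock(&ll_lookup_lock)  /* acquired                  */
                             spin_lock(&dcache_lock)     /* spins: holder is asleep   */

          ldlm_bl_70 etc.:   spin_lock(&ll_lookup_lock)  /* spins: holder is spinning */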
      

      The system quickly hangs, with all runnable threads spinning on these locks:

      crash> ps | egrep "^>"
      >  9226      1  31  ffff88187beaf080  RU   0.0  199500   5332  snmpd
      >  9286      1  15  ffff88087b050040  RU   0.0   40948   1308  nrpe
      > 36612      1  10  ffff88187b8ad850  RU   0.0   64940   1096  sshd
      > 49038      2   4  ffff88187b783100  RU   0.0       0      0  [ldlm_bl_48]
      > 49039      2  13  ffff88070cea6080  RU   0.0       0      0  [ldlm_bl_49]
      > 49042      2   3  ffff880a70d5f810  RU   0.0       0      0  [ldlm_bl_52]
      > 49043      2  17  ffff8820612d2040  RU   0.0       0      0  [ldlm_bl_53]
      > 49044      2   1  ffff881861201810  RU   0.0       0      0  [ldlm_bl_54]
      > 49045      2  11  ffff880ac92cc040  RU   0.0       0      0  [ldlm_bl_55]
      > 49046      2  21  ffff881f4cf670c0  RU   0.0       0      0  [ldlm_bl_56]
      > 49047      2   9  ffff88074ef5d810  RU   0.0       0      0  [ldlm_bl_57]
      > 49048      2  25  ffff880a55860850  RU   0.0       0      0  [ldlm_bl_58]
      > 49049      2   0  ffff8818225f1850  RU   0.0       0      0  [ldlm_bl_59]
      > 49050      2  29  ffff88074ef5d0c0  RU   0.0       0      0  [ldlm_bl_60]
      > 49051      2  27  ffff880a6e2a0850  RU   0.0       0      0  [ldlm_bl_61]
      > 49052      2   2  ffff881747c447d0  RU   0.0       0      0  [ldlm_bl_62]
      > 49053      2   5  ffff8817e68fb100  RU   0.0       0      0  [ldlm_bl_63]
      > 49054      2   7  ffff880a6f8e47d0  RU   0.0       0      0  [ldlm_bl_64]
      > 49055      2  12  ffff881f3bf72080  RU   0.0       0      0  [ldlm_bl_65]
      > 49056      2   8  ffff88080736d850  RU   0.0       0      0  [ldlm_bl_66]
      > 49057      2  30  ffff880fb0e447d0  RU   0.0       0      0  [ldlm_bl_67]
      > 49058      2  26  ffff8817e8dcb040  RU   0.0       0      0  [ldlm_bl_68]
      > 49059      2  14  ffff8817d831b790  RU   0.0       0      0  [ldlm_bl_69]
      > 49060      2  23  ffff88080736d100  RU   0.0       0      0  [ldlm_bl_70]
      > 49061      2  22  ffff881037704100  RU   0.0       0      0  [ldlm_bl_71]
      > 49062      2  19  ffff8817d79687d0  RU   0.0       0      0  [ldlm_bl_72]
      > 49066      2  20  ffff88074db99040  RU   0.0       0      0  [ldlm_bl_76]
      > 49067      2  28  ffff88103779a0c0  RU   0.0       0      0  [ldlm_bl_77]
      > 49068      2  24  ffff8817de5b5810  RU   0.0       0      0  [ldlm_bl_78]
      > 60897  60840   6  ffff8807423e5790  RU   0.0    2412    148  expr
      > 77040      2  16  ffff881051753810  RU   0.0       0      0  [ldlm_bl_03]
      > 77139      2  18  ffff88207bf2e850  RU   0.0       0      0  [ldlm_bl_21]
      

      Thanks

          People

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Alexandre Louvet (louveta)