Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • None
    • Affects Version/s: Lustre 2.7.0
    • None
    • 2.7.1-fe
    • Severity: 3

    Description

      The OSS deadlocked and was unable to ping its Ethernet or IB interfaces. The console showed no errors.
      Attaching a full trace of all threads; the most notable are the kiblnd threads:

      ID: 8711   TASK: ffff882020b12ab0  CPU: 4   COMMAND: "kiblnd_sd_01_01"
       #0 [ffff880060c86e90] crash_nmi_callback at ffffffff81032256
       #1 [ffff880060c86ea0] notifier_call_chain at ffffffff81568515
       #2 [ffff880060c86ee0] atomic_notifier_call_chain at ffffffff8156857a
       #3 [ffff880060c86ef0] notify_die at ffffffff810a44fe
       #4 [ffff880060c86f20] do_nmi at ffffffff8156618f
       #5 [ffff880060c86f50] nmi at ffffffff815659f0
          [exception RIP: _spin_lock+33]
          RIP: ffffffff81565261  RSP: ffff882021b75b70  RFLAGS: 00000293
          RAX: 0000000000002b8e  RBX: ffff880ffe7dd240  RCX: 0000000000000000
          RDX: 0000000000002b8b  RSI: 0000000000000003  RDI: ffff88201ee3f140
          RBP: ffff882021b75b70   R8: 6950000000000000   R9: 4a80000000000000
          R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000018
          R13: ffff881013262e40  R14: ffff8820268ecac0  R15: 0000000000000004
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #6 [ffff882021b75b70] _spin_lock at ffffffff81565261
       #7 [ffff882021b75b78] cfs_percpt_lock at ffffffffa049edab [libcfs]
       #8 [ffff882021b75bb8] lnet_ptl_match_md at ffffffffa0529605 [lnet]
       #9 [ffff882021b75c38] lnet_parse_local at ffffffffa05306e7 [lnet]
      #10 [ffff882021b75cd8] lnet_parse at ffffffffa05316da [lnet]
      #11 [ffff882021b75d68] kiblnd_handle_rx at ffffffffa0a16f3b [ko2iblnd]
      #12 [ffff882021b75db8] kiblnd_scheduler at ffffffffa0a182be [ko2iblnd]
      #13 [ffff882021b75ee8] kthread at ffffffff8109dc8e
      #14 [ffff882021b75f48] kernel_thread at ffffffff8100c28a
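      The frame of interest is cfs_percpt_lock(): the thread is spinning on one of LNet's per-CPT resource locks from lnet_ptl_match_md(). As a rough illustration of why a single stuck holder stalls every such thread, here is a minimal sketch of a per-CPT lock (hypothetical names, not the actual libcfs source): an array of spinlocks, one per CPU partition, plus an "exclusive" index that takes them all.

      #include <linux/spinlock.h>

      /* Hypothetical, simplified per-CPT lock; the real type lives in libcfs. */
      struct percpt_lock {
              int         pl_ncpts;   /* number of CPU partitions (CPTs) */
              spinlock_t *pl_locks;   /* one spinlock per partition */
      };

      #define PERCPT_LOCK_EX  (-1)    /* "exclusive" index: take them all */

      static void percpt_lock(struct percpt_lock *pcl, int index)
      {
              int i;

              if (index != PERCPT_LOCK_EX) {
                      /* Common case: serialize only against users of the same CPT. */
                      spin_lock(&pcl->pl_locks[index]);
                      return;
              }

              /* Exclusive case: acquire every partition's lock, in order. */
              for (i = 0; i < pcl->pl_ncpts; i++)
                      spin_lock(&pcl->pl_locks[i]);
      }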
      


          Activity

            [LU-8334] OSS lockup

            bfaccini Bruno Faccini (Inactive) added a comment -

            And what about the next = 0xffff8817213f0000 pointer: is it pointing to a valid area? And if so, of which type/family?
            Also, does it lead to a loop that may be the cause of the looping/pseudo-hung execution of s_show()?
            Last, does the corruption of the slab at 0xffff8813e0ab36c0 seem to come from an overrun from the preceding locations?
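            One way to test the "loop in the list" hypothesis directly is to walk the slabs_full list from its head and check for a cycle. A minimal, hypothetical sketch (classic tortoise-and-hare over a list_head chain; it would need to be adapted to read memory through crash or a debug patch, it is not an existing tool):

            #include <stdbool.h>

            /* Same layout as the kernel's struct list_head. */
            struct list_head {
                    struct list_head *next, *prev;
            };

            /*
             * Returns true if following ->next from 'head' enters a cycle that
             * never leads back to 'head'.
             */
            static bool list_has_cycle(const struct list_head *head)
            {
                    const struct list_head *slow = head->next;
                    const struct list_head *fast = head->next;

                    while (fast != head && fast->next != head) {
                            slow = slow->next;              /* advance by one */
                            fast = fast->next->next;        /* advance by two */
                            if (slow == fast)
                                    return true;            /* fast lapped slow: cycle */
                    }
                    return false;                           /* walked back to head: no cycle */
            }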
            green Oleg Drokin added a comment - edited
            crash> p *(struct slab *)0xffff8813e0ab36c0
            $2 = {
              list = {
                next = 0xffff8817213f0000, 
                prev = 0xffff881ae71d4540
              }, 
              colouroff = 0, 
              s_mem = 0xffff881ae71d4000, 
              inuse = 0, 
              free = 0, 
              nodeid = 1
            }
            
            

            kmem -S size-512 does not list the contents of this particular slab because it is corrupted.
            The s_mem location is also invalid, so I cannot peek inside.
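            For context, the structure printed above is the old SLAB allocator's per-slab management header, roughly as defined in 2.6.32-era mm/slab.c (reproduced here from memory as a sketch, not the literal source):

            #include <linux/list.h>

            typedef unsigned int kmem_bufctl_t;

            /*
             * Per-slab management structure of the (old) SLAB allocator. 'list'
             * links the slab into the cache's slabs_full/slabs_partial/slabs_free
             * lists; a bogus next/prev here is exactly what corrupts those lists.
             */
            struct slab {
                    struct list_head  list;       /* linkage into the kmem_list3 lists */
                    unsigned long     colouroff;  /* colour offset of the first object */
                    void             *s_mem;      /* address of the first object */
                    unsigned int      inuse;      /* number of objects in use */
                    kmem_bufctl_t     free;       /* index of the first free object */
                    unsigned short    nodeid;     /* NUMA node this slab belongs to */
            };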

            bfaccini Bruno Faccini (Inactive) added a comment -

            Right, "crash/kmem -s" had already indicated this!
            But then, what about the full content of this slab, and even the "kmem -S size-512" output?
            green Oleg Drokin added a comment -

            it's "size-512"


            bfaccini Bruno Faccini (Inactive) added a comment -

            Well, the slab at ffff8813e0ab36c0 may be corrupted and cause s_show() to loop... Also, for which kmem_cache has it been allocated?
            green Oleg Drokin added a comment -

            Bruno: it looks like you are on to something.

            When I do kmem -s in the crash dump, I get these errors:

            kmem: size-512: full list: slab: ffff8813e0ab36c0  bad prev pointer: ffff881ae71d4540
            kmem: size-512: full list: slab: ffff8813e0ab36c0  bad inuse counter: 0
            kmem: size-512: full list: slab: ffff8813e0ab36c0  bad s_mem pointer: ffff881ae71d4000
            

            Indeed, ffff881ae71d4540 is an invalid kernel address that I cannot access. The same goes for ffff881ae71d4000.

            mhanafi Mahmoud Hanafi added a comment -

            Could I get an update on the progress of this issue?

            Thanks,

            bfaccini Bruno Faccini (Inactive) added a comment -

            Oleg, just in case it may help: I have analyzed more thoroughly the full stack listing originally provided, and I am now definitely convinced that, as I already strongly suspected, the "python" thread parsing /proc/slabinfo is the one blocking all the others.

            As per its stack content, which I have already pointed out before, it should have spin_lock()'ed a kmem_cache->nodelists[]->list_lock (located at 0xffff88203fc00500) that is blocking several other threads, and in particular 2 kiblnd threads (PIDs 8713/8715) that I found with the following similar stack:

            PID: 8713   TASK: ffff882027d55520  CPU: 9   COMMAND: "kiblnd_sd_02_01"
             #0 [ffff881078826e90] crash_nmi_callback at ffffffff81032256
             #1 [ffff881078826ea0] notifier_call_chain at ffffffff81568515
             #2 [ffff881078826ee0] atomic_notifier_call_chain at ffffffff8156857a
             #3 [ffff881078826ef0] notify_die at ffffffff810a44fe
             #4 [ffff881078826f20] do_nmi at ffffffff8156618f
             #5 [ffff881078826f50] nmi at ffffffff815659f0
                [exception RIP: _spin_lock+33]
                RIP: ffffffff81565261  RSP: ffff88201e8c5ba0  RFLAGS: 00000097
                RAX: 00000000000056fb  RBX: ffff881376245200  RCX: 0000000000000001
                RDX: 00000000000056f7  RSI: 000000000000005a  RDI: ffff88203fc00500
                RBP: ffff88201e8c5ba0   R8: 0000000000000000   R9: ffff881376245200
                R10: 0000000000000001  R11: 0000000000000000  R12: ffff8820294f8c00
                R13: 0000000000000286  R14: ffff88103fcb02c0  R15: ffff88203fc004c0
                ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
            --- <NMI exception stack> ---
             #6 [ffff88201e8c5ba0] _spin_lock at ffffffff81565261
             #7 [ffff88201e8c5ba8] kfree at ffffffff811704b9
             #8 [ffff88201e8c5c18] null_free_rs at ffffffffa08412b0 [ptlrpc]
             #9 [ffff88201e8c5c38] sptlrpc_svc_free_rs at ffffffffa082fded [ptlrpc]
            #10 [ffff88201e8c5c58] lustre_free_reply_state at ffffffffa0802f64 [ptlrpc]
            #11 [ffff88201e8c5c68] reply_out_callback at ffffffffa0809030 [ptlrpc]
            #12 [ffff88201e8c5c98] ptlrpc_master_callback at ffffffffa08085ca [ptlrpc]
            #13 [ffff88201e8c5ca8] lnet_eq_enqueue_event at ffffffffa0525b55 [lnet]
            #14 [ffff88201e8c5cc8] lnet_msg_detach_md at ffffffffa0523cef [lnet]
            #15 [ffff88201e8c5cf8] lnet_finalize at ffffffffa0524dd6 [lnet]
            #16 [ffff88201e8c5d68] kiblnd_tx_done at ffffffffa0a11fd7 [ko2iblnd]
            #17 [ffff88201e8c5db8] kiblnd_scheduler at ffffffffa0a18170 [ko2iblnd]
            #18 [ffff88201e8c5ee8] kthread at ffffffff8109dc8e
            #19 [ffff88201e8c5f48] kernel_thread at ffffffff8100c28a
            

            According to this stack's content, these 2 threads should have spin_lock()'ed (via lnet_res_lock()/cfs_percpt_lock(), before calling lnet_msg_detach_md()) the lock(s) that all the other threads spinning in cfs_percpt_lock() are waiting for.

            Now the question is: why does s_show() take so long to release its kmem_cache->nodelists[]->list_lock? Could there be a loop in any of the associated slabs_full/slabs_partial lists?
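            To make that suspicion concrete: in the 2.6.32-era SLAB allocator, the /proc/slabinfo show routine walks every slab on the per-node full/partial lists while holding that node's list_lock with interrupts disabled, so a cycle introduced by a corrupted slab header keeps it spinning there forever. A heavily condensed sketch of that path (not the literal kernel source):

            /*
             * Condensed sketch of how /proc/slabinfo is produced by the old SLAB
             * allocator (mm/slab.c): the counters are gathered per NUMA node under
             * nodelists[node]->list_lock. If slabs_full or slabs_partial contains a
             * cycle, list_for_each_entry() never terminates and list_lock is never
             * released, which is what kfree() in the kiblnd threads above spins on.
             */
            static int s_show_sketch(struct seq_file *m, struct kmem_cache *cachep)
            {
                    unsigned long active_objs = 0, num_slabs = 0;
                    struct kmem_list3 *l3;
                    struct slab *slabp;
                    int node;

                    for_each_online_node(node) {
                            l3 = cachep->nodelists[node];
                            if (!l3)
                                    continue;

                            spin_lock_irq(&l3->list_lock);  /* the lock everyone else wants */

                            list_for_each_entry(slabp, &l3->slabs_full, list) {
                                    active_objs += cachep->num;  /* all objects on a full slab are live */
                                    num_slabs++;
                            }
                            list_for_each_entry(slabp, &l3->slabs_partial, list) {
                                    active_objs += slabp->inuse;
                                    num_slabs++;
                            }

                            spin_unlock_irq(&l3->list_lock);
                    }

                    seq_printf(m, "%-17s %6lu %6lu\n", cachep->name, active_objs, num_slabs);
                    return 0;
            }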

            mhanafi Mahmoud Hanafi added a comment -

            Core file uploaded and email was sent to Oleg.
            pjones Peter Jones added a comment -

            Oleg


            mhanafi Mahmoud Hanafi added a comment -

            I can upload the encrypted crash dump. Who should I send the password to?

            People

              bfaccini Bruno Faccini (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 7
