Lustre / LU-3848

Compute node crashes due to error in Lustre: ASSERTION( me == md->md_me ) failed

Details

    Type: Bug
    Resolution: Fixed
    Priority: Major
    Fix Version/s: None
    Affects Version/s: Lustre 2.1.3
    Severity: 3
    Rank: 9966

    Description

      Hi,

      At a customer site, a compute node running Lustre crashes with the following error:

      2013-08-23 17:08:55 LustreError: 3049:0:(lib-move.c:185:lnet_match_md()) ASSERTION( me == md->md_me ) failed:
      2013-08-23 17:08:55 LustreError: 3049:0:(lib-move.c:185:lnet_match_md()) LBUG
      2013-08-23 17:08:55 Pid: 3049, comm: kiblnd_sd_11
      2013-08-23 17:08:55
      2013-08-23 17:08:55 Call Trace:
      2013-08-23 17:08:55 [<ffffffffa04c47f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2013-08-23 17:08:55 [<ffffffffa04c4e07>] lbug_with_loc+0x47/0xb0 [libcfs]
      2013-08-23 17:08:55 [<ffffffffa053c2bd>] lnet_match_md+0x35d/0x3a0 [lnet]
      2013-08-23 17:08:55 [<ffffffffa053a8dd>] ? lnet_ni_recv+0xad/0x2f0 [lnet]
      2013-08-23 17:08:55 [<ffffffffa0541482>] lnet_parse+0xac2/0x1b80 [lnet]
      2013-08-23 17:08:55 [<ffffffff81042f33>] ? enqueue_task+0x43/0x90
      2013-08-23 17:08:55 [<ffffffffa08d177b>] kiblnd_handle_rx+0x2cb/0x680 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff8103d349>] ? __wake_up_common+0x59/0x90
      2013-08-23 17:08:55 [<ffffffffa08d2560>] kiblnd_rx_complete+0x2d0/0x440 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81042cf3>] ? __wake_up+0x53/0x70
      2013-08-23 17:08:55 [<ffffffffa08d2732>] kiblnd_complete+0x62/0xe0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffffa08d2ae9>] kiblnd_scheduler+0x339/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81048df0>] ? default_wake_function+0x0/0x20
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff8100412a>] child_rip+0xa/0x20
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81004120>] ? child_rip+0x0/0x20
      2013-08-23 17:08:55
      2013-08-23 17:08:55 Kernel panic - not syncing: LBUG
      2013-08-23 17:08:55 Pid: 3049, comm: kiblnd_sd_11 Tainted: G --------------- H 2.6.32-279.5.2.bl6.Bull.36.x86_64 #1
      2013-08-23 17:08:55 Call Trace:
      2013-08-23 17:08:55 [<ffffffff81495fe3>] ? panic+0xa0/0x168
      2013-08-23 17:08:55 [<ffffffffa04c4e5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      2013-08-23 17:08:55 [<ffffffffa053c2bd>] ? lnet_match_md+0x35d/0x3a0 [lnet]
      2013-08-23 17:08:55 [<ffffffffa053a8dd>] ? lnet_ni_recv+0xad/0x2f0 [lnet]
      2013-08-23 17:08:55 [<ffffffffa0541482>] ? lnet_parse+0xac2/0x1b80 [lnet]
      2013-08-23 17:08:55 [<ffffffff81042f33>] ? enqueue_task+0x43/0x90
      2013-08-23 17:08:55 [<ffffffffa08d177b>] ? kiblnd_handle_rx+0x2cb/0x680 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff8103d349>] ? __wake_up_common+0x59/0x90
      2013-08-23 17:08:55 [<ffffffffa08d2560>] ? kiblnd_rx_complete+0x2d0/0x440 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81042cf3>] ? __wake_up+0x53/0x70
      2013-08-23 17:08:55 [<ffffffffa08d2732>] ? kiblnd_complete+0x62/0xe0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffffa08d2ae9>] ? kiblnd_scheduler+0x339/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81048df0>] ? default_wake_function+0x0/0x20
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff8100412a>] ? child_rip+0xa/0x20
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffffa08d27b0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-08-23 17:08:55 [<ffffffff81004120>] ? child_rip+0x0/0x20
      
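      For context on the assertion: in lnet_match_md(), every ME on the portal's match list that has an MD attached must be pointed back at by that MD's md_me field. A simplified paraphrase of that invariant follows (not the verbatim lib-move.c code; types as in the LNet headers of this vintage):

      /* Paraphrase of the invariant behind lib-move.c:185: an ME and its
       * attached MD must reference each other.  The LBUG above fires when
       * md->md_me no longer points back at the ME being matched. */
      static void lnet_ptl_check_match_list(lnet_portal_t *ptl)
      {
              struct list_head *tmp;

              list_for_each(tmp, &ptl->ptl_mlist) {
                      lnet_me_t    *me = list_entry(tmp, lnet_me_t, me_list);
                      lnet_libmd_t *md = me->me_md;

                      if (md == NULL)         /* ME with no MD attached yet */
                              continue;

                      LASSERT(me == md->md_me);   /* the assertion that failed */
              }
      }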

      The messages visible in the syslog 2 hours before the failed assertion are:

      2013-08-23 15:26:09 LustreError: 11-0: an error occurred while communicating with 10.200.200.66@o2ib. The obd_ping operation failed with -107
      2013-08-23 15:26:09 Lustre: scratch-OST0018-osc-ffff88087a659000: Connection to scratch-OST0018 (at 10.200.200.66@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      2013-08-23 15:26:09 LustreError: 167-0: This client was evicted by scratch-OST0018; in progress operations using this service will fail.
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lock@ffff880e871c7d98[2 3 0 1 1 00000000] R(1):[4, 18446744073709551615]@[0x100180000:0x2e225e:0x0] {
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lovsub@ffff880c2c27c8a0: [0 ffff880f31e5f268 R(1):[4, 18446744073709551615]@[0x2006bc5f0:0x1ac23:0x0]]
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) osc@ffff88104e603d78: ffff881072b1f240 40120002 0x4d3a65b709264355 3 ffff881012f1ae88 size: 26325 mtime: 1377264244 atime: 1377264247 ctime: 1377264244 blocks: 56
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) } lock@ffff880e871c7d98
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) dlmlock returned -5
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lock@ffff880e871c7078[2 2 0 1 1 00000000] R(1):[2, 3]@[0x100180000:0x2e225e:0x0] {
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lovsub@ffff880c2c27c920: [0 ffff8801fca3e4d8 R(1):[0, 18446744073709551615]@[0x2006bc5f0:0x1ac23:0x0]]
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) osc@ffff88104e603368: ffff88077ea89d80 40120002 0x4d3a65b709264363 2 (null) size: 26325 mtime: 1377264244 atime: 1377264247 ctime: 1377264244 blocks: 56
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) } lock@ffff880e871c7078
      2013-08-23 15:26:09 LustreError: 24191:0:(osc_lock.c:816:osc_ldlm_completion_ast()) dlmlock returned -5
      2013-08-23 15:26:09 LustreError: 24191:0:(ldlm_resource.c:749:ldlm_resource_complain()) Namespace scratch-OST0018-osc-ffff88087a659000 resource refcount nonzero (2) after lock cleanup; forcing cleanup.
      2013-08-23 15:26:09 LustreError: 24191:0:(ldlm_resource.c:755:ldlm_resource_complain()) Resource: ffff880e973c7480 (3023454/0/0/0) (rc: 2)
      2013-08-23 15:26:09 Lustre: scratch-OST0018-osc-ffff88087a659000: Connection restored to scratch-OST0018 (at 10.200.200.66@o2ib)
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) lock@ffff880e871c7698[2 4 0 2 0 00000000] R(1):[4, 18446744073709551615]@[0x2006bc5f0:0x1ac23:0x0] {
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) vvp@ffff880dafcd7d30:
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) lov@ffff880f31e5f268: 1
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) 0 0: ---
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try())
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) } lock@ffff880e871c7698
      2013-08-23 15:26:09 LustreError: 24071:0:(cl_lock.c:1413:cl_unuse_try()) unuse return -5
      

      After investigation, no error was seen on the InfiniBand network at the time of the Lustre node crash.

      Thanks,
      Sebastien.

Activity


            liang Liang Zhen (Inactive) added a comment -

            Hi Hugo, have you already applied the patch I mentioned previously, http://review.whamcloud.com/#/c/8511/ ? It fixed a potential memory corruption issue, although I am not sure whether it is the cause of your issue here.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Hugo, I will review the new crash-dump you uploaded to verify that we still encounter the same kind of corruption.

            BTW, I have pushed a patch that should help to better identify the root cause of the ME/MD corruption in slabs, for both LU-4330 and this ticket. It is available at http://review.whamcloud.com/8819/. It has already been successfully exposed to autotests, and you may want to give it a try. Maybe Sebastien could also validate it and try to integrate it into the Bull distro?

            Since this patch moves the allocation of LNET MEs and small MDs out of the generic <size-128> kmem_cache, I strongly expect that you will no longer encounter the LU-3848/LU-4330 LBUGs (though you may instead see new issues in other kernel code using <size-128> ...). But if not, and you are able to reproduce the LBUGs, then I will need to investigate further.
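            The gerrit change above is authoritative; purely as a sketch of the idea it describes, a dedicated cache could look like this (function and cache names hypothetical; lnet_me_t comes from the LNet headers):

            /* Illustration only -- not the code of http://review.whamcloud.com/8819/.
             * Giving MEs their own kmem_cache takes them out of the shared
             * <size-128> slab, so whatever is corrupting that generic slab can
             * no longer hit an ME, and the corruption shows up elsewhere. */
            #include <linux/slab.h>
            #include <linux/errno.h>

            static struct kmem_cache *lnet_me_cachep;      /* hypothetical name */

            int lnet_me_cache_init(void)
            {
                    lnet_me_cachep = kmem_cache_create("lnet_me_cache",
                                                       sizeof(lnet_me_t), 0,
                                                       SLAB_HWCACHE_ALIGN, NULL);
                    return lnet_me_cachep == NULL ? -ENOMEM : 0;
            }

            lnet_me_t *lnet_me_alloc(void)
            {
                    return kmem_cache_zalloc(lnet_me_cachep, GFP_NOFS);
            }

            void lnet_me_free(lnet_me_t *me)
            {
                    kmem_cache_free(lnet_me_cachep, me);
            }

            void lnet_me_cache_fini(void)
            {
                    if (lnet_me_cachep != NULL)
                            kmem_cache_destroy(lnet_me_cachep);
            }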

            hugo_meiland Hugo Meiland added a comment -

            Bruno, Liang: this weekend I was able to obtain kernel coredumps again with this error (after a stable period of 3-4 weeks). As mentioned earlier, the Lustre version has been upgraded to 2.1.6. The uploaded file is LU3848-tcn23-vmcore-vmlinux-systemmap-weak-updates.zip; please let me know if you need additional files...

            liang Liang Zhen (Inactive) added a comment -

            BTW, I just posted a patch to fix a potential memory corruption: http://review.whamcloud.com/#/c/8511/
            bfaccini Bruno Faccini (Inactive) added a comment - edited

            Hugo: thanks for the log, and yes, I agree that no specific activity/message can be found in the syslog at crash/LBUG time.

            Liang: thanks for your comment.
            The same kind of corruption occurred in an MD for LU-4330 and in an ME for this ticket, both being allocated from the size-128 slab/kmem_cache. So, as per your advice, I will try to develop a patch where MEs and small MDs (<128 bytes) are allocated from a dedicated kmem_cache. On the other hand, and specifically for MDs, I may also try to define/use LNET_USE_LIB_FREELIST, so that we use full-size and pre-allocated MDs ...

            It is still a bit early to directly link this ticket to LU-4330: even though it concerns the same kind of corruption (-3), the same size-128 kmem_cache, and a node of the same cluster/system, the offset within the affected slab/structure is different.

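            For reference, LNET_USE_LIB_FREELIST (a compile-time define in the LNet headers) makes descriptors come from full-size, pre-allocated pools instead of per-object kmalloc. A rough, hypothetical sketch of that freelist idea (names invented here, not the LNet code):

            /* Hypothetical freelist: one pre-allocated arena, idle objects
             * linked through a list_head placed at their start.  Full-size
             * pre-allocation keeps MDs away from generic-slab corruption
             * and gives every object a stable address for its lifetime. */
            #include <linux/slab.h>
            #include <linux/list.h>
            #include <linux/errno.h>

            struct lnet_fl {                    /* hypothetical type */
                    struct list_head fl_free;   /* idle objects */
                    void            *fl_arena;  /* one pre-allocated block */
            };

            static int lnet_fl_init(struct lnet_fl *fl, int nobj, size_t objsize)
            {
                    int i;

                    fl->fl_arena = kcalloc(nobj, objsize, GFP_KERNEL);
                    if (fl->fl_arena == NULL)
                            return -ENOMEM;

                    INIT_LIST_HEAD(&fl->fl_free);
                    for (i = 0; i < nobj; i++)  /* link each slot in place */
                            list_add((struct list_head *)
                                     ((char *)fl->fl_arena + i * objsize),
                                     &fl->fl_free);
                    return 0;
            }

            static void *lnet_fl_get(struct lnet_fl *fl)
            {
                    struct list_head *slot;

                    if (list_empty(&fl->fl_free))
                            return NULL;        /* pool exhausted */
                    slot = fl->fl_free.next;
                    list_del(slot);
                    return slot;                /* caller initializes object */
            }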

            liang Liang Zhen (Inactive) added a comment -

            Because this kind of memory corruption can come from different modules, how about creating dedicated slab caches for both lnet_me_t and lnet_libmd_t (LU-4330)? If these are indeed defects in LNet, we can still reproduce them; otherwise we should see different symptoms, which might help to find the real reason.

            hugo_meiland Hugo Meiland added a comment -

            No useful info on driver loading during the crash; for completeness, the syslog has been uploaded as LU-3848-syslog.zip.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Argh, this is weird: it looks like the me->me_md pointer of the 1st ME linked on the_lnet.ln_portals[15]->ptl_mlist (there are 1536 MEs on the linked list), being parsed in lnet_match_md(), is 3 bytes lower (!!) than the correct MD address ...:

            crash> bt
            PID: 3265   TASK: ffff881079e3c7b0  CPU: 0   COMMAND: "kiblnd_sd_19"
             #0 [ffff88107aed7a48] machine_kexec at ffffffff8102902b
             #1 [ffff88107aed7aa8] crash_kexec at ffffffff810a5292
             #2 [ffff88107aed7b78] panic at ffffffff81495fea
             #3 [ffff88107aed7bf8] lbug_with_loc at ffffffffa04fce5b [libcfs]
             #4 [ffff88107aed7c18] lnet_match_md at ffffffffa05742bd [lnet]
             #5 [ffff88107aed7cc8] lnet_parse at ffffffffa0579482 [lnet]
             #6 [ffff88107aed7db8] kiblnd_handle_rx at ffffffffa090977b [ko2iblnd]
             #7 [ffff88107aed7e08] kiblnd_rx_complete at ffffffffa090a560 [ko2iblnd]
             #8 [ffff88107aed7e58] kiblnd_complete at ffffffffa090a732 [ko2iblnd]
             #9 [ffff88107aed7e68] kiblnd_scheduler at ffffffffa090aae9 [ko2iblnd]
            #10 [ffff88107aed7f48] kernel_thread at ffffffff8100412a
            crash> 
            
            crash> p/x the_lnet.ln_portals[15]
            $6 = {
              ptl_mhash = 0x0, 
              ptl_mlist = {
                next = 0xffff88083674e3c0, 
                prev = 0xffff880866eaebc0
              }, 
              ptl_msgq = {
                next = 0xffff88107ce0d3d8, 
                prev = 0xffff88107ce0d3d8
              }, 
              ptl_ml_version = 0x35c7, 
              ptl_msgq_version = 0x1, 
              ptl_options = 0x5   <<<<<<<<----- LNET_PTL_LAZY | LNET_PTL_MATCH_WILDCARD
            }
            crash> p/x *(lnet_me_t *)0xffff88083674e3c0
            $13 = {
              me_list = {
                next = 0xffff88032837b3c0, 
                prev = 0xffff88107ce0d3c8
              }, 
              me_lh = {
                lh_hash_chain = {
                  next = 0xffff880865e2b450, 
                  prev = 0xffff880413b34850
                }, 
                lh_cookie = 0x45d7862
              }, 
              me_match_id = {
                nid = 0xffffffffffffffff, 
                pid = 0xffffffff
              }, 
              me_portal = 0xf, 
              me_match_bits = 0x0, 
              me_ignore_bits = 0xffffffffffffffff, 
              me_unlink = 0x1, 
              me_md = 0xffff88034c96543d   <<<<<<------ VERY BAD!!
            }
            crash> 
            crash> p/x ((lnet_me_t *)0xffff88083674e3c0)->me_md
            $18 = 0xffff88034c96543d
            crash> p/x ((lnet_me_t *)0xffff88083674e3c0)->me_md->md_me
            $19 = 0x83674e3c0000000
            crash> 
            crash> p/x ((struct lnet_libmd *)0xffff88034c96543d)->md_me
            $20 = 0x83674e3c0000000
            crash> p/x ((struct lnet_libmd *)0xffff88034c965440)->md_me
            $21 = 0xffff88083674e3c0
            crash> 
            

            But how can this happen??!! The only place where me->me_md is set is in LNetMDAttach(), and only after the MD has been allocated from the slabs.

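            One way to see that the dump is self-consistent: reading the 64-bit md_me field 3 bytes below its true address must yield the true value (0xffff88083674e3c0, the ME address, per $21) shifted up by 24 bits, with the low bytes taken from whatever precedes the field. That is exactly the 0x83674e3c0000000 printed above ($19/$20). A small stand-alone demonstration, using the values from the dump and assuming a little-endian host as on the x86_64 node:

            /* Stand-alone check of the 3-byte shift seen in the crash dump. */
            #include <stdint.h>
            #include <stdio.h>
            #include <string.h>

            int main(void)
            {
                    uint64_t md_me = 0xffff88083674e3c0ULL; /* true back-pointer */
                    unsigned char buf[11] = { 0 };  /* 3 preceding bytes, zero here */
                    uint64_t misread;

                    memcpy(buf + 3, &md_me, sizeof(md_me)); /* field at real offset */
                    memcpy(&misread, buf, sizeof(misread)); /* read 3 bytes too low */

                    /* prints 0x83674e3c0000000, matching the dump */
                    printf("misread md_me = 0x%llx\n", (unsigned long long)misread);
                    return 0;
            }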

            bfaccini Bruno Faccini (Inactive) added a comment -

            Thanks Hugo!
            In the kernel log of the crash-dump you uploaded, I see some OOM traces and also the VTune driver being loaded. Since there are no timestamps, can you check in the syslog (or upload it too?) whether these occur around the time of the LBUG??
            Continuing crash-dump analysis.
            hugo_meiland Hugo Meiland added a comment -

            ack, this is the one... please let me know if you need additional info!
            Hugo


            bfaccini Bruno Faccini (Inactive) added a comment -

            Humm, based on the 1st name construct, I think I was able to guess it; isn't it LU-3848-vmlinux-systemmap-weak_updates.zip?

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: sebastien.buisson Sebastien Buisson (Inactive)
              Votes: 0
              Watchers: 10
