Lustre / LU-8362

page fault: exception RIP: lnet_mt_match_md+135

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: lustre 2.7.1-fe
    • Severity: 2

    Description

      OSS console errors

      LNet: Can't send to 17456000@<65535:34821>: src 0@<0:0> is not a local nid
      LNet: 46045:0:(lib-move.c:2241:LNetPut()) Error sending PUT to 0-17456000@<65535:34821>: -22
      LNet: Can't send to 17456000@<65535:34821>: src 0@<0:0> is not a local nid
      LNet: 56154:0:(lib-move.c:2241:LNetPut()) Error sending PUT to 0-17456000@<65535:34821>: -22
      ------------[ cut here ]------------
      WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
      Hardware name: SUMMIT
      list_del corruption. prev->next should be ffff881d63ead4d0, but was (null)
      Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 acpi_cpufreq freq_table mperf lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) dm_round_robin scsi_dh_rdac lpfc scsi_transport_fc scsi_tgt sunrpc bonding ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) dm_mirror dm_region_hash dm_log dm_multipath dm_mod iTCO_wdt iTCO_vendor_support microcode sg wmi igb hwmon dca i2c_algo_bit ptp pps_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_bic ext3 jbd sd_mod crc_t10dif isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) mlx_compat(U) ahci gru [last unloaded: scsi_wait_scan]
      Pid: 8603, comm: kiblnd_sd_02_01 Not tainted 2.6.32-504.30.3.el6.20151008.x86_64.lustre271 #1
      Call Trace:
       [<ffffffff81074127>] ? warn_slowpath_common+0x87/0xc0
       [<ffffffff81074216>] ? warn_slowpath_fmt+0x46/0x50
       [<ffffffff812bda6e>] ? list_del+0x6e/0xa0
       [<ffffffffa052c5c9>] ? lnet_me_unlink+0x39/0x140 [lnet]
       [<ffffffffa05303f8>] ? lnet_md_unlink+0x2f8/0x3e0 [lnet]
       [<ffffffffa0531b9f>] ? lnet_try_match_md+0x22f/0x310 [lnet]
       [<ffffffffa0a1f727>] ? kiblnd_recv+0x107/0x780 [ko2iblnd]
       [<ffffffffa0531d1c>] ? lnet_mt_match_md+0x9c/0x1c0 [lnet]
       [<ffffffffa0532621>] ? lnet_ptl_match_md+0x281/0x870 [lnet]
       [<ffffffffa05396e7>] ? lnet_parse_local+0x307/0xc60 [lnet]
       [<ffffffffa053a6da>] ? lnet_parse+0x69a/0xcf0 [lnet]
       [<ffffffffa0a1ff3b>] ? kiblnd_handle_rx+0x19b/0x620 [ko2iblnd]
       [<ffffffffa0a212be>] ? kiblnd_scheduler+0xefe/0x10d0 [ko2iblnd]
       [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0a203c0>] ? kiblnd_scheduler+0x0/0x10d0 [ko2iblnd]
       [<ffffffff8109dc8e>] ? kthread+0x9e/0xc0
       [<ffffffff8100c28a>] ? child_rip+0xa/0x20
       [<ffffffff8109dbf0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
      ---[ end trace 1063d2ffc2578a2f ]---
      ------------[ cut here ]------------
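
A note for readers decoding the warning in the console log: the "list_del corruption" message is printed by the kernel's debug variant of list_del(), which checks that both neighbours of an entry still point back at it before unlinking. The sketch below is a simplified Python model of that check, not the kernel source. The names `ListHead`, `list_add` and `list_del_checked` are illustrative only. It reproduces the situation in the log, where the previous node's `next` pointer was NULL at the time `lnet_me_unlink` tried to remove an entry.

```python
class ListHead:
    """Minimal stand-in for the kernel's struct list_head (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list head points at itself
        self.prev = self


def _name_of(node):
    # The kernel prints a NULL pointer as "(null)" with %p on this release.
    return "(null)" if node is None else node.name


def list_add(new, head):
    """Insert `new` right after `head`, mirroring the kernel's list_add()."""
    nxt = head.next
    new.prev, new.next = head, nxt
    head.next = new
    nxt.prev = new


def list_del_checked(entry):
    """Unlink `entry` and return the warnings a debug list_del would emit."""
    warnings = []
    if entry.prev.next is not entry:
        warnings.append(
            "list_del corruption. prev->next should be %s, but was %s"
            % (entry.name, _name_of(entry.prev.next))
        )
    if entry.next.prev is not entry:
        warnings.append(
            "list_del corruption. next->prev should be %s, but was %s"
            % (entry.name, _name_of(entry.next.prev))
        )
    # As in this kernel generation, the unlink still proceeds after warning.
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    return warnings


if __name__ == "__main__":
    head, me = ListHead("head"), ListHead("entry")
    list_add(me, head)
    head.next = None  # simulate the corrupted neighbour seen in the log
    for line in list_del_checked(me):
        print(line)
```

In this model the warning fires because something else cleared or overwrote the neighbour's pointer before the unlink, which is consistent with (but does not prove) a race or double unlink on the match-entry list.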
      

      From the crash dump, the backtrace (bt) looks like this:

      PID: 8603   TASK: ffff8810271fa040  CPU: 11  COMMAND: "kiblnd_sd_02_01"
       #0 [ffff880ff8b734f0] machine_kexec at ffffffff8103b5db
       #1 [ffff880ff8b73550] crash_kexec at ffffffff810c9412
       #2 [ffff880ff8b73620] kdb_kdump_check at ffffffff812973d7
       #3 [ffff880ff8b73630] kdb_main_loop at ffffffff8129a5c7
       #4 [ffff880ff8b73740] kdb_save_running at ffffffff8129472e
       #5 [ffff880ff8b73750] kdba_main_loop at ffffffff8147cd68
       #6 [ffff880ff8b73790] kdb at ffffffff812978c6
       #7 [ffff880ff8b73800] kdba_entry at ffffffff8147c687
       #8 [ffff880ff8b73810] notifier_call_chain at ffffffff81568515
       #9 [ffff880ff8b73850] atomic_notifier_call_chain at ffffffff8156857a
      #10 [ffff880ff8b73860] notify_die at ffffffff810a44fe
      #11 [ffff880ff8b73890] __die at ffffffff815663e2
      #12 [ffff880ff8b738c0] no_context at ffffffff8104c822
      #13 [ffff880ff8b73910] __bad_area_nosemaphore at ffffffff8104cad5
      #14 [ffff880ff8b73960] bad_area_nosemaphore at ffffffff8104cba3
      #15 [ffff880ff8b73970] __do_page_fault at ffffffff8104d29c
      #16 [ffff880ff8b73a90] do_page_fault at ffffffff8156845e
      #17 [ffff880ff8b73ac0] page_fault at ffffffff81565765
          [exception RIP: lnet_mt_match_md+135]
          RIP: ffffffffa0531d07  RSP: ffff880ff8b73b70  RFLAGS: 00010286
          RAX: ffff881d88420000  RBX: ffff880ff8b73c70  RCX: 0000000000000007
          RDX: 0000000000000004  RSI: ffff880ff8b73c70  RDI: ffffffffffffffff
          RBP: ffff880ff8b73bb0   R8: 0000000000000001   R9: d400000000000000
          R10: 0000000000000001  R11: 0000000000000012  R12: 0000000000000000
          R13: ffff881730ca6200  R14: 00d100120be91b91  R15: 0000000000000008
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      #18 [ffff880ff8b73bb8] lnet_ptl_match_md at ffffffffa0532621 [lnet]
      #19 [ffff880ff8b73c38] lnet_parse_local at ffffffffa05396e7 [lnet]
      #20 [ffff880ff8b73cd8] lnet_parse at ffffffffa053a6da [lnet]
      #21 [ffff880ff8b73d68] kiblnd_handle_rx at ffffffffa0a1ff3b [ko2iblnd]
      #22 [ffff880ff8b73db8] kiblnd_scheduler at ffffffffa0a212be [ko2iblnd]
      #23 [ffff880ff8b73ee8] kthread at ffffffff8109dc8e
      #24 [ffff880ff8b73f48] kernel_thread at ffffffff8100c28a
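
To relate the faulting address to the attached disassembly, the symbol offsets in the two traces can be cross-checked with simple arithmetic. The sketch below uses only values copied from the console Call Trace and the crash backtrace. The helper name `symbol_base` is illustrative. Note that the Call Trace entries marked with "?" are not guaranteed to be exact frames, so this is a consistency check rather than proof of the call path.

```python
def symbol_base(address, offset):
    """Return the start address of a symbol given an address inside it."""
    return address - offset


# Values copied from the crash dump: [exception RIP: lnet_mt_match_md+135]
fault_rip = 0xFFFFFFFFA0531D07
fault_offset = 135  # decimal, as printed by crash

# Values copied from the console Call Trace: lnet_mt_match_md+0x9c/0x1c0
trace_addr = 0xFFFFFFFFA0531D1C
trace_offset = 0x9C
function_size = 0x1C0

base_from_fault = symbol_base(fault_rip, fault_offset)
base_from_trace = symbol_base(trace_addr, trace_offset)

# Distance between the faulting instruction and the address in the trace.
gap = trace_offset - fault_offset

if __name__ == "__main__":
    print("base from fault RIP : %#x" % base_from_fault)
    print("base from call trace: %#x" % base_from_trace)
    print("fault offset        : %#x of %#x" % (fault_offset, function_size))
    print("bytes before traced address: %d" % gap)
```

Both computations give the same start address for `lnet_mt_match_md`, and the fault at offset 0x87 lies 21 bytes before the +0x9c address recorded in the warning trace, so offset 0x87 is the place to look in `lnet_mt_match_md.dis`.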
      

      Attachments

        1. lnet_msg-lnet_match_table.data (3 kB)
        2. lnet_mt_match_md.dis (8 kB)
        3. lnet_mt_match_md.withlinenumbers.dis (10 kB)
        4. lu-8362.20160725 (37 kB)
        5. lu8362.20160802 (409 kB)
        6. lu8362-20160803 (4 kB)

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 7
