Details
- Bug
- Resolution: Duplicate
- Critical
- None
- Lustre 2.7.0
- None
- lustre 2.7.1-fe
- 2
Description
OSS console errors:
LNet: Can't send to 17456000@<65535:34821>: src 0@<0:0> is not a local nid
LNet: 46045:0:(lib-move.c:2241:LNetPut()) Error sending PUT to 0-17456000@<65535:34821>: -22
LNet: Can't send to 17456000@<65535:34821>: src 0@<0:0> is not a local nid
LNet: 56154:0:(lib-move.c:2241:LNetPut()) Error sending PUT to 0-17456000@<65535:34821>: -22
------------[ cut here ]------------
WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
Hardware name: SUMMIT
list_del corruption. prev->next should be ffff881d63ead4d0, but was (null)
Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 acpi_cpufreq freq_table mperf lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) dm_round_robin scsi_dh_rdac lpfc scsi_transport_fc scsi_tgt sunrpc bonding ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) dm_mirror dm_region_hash dm_log dm_multipath dm_mod iTCO_wdt iTCO_vendor_support microcode sg wmi igb hwmon dca i2c_algo_bit ptp pps_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_bic ext3 jbd sd_mod crc_t10dif isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) mlx_compat(U) ahci gru [last unloaded: scsi_wait_scan]
Pid: 8603, comm: kiblnd_sd_02_01 Not tainted 2.6.32-504.30.3.el6.20151008.x86_64.lustre271 #1
Call Trace:
[<ffffffff81074127>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff81074216>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff812bda6e>] ? list_del+0x6e/0xa0
[<ffffffffa052c5c9>] ? lnet_me_unlink+0x39/0x140 [lnet]
[<ffffffffa05303f8>] ? lnet_md_unlink+0x2f8/0x3e0 [lnet]
[<ffffffffa0531b9f>] ? lnet_try_match_md+0x22f/0x310 [lnet]
[<ffffffffa0a1f727>] ? kiblnd_recv+0x107/0x780 [ko2iblnd]
[<ffffffffa0531d1c>] ? lnet_mt_match_md+0x9c/0x1c0 [lnet]
[<ffffffffa0532621>] ? lnet_ptl_match_md+0x281/0x870 [lnet]
[<ffffffffa05396e7>] ? lnet_parse_local+0x307/0xc60 [lnet]
[<ffffffffa053a6da>] ? lnet_parse+0x69a/0xcf0 [lnet]
[<ffffffffa0a1ff3b>] ? kiblnd_handle_rx+0x19b/0x620 [ko2iblnd]
[<ffffffffa0a212be>] ? kiblnd_scheduler+0xefe/0x10d0 [ko2iblnd]
[<ffffffff81064f90>] ? default_wake_function+0x0/0x20
[<ffffffffa0a203c0>] ? kiblnd_scheduler+0x0/0x10d0 [ko2iblnd]
[<ffffffff8109dc8e>] ? kthread+0x9e/0xc0
[<ffffffff8100c28a>] ? child_rip+0xa/0x20
[<ffffffff8109dbf0>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20
---[ end trace 1063d2ffc2578a2f ]---
------------[ cut here ]------------
The backtrace (bt) from the crash dump looks like this:
PID: 8603 TASK: ffff8810271fa040 CPU: 11 COMMAND: "kiblnd_sd_02_01"
#0 [ffff880ff8b734f0] machine_kexec at ffffffff8103b5db
#1 [ffff880ff8b73550] crash_kexec at ffffffff810c9412
#2 [ffff880ff8b73620] kdb_kdump_check at ffffffff812973d7
#3 [ffff880ff8b73630] kdb_main_loop at ffffffff8129a5c7
#4 [ffff880ff8b73740] kdb_save_running at ffffffff8129472e
#5 [ffff880ff8b73750] kdba_main_loop at ffffffff8147cd68
#6 [ffff880ff8b73790] kdb at ffffffff812978c6
#7 [ffff880ff8b73800] kdba_entry at ffffffff8147c687
#8 [ffff880ff8b73810] notifier_call_chain at ffffffff81568515
#9 [ffff880ff8b73850] atomic_notifier_call_chain at ffffffff8156857a
#10 [ffff880ff8b73860] notify_die at ffffffff810a44fe
#11 [ffff880ff8b73890] __die at ffffffff815663e2
#12 [ffff880ff8b738c0] no_context at ffffffff8104c822
#13 [ffff880ff8b73910] __bad_area_nosemaphore at ffffffff8104cad5
#14 [ffff880ff8b73960] bad_area_nosemaphore at ffffffff8104cba3
#15 [ffff880ff8b73970] __do_page_fault at ffffffff8104d29c
#16 [ffff880ff8b73a90] do_page_fault at ffffffff8156845e
#17 [ffff880ff8b73ac0] page_fault at ffffffff81565765
[exception RIP: lnet_mt_match_md+135]
RIP: ffffffffa0531d07 RSP: ffff880ff8b73b70 RFLAGS: 00010286
RAX: ffff881d88420000 RBX: ffff880ff8b73c70 RCX: 0000000000000007
RDX: 0000000000000004 RSI: ffff880ff8b73c70 RDI: ffffffffffffffff
RBP: ffff880ff8b73bb0 R8: 0000000000000001 R9: d400000000000000
R10: 0000000000000001 R11: 0000000000000012 R12: 0000000000000000
R13: ffff881730ca6200 R14: 00d100120be91b91 R15: 0000000000000008
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#18 [ffff880ff8b73bb8] lnet_ptl_match_md at ffffffffa0532621 [lnet]
#19 [ffff880ff8b73c38] lnet_parse_local at ffffffffa05396e7 [lnet]
#20 [ffff880ff8b73cd8] lnet_parse at ffffffffa053a6da [lnet]
#21 [ffff880ff8b73d68] kiblnd_handle_rx at ffffffffa0a1ff3b [ko2iblnd]
#22 [ffff880ff8b73db8] kiblnd_scheduler at ffffffffa0a212be [ko2iblnd]
#23 [ffff880ff8b73ee8] kthread at ffffffff8109dc8e
#24 [ffff880ff8b73f48] kernel_thread at ffffffff8100c28a
Issue Links
- is related to
- LU-4330 LustreError: 46336:0:(events.c:433:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback ... ) failed (Reopened)
- LU-7980 Overrun in generic <size-128> kmem_cache Slabs causing OSS to crash (Resolved)