Details
Bug
Resolution: Duplicate
Critical
None
Lustre 2.5.3
3
9223372036854775807
Description
We hit this crash on a Lustre 2.5 client. Its stack trace matches the one in LU-6596.
2016-07-29 14:07:39 [442101.409121] LustreError: 11-0: lsd-MDT0000-mdc-ffff88200c40b800: Communicating with 172.19.2.102@o2ib100, operation ldlm_enqueue failed with -19.
2016-07-29 14:07:39 [442101.423805] Lustre: lsd-MDT0000-mdc-ffff88200c40b800: Connection to lsd-MDT0000 (at 172.19.2.102@o2ib100) was lost; in progress operations using this service will wait for recovery to complete
2016-07-29 14:07:39 [442101.443039] Lustre: Skipped 12 previous similar messages
2016-07-29 14:08:36 [442159.149684] LustreError: 166-1: MGC172.19.2.102@o2ib100: Connection to MGS (at 172.19.2.102@o2ib100) was lost; in progress operations using this service will fail
2016-07-29 14:08:36 [442159.166010] LustreError: Skipped 1 previous similar message
2016-07-29 14:10:42 [442284.831307] Lustre: Evicted from MGS (at 172.19.2.102@o2ib100) after server handle changed from 0xd01d02d77b80e403 to 0xd01d02d8bb9f7cc6
2016-07-29 14:10:42 [442284.845115] Lustre: Skipped 1 previous similar message
2016-07-29 14:10:42 [442284.851163] Lustre: MGC172.19.2.102@o2ib100: Connection restored to MGS (at 172.19.2.102@o2ib100)
2016-07-29 14:10:43 [442285.410347] general protection fault: 0000 [#1] SMP
2016-07-29 14:10:43 [442285.416010] Modules linked in: xfs libcrc32c lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack nf_log_ipv4 nf_log_common xt_LOG xt_multiport iptable_filter nfsv3 nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm ib_sa ocrdma iw_cxgb4 iw_cm iw_cxgb3 intel_powerclamp coretemp hfi1(OE) intel_rapl iTCO_wdt ib_mad ib_core iTCO_vendor_support ipmi_devintf kvm ib_addr sg mei_me sb_edac pcspkr mei lpc_ich shpchp i2c_i801 edac_core mfd_core ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables ext4 mbcache jbd2 dm_service_time sd_mod crc_t10dif crct10dif_generic mxm_wmi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel mgag200 ghash_clmulni_intel syscopyarea sysfillrect sysimgblt drm_kms_helper aesni_intel be2net lrw ttm igb gf128mul glue_helper vxlan ahci dca ablk_helper libahci ip6_udp_tunnel ptp drm cryptd udp_tunnel libata pps_core i2c_algo_bit i2c_core wmi sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp be2iscsi bnx2i cnic uio cxgb4 cxgb3 mdio libcxgbi libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate dm_multipath dm_mod
2016-07-29 14:10:43 [442285.556322] CPU: 12 PID: 9331 Comm: ptlrpcd_rcv Tainted: P OE ------------ 3.10.0-327.22.2.1chaos.ch6.x86_64 #1
2016-07-29 14:10:43 [442285.568946] Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-07-29 14:10:43 [442285.581180] task: ffff88101b35c500 ti: ffff880ff8cd0000 task.ti: ffff880ff8cd0000
2016-07-29 14:10:43 [442285.589631] RIP: 0010:[<ffffffffa0f1ed13>] [<ffffffffa0f1ed13>] ptlrpc_replay_next+0xd3/0x390 [ptlrpc]
2016-07-29 14:10:43 [442285.600278] RSP: 0018:ffff880ff8cd3bd0 EFLAGS: 00010296
2016-07-29 14:10:43 [442285.606305] RAX: 5a5a5a5a5a5a5a5a RBX: ffff88200c75f000 RCX: 0000004b0947e4a5
2016-07-29 14:10:43 [442285.614370] RDX: ffff88200c75f0b0 RSI: ffff880e36ef5110 RDI: ffff88200c75f298
2016-07-29 14:10:43 [442285.622431] RBP: ffff880ff8cd3bf8 R08: 0000000000000004 R09: 0000000000000028
2016-07-29 14:10:43 [442285.630493] R10: ffff88200c75f080 R11: 0000000000000096 R12: ffff88200c75f298
2016-07-29 14:10:43 [442285.638554] R13: ffff880ff8cd3c10 R14: 0000000000000000 R15: ffff880e36ef4d10
2016-07-29 14:10:43 [442285.646618] FS: 0000000000000000(0000) GS:ffff88103f380000(0000) knlGS:0000000000000000
2016-07-29 14:10:43 [442285.655749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-07-29 14:10:43 [442285.662260] CR2: 00005555557efcd0 CR3: 0000000001962000 CR4: 00000000003407e0
2016-07-29 14:10:43 [442285.670323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2016-07-29 14:10:43 [442285.678385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2016-07-29 14:10:43 [442285.686446] Stack:
2016-07-29 14:10:43 [442285.688785] ffff88200c75f000 ffff88200c75f298 0000000000000001 ffff8810272d4800
2016-07-29 14:10:43 [442285.697180] ffff880e54f620c0 ffff880ff8cd3c38 ffffffffa0f42ff2 ffff880e54f62000
2016-07-29 14:10:43 [442285.705576] ffff882000000000 0000000000000001 000000008573ebfd ffff88200c75f000
2016-07-29 14:10:43 [442285.713975] Call Trace:
2016-07-29 14:10:43 [442285.716825] [<ffffffffa0f42ff2>] ptlrpc_import_recovery_state_machine+0x1b2/0xc80 [ptlrpc]
2016-07-29 14:10:43 [442285.726261] [<ffffffffa0f45c56>] ptlrpc_connect_interpret+0x7c6/0x22b0 [ptlrpc]
2016-07-29 14:10:43 [442285.734629] [<ffffffffa0f1b794>] ptlrpc_check_set.part.21+0x2c4/0x2090 [ptlrpc]
2016-07-29 14:10:43 [442285.742989] [<ffffffff810907be>] ? try_to_del_timer_sync+0x5e/0x90
2016-07-29 14:10:43 [442285.750098] [<ffffffffa0f1d5bb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
2016-07-29 14:10:43 [442285.757402] [<ffffffffa0f4920b>] ptlrpcd_check+0x4eb/0x730 [ptlrpc]
2016-07-29 14:10:43 [442285.764606] [<ffffffffa0f4967f>] ptlrpcd+0x22f/0x3f0 [ptlrpc]
2016-07-29 14:10:43 [442285.771221] [<ffffffff810bd4b0>] ? wake_up_state+0x20/0x20
2016-07-29 14:10:43 [442285.777551] [<ffffffffa0f49450>] ? ptlrpcd_check+0x730/0x730 [ptlrpc]
2016-07-29 14:10:43 [442285.784942] [<ffffffff810a997f>] kthread+0xcf/0xe0
2016-07-29 14:10:43 [442285.790483] [<ffffffff810a98b0>] ? kthread_create_on_node+0x140/0x140
2016-07-29 14:10:43 [442285.797874] [<ffffffff8165d658>] ret_from_fork+0x58/0x90
2016-07-29 14:10:43 [442285.803998] [<ffffffff810a98b0>] ? kthread_create_on_node+0x140/0x140
2016-07-29 14:10:43 [442285.811382] Code: 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 75 1b e9 b0 02 00 00 0f 1f 00 48 8b 00 48 39 c2 48 89 83 c0 00 00 00 0f 84 9c 00 00 00 <4c> 3b 70 f0 73 e7 4c 8d b8 f0 fe ff ff 4d 85 ff 0f 84 86 00 00
2016-07-29 14:10:43 [442285.833162] RIP [<ffffffffa0f1ed13>] ptlrpc_replay_next+0xd3/0x390 [ptlrpc]
2016-07-29 14:10:43 [442285.841150] RSP <ffff880ff8cd3bd0>
2016-07-29 14:10:44 [442286.427802] ---[ end trace 3534deb517e19fd7 ]---
2016-07-29 14:10:44 [442286.487942] Kernel panic - not syncing: Fatal exception
Issue Links
- is related to: LU-6802 sanity test_208 fail: "lease not broken over recovery" (Resolved)