Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.16.0
- None
- 3
- 9223372036854775807
Description
The server VM crashed while bringing the cluster up.
[ 26.634442] Key type ._llcrypt registered
[ 26.635386] Key type .llcrypt registered
[ 26.646303] libcfs: HW NUMA nodes: 1, HW CPU cores: 20, npartitions: 5
[ 26.649550] alg: No test for adler32 (adler32-zlib)
[ 27.417181] Lustre: Lustre: Build Version: 2.15.90_23_g8011e33
[ 27.449211] LNet: Added LNI 172.25.80.50@tcp [8/320/0/180]
[ 27.450197] LNet: Accept secure, port 988
[ 30.168646] BUG: unable to handle kernel paging request at ffffa4c2cfdbf798
[ 30.169772] PGD 10014a067 P4D 10014a067 PUD 10014b067 PMD 10fe8e067 PTE 0
[ 30.170894] Oops: 0000 [#1] SMP NOPTI
[ 30.171490] CPU: 9 PID: 15626 Comm: socknal_cd03 Tainted: G OE -------- - - 4.18.0-553.16.1.el8_lustre.x86_64 #1
[ 30.173438] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 30.174845] RIP: 0010:lnet_ping_event_handler+0x31/0x130 [lnet]
[ 30.175865] Code: 48 89 fb 48 8b af 88 00 00 00 f6 05 91 fd fd ff 02 74 09 f6 05 8c fd fd ff 04 75 50 8b 83 a4 00 00 00 85 c0 0f 84 c2 00 00 00 <8b> 75 00 85 f6 0f 84 cb 00 00 00 8b 8b a8 00 00 00 85 c9 74 1c c7
[ 30.178703] RSP: 0018:ffffa4c2cf00fd90 EFLAGS: 00010282
[ 30.179526] RAX: 00000000ffffff8f RBX: ffff889b4e388518 RCX: 00000000801e001d
[ 30.180609] RDX: 00000000801e001e RSI: 0000000000000000 RDI: ffff889b4e388518
[ 30.181877] RBP: ffffa4c2cfdbf798 R08: 0000000000000001 R09: 0000000000000001
[ 30.183081] R10: ffff889b15efb990 R11: ffff889c2b15f101 R12: 0000000000000001
[ 30.184181] R13: ffffffffc0bb4750 R14: 0000000000000000 R15: 00000000ffffff8f
[ 30.185337] FS: 0000000000000000(0000) GS:ffff88bee9a40000(0000) knlGS:0000000000000000
[ 30.186687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.187716] CR2: ffffa4c2cfdbf798 CR3: 000000118c410004 CR4: 0000000000770ee0
[ 30.188800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 30.189878] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 30.190970] PKRU: 55555554
[ 30.191411] Call Trace:
[ 30.191842] ? __die_body+0x1a/0x60
[ 30.192418] ? no_context+0x1ba/0x3f0
[ 30.193007] ? string+0x44/0x60
[ 30.193526] ? __bad_area_nosemaphore+0x157/0x180
[ 30.194246] ? do_page_fault+0x37/0x12d
[ 30.194822] ? page_fault+0x1e/0x30
[ 30.195519] ? lnet_unregister_lnd+0xf0/0xf0 [lnet]
[ 30.196453] ? lnet_ping_event_handler+0x31/0x130 [lnet]
[ 30.197385] lnet_finalize+0x5f1/0xa80 [lnet]
[ 30.198114] ? ksocknal_tx_done+0x60/0xe0 [ksocklnd]
[ 30.198972] ? kfree+0x22e/0x250
[ 30.199507] ksocknal_txlist_done+0xf9/0x2a0 [ksocklnd]
[ 30.200435] ksocknal_connd+0xad6/0xdd0 [ksocklnd]
[ 30.201288] ? finish_wait+0x80/0x80
[ 30.201888] ? ksocknal_thread_fini+0x20/0x20 [ksocklnd]
[ 30.202721] kthread+0x134/0x150
[ 30.203250] ? set_kthread_struct+0x50/0x50
[ 30.204237] ret_from_fork+0x1f/0x40
[ 30.205019] Modules linked in: lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) sctp ip6_udp_tunnel udp_tunnel libcrc32c rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common intel_uncore_frequency_common isst_if_common nfit libnvdimm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl joydev lpc_ich pcspkr i2c_i801 i6300esb bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt drm ext4 mbcache jbd2 sd_mod t10_pi sr_mod cdrom sg mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) mlxfw(OE) psample ahci pci_hyperv_intf tls virtio_net libahci mlxdevm(OE) net_failover libata crc32c_intel serio_raw igbvf virtio_blk mlx_compat(OE) virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 30.218312] CR2: ffffa4c2cfdbf798
[ 30.219039] ---[ end trace 50ac79dd536b4aeb ]---
[ 30.219944] RIP: 0010:lnet_ping_event_handler+0x31/0x130 [lnet]
[ 30.221078] Code: 48 89 fb 48 8b af 88 00 00 00 f6 05 91 fd fd ff 02 74 09 f6 05 8c fd fd ff 04 75 50 8b 83 a4 00 00 00 85 c0 0f 84 c2 00 00 00 <8b> 75 00 85 f6 0f 84 cb 00 00 00 8b 8b a8 00 00 00 85 c9 74 1c c7
[ 30.224428] RSP: 0018:ffffa4c2cf00fd90 EFLAGS: 00010282
[ 30.225552] RAX: 00000000ffffff8f RBX: ffff889b4e388518 RCX: 00000000801e001d
[ 30.226930] RDX: 00000000801e001e RSI: 0000000000000000 RDI: ffff889b4e388518
[ 30.228340] RBP: ffffa4c2cfdbf798 R08: 0000000000000001 R09: 0000000000000001
[ 30.229797] R10: ffff889b15efb990 R11: ffff889c2b15f101 R12: 0000000000000001
[ 30.231078] R13: ffffffffc0bb4750 R14: 0000000000000000 R15: 00000000ffffff8f
[ 30.232336] FS: 0000000000000000(0000) GS:ffff88bee9a40000(0000) knlGS:0000000000000000
[ 30.233737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.234792] CR2: ffffa4c2cfdbf798 CR3: 000000118c410004 CR4: 0000000000770ee0
[ 30.236066] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 30.237298] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 30.238744] PKRU: 55555554
[ 30.239465] Kernel panic - not syncing: Fatal exception
[ 30.240627] Kernel Offset: 0x3ac00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 30.242530] ---[ end Kernel panic - not syncing: Fatal exception ]---
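As a triage aid, the register dump in the oops can be checked mechanically to see which register holds the faulting address reported in CR2. The following is only a minimal sketch: the helper names (`parse_registers`, `registers_holding_fault_address`) are hypothetical and not part of Lustre or any kernel tooling, and the regular expression assumes the x86_64 register layout printed above. The excerpt is copied from the first register dump in this report.

```python
import re

# Excerpt copied from the first register dump in the oops above.
OOPS = """\
[ 30.168646] BUG: unable to handle kernel paging request at ffffa4c2cfdbf798
[ 30.174845] RIP: 0010:lnet_ping_event_handler+0x31/0x130 [lnet]
[ 30.178703] RSP: 0018:ffffa4c2cf00fd90 EFLAGS: 00010282
[ 30.179526] RAX: 00000000ffffff8f RBX: ffff889b4e388518 RCX: 00000000801e001d
[ 30.180609] RDX: 00000000801e001e RSI: 0000000000000000 RDI: ffff889b4e388518
[ 30.181877] RBP: ffffa4c2cfdbf798 R08: 0000000000000001 R09: 0000000000000001
[ 30.187716] CR2: ffffa4c2cfdbf798 CR3: 000000118c410004 CR4: 0000000000770ee0
"""

# Matches "RAX: <16 hex digits>", "CR2: <16 hex digits>", and
# "RSP: 0018:<16 hex digits>" (optional segment prefix). RIP is skipped
# on purpose because its value is printed as symbol+offset, not as hex.
REG_RE = re.compile(
    r"\b(R[A-Z0-9]{2}|CR\d):\s+(?:[0-9a-f]{4}:)?([0-9a-f]{16})\b"
)


def parse_registers(text):
    """Return {register name: integer value} for every register found."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def registers_holding_fault_address(text):
    """List registers (other than CR2) whose value equals CR2."""
    regs = parse_registers(text)
    fault = regs.get("CR2")
    if fault is None:
        return []
    return sorted(n for n, v in regs.items() if n != "CR2" and v == fault)


if __name__ == "__main__":
    print(registers_holding_fault_address(OOPS))
```

For this trace the only match is RBP, which is consistent with the marked bytes `<8b> 75 00` in the Code line: on x86_64 they encode a 32-bit load from the address held in RBP.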
Issue Links
- is related to: LU-18160 lnetctl ping can hang forever (Resolved)