Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.16.0
-
None
-
master, RHEL8.7
-
3
Description
The server crashed with a NULL pointer dereference in kiblnd_passive_connect. Console log:
[14161.702631] libcfs: HW NUMA nodes: 1, HW CPU cores: 24, npartitions: 4
[14161.705274] alg: No test for adler32 (adler32-zlib)
[14162.456545] Key type ._llcrypt registered
[14162.457357] Key type .llcrypt registered
[14162.484133] Lustre: Lustre: Build Version: 2.15.58_109_g40074d3
[14162.540341] LNet: Using FastReg for registration
[14162.750736] LNet: Added LNI 10.0.11.209@o2ib12 [32/1024/0/180]
[14162.950680] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[14162.951989] PGD 0
[14162.952520] Oops: 0000 [#1] SMP NOPTI
[14162.953250] CPU: 22 PID: 201160 Comm: kworker/22:4 Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.13.1.el8_lustre.ddn17.x86_64 #1
[14162.955184] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[14162.956667] Workqueue: ib_cm cm_work_handler [ib_cm]
[14162.957565] RIP: 0010:kiblnd_passive_connect+0x1395/0x1620 [ko2iblnd]
[14162.958644] Code: c7 05 63 81 01 00 00 01 00 00 e8 26 03 f4 ff 48 89 df ba 40 00 00 00 48 89 c6 e8 06 10 f4 ff 45 8b b4 24 24 01 00 00 49 89 c7 <48> 8b 04 25 40 00 00 00 48 8d 58 38 e8 fa 02 f4 ff 48 89 df ba 40
[14162.961535] RSP: 0018:ff7a599b4dca79a0 EFLAGS: 00010246
[14162.962473] RAX: ffffffffc1038f00 RBX: 0005001614010bd1 RCX: 0000000000000000
[14162.963534] LNet: Added LNI 20.1.11.209@o2ib22 [32/1024/0/180]
[14162.963649] RDX: ffffffffc1038f12 RSI: 0000000000000000 RDI: 0000000000000000
[14162.965863] RBP: ff36491ca4dbcc00 R08: 0000000000000001 R09: 0000000000000000
[14162.967015] R10: ffffffffc1038f40 R11: ffffffffc1038f12 R12: ff364925b2ba2a00
[14162.968167] R13: ff36492daa67a5b0 R14: 0000000000000000 R15: ffffffffc1038f00
[14162.969313] FS: 0000000000000000(0000) GS:ff36493e31b80000(0000) knlGS:0000000000000000
[14162.970594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14162.971560] CR2: 0000000000000040 CR3: 0000000f8bc10003 CR4: 0000000000771ee0
[14162.972711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14162.973846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[14162.974976] PKRU: 55555554
[14162.975553] Call Trace:
[14162.976085] ? xas_store+0x56/0x5a0
[14162.976755] kiblnd_cm_callback+0x3d7/0x1e90 [ko2iblnd]
[14162.977639] ? __xa_alloc_cyclic+0x49/0xe0
[14162.978375] cma_cm_event_handler+0x25/0xd0 [rdma_cm]
[14162.979227] cma_ib_req_handler+0x7d1/0x1260 [rdma_cm]
[14162.980090] ? update_group_capacity+0x25/0x220
[14162.980872] cm_process_work+0x22/0xf0 [ib_cm]
[14162.981638] cm_req_handler+0x7f1/0xf40 [ib_cm]
[14162.982416] cm_work_handler+0x79c/0xf30 [ib_cm]
[14162.983198] ? __switch_to+0x10c/0x450
[14162.983872] ? finish_task_switch+0xaf/0x2e0
[14162.984607] process_one_work+0x1a7/0x360
[14162.985300] ? create_worker+0x1a0/0x1a0
[14162.985979] worker_thread+0x30/0x390
[14162.986623] ? create_worker+0x1a0/0x1a0
[14162.987292] kthread+0x10b/0x130
[14162.987874] ? set_kthread_struct+0x50/0x50
[14162.988577] ret_from_fork+0x1f/0x40
[14162.989205] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm irqbypass iTCO_wdt ppdev iTCO_vendor_support crct10dif_pclmul crc32_pclmul bochs drm_vram_helper drm_ttm_helper ghash_clmulni_intel ttm rapl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr i2c_i801 drm joydev lpc_ich i6300esb parport_pc parport ext4 mbcache jbd2 sr_mod sd_mod cdrom t10_pi sg mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) mlxfw(OE) pci_hyperv_intf ahci tls libahci psample mlxdevm(OE) virtio_net libata bnxt_en crc32c_intel net_failover serio_raw virtio_blk mlx_compat(OE) virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Issue Links
- duplicates
-
LU-17071 o2iblnd: Oops caused by IBLND_REJECT_EARLY code
- Resolved