Lustre / LU-17578

BUG() "list_add corruption. prev->next should be next (ffffffffc0daa210), but was ff810a564ca87ec8. (prev=ff433550d76b8a20)" due to race between lnet_discovery and monitor_thread LNet threads


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.16.0

    Description

      Here are the latest messages and the stack trace at the time of the crash:

      [3060059.779323] list_add corruption. prev->next should be next (ffffffffc0daa210), but was ff810a564ca87ec8. (prev=ff433550d76b8a20).
      [3060059.781424] ------------[ cut here ]------------
      [3060059.782441] kernel BUG at lib/list_debug.c:28!
      [3060059.783327] invalid opcode: 0000 [#1] SMP NOPTI
      [3060059.784187] CPU: 16 PID: 2183470 Comm: lnet_discovery Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
      [3060059.786275] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
      [3060059.787822] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
      [3060059.788790] Code: d0 46 91 00 48 89 d1 48 c7 c7 80 80 13 8f 48 89 c2 e8 52 cb c7 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 d8 80 13 8f e8 3e cb c7 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 68 81 13 8f e8 2a cb c7 ff 0f 0b
      [3060059.791787] RSP: 0018:ff810a5681ee3dd8 EFLAGS: 00010246
      [3060059.792757] RAX: 0000000000000075 RBX: ff43355ded7fb200 RCX: 0000000000000000
      [3060059.793979] RDX: 0000000000000000 RSI: ff43357271a1e698 RDI: ff43357271a1e698
      [3060059.795199] RBP: ff43355ded7fb220 R08: 0000000000000000 R09: c0000000ffff7fff
      [3060059.796416] R10: 0000000000000001 R11: ff810a5681ee3bf8 R12: ff433550d76b8a20
      [3060059.797631] R13: 0000000000000002 R14: ff433555412fc2b0 R15: 0000000000000030
      [3060059.798847] FS:  0000000000000000(0000) GS:ff43357271a00000(0000) knlGS:0000000000000000
      [3060059.800182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [3060059.801208] CR2: 00007fab1e8c7000 CR3: 000000091c210006 CR4: 0000000000771ee0
      [3060059.802421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [3060059.803630] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      [3060059.804838] PKRU: 55555554
      [3060059.805462] Call Trace:
      [3060059.806052]  lnet_peer_ni_add_to_recoveryq_locked.part.27+0x5e/0x150 [lnet]
      [3060059.807254]  lnet_peer_merge_data+0xcfa/0x1210 [lnet]
      [3060059.808202]  lnet_peer_discovery+0xea7/0x1680 [lnet]
      [3060059.809130]  ? finish_wait+0x80/0x80
      [3060059.809863]  ? lnet_peer_merge_data+0x1210/0x1210 [lnet]
      [3060059.810830]  kthread+0x134/0x150
      [3060059.830811]  ? set_kthread_struct+0x50/0x50
      [3060059.831615]  ret_from_fork+0x1f/0x40
      [3060059.832340] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) ptlrpc_gss(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) binfmt_misc sctp ip6_udp_tunnel udp_tunnel libcrc32c rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) intel_rapl_msr intel_rapl_common intel_uncore_frequency_common nfit libnvdimm bochs drm_vram_helper drm_ttm_helper kvm_intel ttm drm_kms_helper kvm mlx5_core(OE) iTCO_wdt ppdev syscopyarea iTCO_vendor_support mlxdevm(OE) irqbypass crct10dif_pclmul sysfillrect mlx_compat(OE) crc32_pclmul sysimgblt ghash_clmulni_intel psample fb_sys_fops rapl mlxfw(OE) drm parport_pc tls bnxt_en pci_hyperv_intf i2c_i801 lpc_ich pcspkr parport joydev i6300esb auth_rpcgss sunrpc ext4 mbcache jbd2 sd_mod t10_pi sr_mod cdrom sg ahci libahci libata virtio_net crc32c_intel serio_raw virtio_blk net_failover
      [3060059.832414]  virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod [last unloaded: obdecho]

      Crash-dump analysis points to a race between the lnet_discovery and monitor_thread LNet threads while accessing the &the_lnet.ln_mt_peerNIRecovq list head.

      The BUG()/Oops fires while lnet_discovery, in lnet_peer_ni_add_to_recoveryq_locked(), is linking a peer_ni at the tail of &the_lnet.ln_mt_peerNIRecovq, yet the invalid prev->next value 0xff810a564ca87ec8 is an address on the monitor_thread kernel stack: that thread can concurrently access &the_lnet.ln_mt_peerNIRecovq in lnet_recover_peer_nis(), which splices the whole list onto an on-stack local list head.
      lnet_recover_peer_nis() does this under lnet_net_lock(0) protection, but the current lnet_discovery call stack does not hold that lock (even though the "_locked" suffix of lnet_peer_ni_add_to_recoveryq_locked() indicates it should), hence the race.

      This occurred with the 2.14 version, but the problem appears to still exist in the master code.
      I will try to cook and push a fix for this.



    People

      Assignee: Bruno Faccini (bfaccini-nvda)
      Reporter: Bruno Faccini (bfaccini-nvda)
