[LU-17417] (Durham University) Grace Hopper + Rocky 9 aarch64 + kernel-64k + Lustre 2.15.4 client = kernel panic Created: 11/Jan/24  Updated: 15/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mark Dixon Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: arm
Environment:

Rocky 9.3 aarch64
Lustre 2.15.4 client (2.12.x server)
NVIDIA Grace Hopper seed unit (integrated arm cpu + gpu socket)
InfiniBand (in tree modules)
No gpu modules loaded


Issue Links:
Related
Severity: 3

 Description   

Hi there!

We are lucky enough to have a few single-socket Grace Hopper servers and we would like them to mount our Lustre filesystem. Unfortunately, starting up lnet causes the client to panic, for example:
```
[ 8919.610649] libcfs: loading out-of-tree module taints kernel.
[ 8919.610870] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[ 8919.627075] Unable to handle kernel paging request at virtual address 00000196a9025cc5
[ 8919.635176] Mem abort info:
[ 8919.638025] ESR = 0x0000000096000005
[ 8919.641855] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8919.647282] SET = 0, FnV = 0
[ 8919.650399] EA = 0, S1PTW = 0
[ 8919.653606] FSC = 0x05: level 1 translation fault
[ 8919.658589] Data abort info:
[ 8919.661531] ISV = 0, ISS = 0x00000005
[ 8919.665447] CM = 0, WnR = 0
[ 8919.668473] user pgtable: 64k pages, 48-bit VAs, pgdp=0000000155cd0400
[ 8919.675150] [00000196a9025cc5] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 8919.684050] Internal error: Oops: 0000000096000005 1 SMP
[ 8919.689746] Modules linked in: libcfs(OE+) 8021q garp mrp stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm iw_cm ib_ipoib ib_cm vfat fat drm_display_helper ast acpi_ipmi drm_shmem_helper ses ipmi_ssif enclosure cec i2c_smbus drm_ttm_helper spi_nor ttm i2c_algo_bit ipmi_devintf drm_kms_helper mtd syscopyarea sysfillrect sysimgblt ipmi_msghandler mlx5_ib ib_uverbs coresight_stm coresight_tmc coresight_funnel stm_core ib_core coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c mlx5_core sg crct10dif_ce ghash_ce sha2_ce sha256_arm64 mpt3sas sha1_ce sbsa_gwdt nvme nvme_core mlxfw tls raid_class scsi_transport_sas nvme_common psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror
[ 8919.689783] dm_region_hash dm_log dm_mod
[ 8919.783038] CPU: 38 PID: 105046 Comm: modprobe Kdump: loaded Tainted: G OE ------- --- 5.14.0-362.13.1.el9_3.aarch64+64k #1
[ 8919.795846] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023
[ 8919.807054] pstate: 23400009 (nzCv daif +PAN UAO +TCO +DIT -SSBS BTYPE=-)
[ 8919.814173] pc : mod_sysfs_setup+0x1a4/0x290
[ 8919.818542] lr : mod_sysfs_setup+0x174/0x290
[ 8919.822903] sp : ffff80009682fa70
[ 8919.826286] x29: ffff80009682fa70 x28: ffff80009682fbf0 x27: ffffa0608ae23948
[ 8919.833580] x26: ffffa06042663b88 x25: ffff80009682fbf0 x24: ffffa06042630cf8
[ 8919.840874] x23: ffffa06042648890 x22: ffffa06042663818 x21: ffffa06042663850
[ 8919.848168] x20: 0000000000000000 x19: ffffa06042663800 x18: 0000000000000000
[ 8919.855462] x17: 00000000000001a4 x16: ffffa06042640d58 x15: ffffa06088c1a560
[ 8919.862757] x14: ffffa06088c19e00 x13: 0073656761705f6f x12: 74707972635f636f
[ 8919.870050] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa060897f2e6c
[ 8919.877344] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872
[ 8919.884639] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000
[ 8919.891933] x2 : ffffa06042663818 x1 : ffffa06042663850 x0 : 90000196a9025bf5
[ 8919.899229] Call trace:
[ 8919.901723] mod_sysfs_setup+0x1a4/0x290
[ 8919.905728] load_module+0xaec/0xc6c
[ 8919.909382] __do_sys_finit_module+0xa4/0x110
[ 8919.913832] __arm64_sys_finit_module+0x24/0x30
[ 8919.918461] invoke_syscall.constprop.0+0x7c/0xd0
[ 8919.923276] el0_svc_common.constprop.0+0x140/0x150
[ 8919.928259] do_el0_svc+0x38/0xa0
[ 8919.931642] el0_svc+0x38/0x18c
[ 8919.934853] el0t_64_sync_handler+0xb4/0x130
[ 8919.939216] el0t_64_sync+0x17c/0x180
[ 8919.942958] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800)
[ 8919.949189] SMP: stopping secondary CPUs
[ 8919.955258] Starting crashdump kernel...
[ 8919.959265] Bye!
```
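For readers unfamiliar with the "Mem abort info" block, the ESR value can be split into its fields by hand. The following is a minimal sketch, assuming the standard arm64 ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts FSC in ISS bits 5:0 and WnR in ISS bit 6). The function name `decode_esr` is hypothetical and not part of any tool mentioned in this ticket.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an arm64 ESR_EL1 value into the fields the
# kernel prints under "Mem abort info" / "Data abort info".
decode_esr() {
    local esr=$(( $1 ))
    local ec=$(( (esr >> 26) & 0x3f ))    # exception class, bits [31:26]
    local il=$(( (esr >> 25) & 0x1 ))     # 1 = 32-bit instruction
    local iss=$(( esr & 0x1ffffff ))      # syndrome, bits [24:0]
    local fsc=$(( iss & 0x3f ))           # fault status code, ISS[5:0]
    local wnr=$(( (iss >> 6) & 0x1 ))     # 0 = read, 1 = write
    printf 'EC=0x%02x IL=%d ISS=0x%08x FSC=0x%02x WnR=%d\n' \
        "$ec" "$il" "$iss" "$fsc" "$wnr"
}

decode_esr 0x0000000096000005
# prints: EC=0x25 IL=1 ISS=0x00000005 FSC=0x05 WnR=0
```

This agrees with the trace above: EC 0x25 is a data abort taken at the current exception level, and FSC 0x05 is a level 1 translation fault on a read (WnR = 0).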

We prefer a dkms build but, as we are in testing, the client was built with the more usual:
```
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout 2.15.4
kernel=`uname -r`
sh autogen.sh
./configure --with-linux=/usr/src/kernels/$kernel
make rpms
```

I tried falling back to the standard 4k-page kernel, building the client the same way, and successfully mounted our Lustre filesystem. Strangely, switching to a dkms build for that 4k kernel brings the panic back.

Can you help, please?

Thanks,

Mark



 Comments   
Comment by Andreas Dilger [ 11/Jan/24 ]

Have you tried any other kernels or Lustre versions? Was the lustre code built on this node? Just wondering if there is a chance of kernel module version mismatch?

Comment by Andreas Dilger [ 11/Jan/24 ]

kevin.zhao, xinliang any thoughts on this? It looks very early in module loading, and hasn't even called the module init function AFAICS.

Comment by Mark Dixon [ 11/Jan/24 ]

Hi Andreas, thanks for taking a look at this.

For lustre, the code was built on the same node. I first attempted 2.12.9 but it wouldn't complete configure and so quickly switched to 2.15.4.

The kernel versions I've played with are the latest Rocky 9.3, so 5.14.0-362.13.1.el9_3.aarch64+64k and its 4k page equivalent, 5.14.0-362.13.1.el9_3.aarch64.

I had played with in-tree InfiniBand modules vs. MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.3-aarch64, but uninstalled MLNX_OFED.

With the 64k kernel booted, I've just removed the lustre rpms, checked that "find /lib/modules | grep libcfs" didn't return anything, built a fresh set of 2.15.4 rpms, installed them as above, and checked that "find /lib/modules | grep libcfs" reported a real file under /extra and not a symlink under /weak-modules. Running "modprobe libcfs" then produces a similar kernel oops.
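One further way to rule out the module/kernel mismatch asked about above is to compare the module's vermagic string with the running kernel. This is a minimal sketch, not something that was run in this ticket: `modinfo -F vermagic` and `uname -r` are standard commands, while the function names `vermagic_matches` and `check_module` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel release embedded in a
# vermagic string equals the given kernel release.
vermagic_matches() {
    local vermagic="$1" release="$2"
    # The kernel release is the first space-separated field of vermagic.
    [ "${vermagic%% *}" = "$release" ]
}

# Hypothetical wrapper for a live system: query the installed module and
# compare it with the running kernel.
check_module() {
    local mod="$1" vermagic
    vermagic=$(modinfo -F vermagic "$mod") || return 2
    if vermagic_matches "$vermagic" "$(uname -r)"; then
        echo "$mod: vermagic matches running kernel"
    else
        echo "$mod: vermagic '$vermagic' does not match $(uname -r)"
        return 1
    fi
}

# Usage on the client (not run here): check_module libcfs
```

A match only shows that the module was built for the booted kernel release; it says nothing about the state of the kernel at load time.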

Comment by Peter Jones [ 11/Jan/24 ]

Mark

Given that this is still a bit experimental at this stage, I wonder if it is worth seeing if you have better success with the tip of master. It could well be that some recent useful landings have not been back ported to the LTS branch yet...

Peter

Comment by Mark Dixon [ 11/Jan/24 ]

Thanks Peter. For some reason I never think of trying the bleeding edge - but sadly the same result : (

Comment by Xinliang Liu [ 12/Jan/24 ]

Hi,

The address 00000196a9025cc5 is not a valid kernel address (valid ones start with fffxxxxxx), so the access triggers a data abort oops.

See https://www.kernel.org/doc/html/latest/arch/arm64/memory.html
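The address check can be sketched as follows, assuming the 48-bit VA layout reported in the first oops ("48-bit VAs"): user addresses have the top 16 bits all zero and kernel addresses have them all one. The function name `classify_va` is hypothetical, and the "between" label simply mirrors the wording of the kernel message.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a 64-bit virtual address (given in hex)
# for an arm64 kernel configured with 48-bit virtual addresses.
classify_va() {
    local addr
    # Left-pad to 16 hex digits and lower-case, then inspect the top
    # 16 bits, i.e. the first four hex digits.
    addr=$(printf '%16s' "${1#0x}" | tr ' ' '0' | tr 'A-F' 'a-f')
    case "$addr" in
        0000*) echo user ;;
        ffff*) echo kernel ;;
        *)     echo between ;;
    esac
}

classify_va 00000196a9025cc5   # faulting address in the first oops: user
classify_va ffffa06042663800   # x19 in the same oops: kernel
```

By this rule the faulting address lies in the user range rather than the kernel range, which is why the trace walks the user page table and reports a translation fault.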

Our arm64 master CI is still running on Rocky 9.2. We will try Rocky 9.3, as there is now an ldiskfs patch for Rocky 9.3.

See http://213.146.155.72:8080/job/test-periodically-lustre-master-rhel9/ and http://213.146.155.72:8080/job/build-lustre-master-rhel9/

 

bodgerer, you said in the bug description that the 4k page size kernel does not have this issue. We are also testing 2.15 and master on Rocky 8, which uses a 64K page size, and we do not see this issue there. Could you try with TCP only, no RDMA?

Comment by Mark Dixon [ 12/Jan/24 ]

Hi!

At the moment we just have the default /etc/lnet.conf, which only contains comments, so it shouldn't be trying to set up any o2ib or tcp devices. We do have InfiniBand devices, but if we get rid of the drivers we won't have any ethernet: the unit also has a Mellanox SFP28+ ethernet card.

Comment by Mark Dixon [ 12/Jan/24 ]

I think that 4k vs. 64k page size and InfiniBand are red herrings.

I rebuilt the tip of master with --with-o2ib=no and added a modprobe.d file to block the kernel from loading ib_core, mlx5_core and mlxfw, so there shouldn't be any RDMA activity going on. /etc/lnet.conf contains comment lines only. Unfortunately, `modprobe lnet` still panics the 64k kernel:

```
[ 75.635984] libcfs: loading out-of-tree module taints kernel.
[ 75.636150] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[ 75.636917] Unable to handle kernel paging request at virtual address 004666a090000244
[ 75.660509] Mem abort info:
[ 75.663358] ESR = 0x0000000096000004
[ 75.667187] EC = 0x25: DABT (current EL), IL = 32 bits
[ 75.672616] SET = 0, FnV = 0
[ 75.675734] EA = 0, S1PTW = 0
[ 75.678938] FSC = 0x04: level 0 translation fault
[ 75.683921] Data abort info:
[ 75.686861] ISV = 0, ISS = 0x00000004
[ 75.690776] CM = 0, WnR = 0
[ 75.693803] [004666a090000244] address between user and kernel address ranges
[ 75.701100] Internal error: Oops: 0000000096000004 1 SMP
[ 75.706797] Modules linked in: libcfs(OE+) rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat drm_display_helper cec drm_ttm_helper ast ttm drm_shmem_helper ses i2c_algo_bit enclosure i2c_smbus acpi_ipmi drm_kms_helper syscopyarea ipmi_ssif sysfillrect sysimgblt spi_nor mtd ipmi_devintf coresight_stm coresight_tmc stm_core coresight_funnel ipmi_msghandler coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c sg crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce mpt3sas sbsa_gwdt nvme nvme_core raid_class scsi_transport_sas nvme_common tls psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror dm_region_hash dm_log dm_mod
[ 75.780175] CPU: 12 PID: 1836 Comm: modprobe Tainted: G OE ------- --- 5.14.0-362.13.1.el9_3.aarch64+64k #1
[ 75.791559] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023
[ 75.802765] pstate: 23400009 (nzCv daif +PAN UAO +TCO +DIT -SSBS BTYPE=-)
[ 75.809883] pc : mod_sysfs_setup+0x1a4/0x290
[ 75.814249] lr : mod_sysfs_setup+0x174/0x290
[ 75.818610] sp : ffff80003236fbc0
[ 75.821992] x29: ffff80003236fbc0 x28: ffff80003236fd40 x27: ffffa0206a9c3948
[ 75.829287] x26: ffffa01ff5d13308 x25: ffff80003236fd40 x24: ffffa01ff5ce7478
[ 75.836582] x23: ffffa01ff5cf7578 x22: ffffa01ff5d12f98 x21: ffffa01ff5d12fd0
[ 75.843876] x20: 0000000000000000 x19: ffffa01ff5d12f80 x18: 0000000000000001
[ 75.851171] x17: 00000000000001a4 x16: ffffa01ff5cf0c58 x15: ffffa020687ba560
[ 75.858465] x14: ffffa020687b9e00 x13: 0073656761705f6f x12: 74707972635f636f
[ 75.865759] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa02069392e6c
[ 75.873053] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872
[ 75.880347] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000
[ 75.887642] x2 : ffffa01ff5d12f98 x1 : ffffa01ff5d12fd0 x0 : f94666a090000174
[ 75.894935] Call trace:
[ 75.897430] mod_sysfs_setup+0x1a4/0x290
[ 75.901434] load_module+0xaec/0xc6c
[ 75.905086] __do_sys_finit_module+0xa4/0x110
[ 75.909536] __arm64_sys_finit_module+0x24/0x30
[ 75.914163] invoke_syscall.constprop.0+0x7c/0xd0
[ 75.918974] el0_svc_common.constprop.0+0x140/0x150
[ 75.923957] do_el0_svc+0x38/0xa0
[ 75.927341] el0_svc+0x38/0x18c
[ 75.930555] el0t_64_sync_handler+0xb4/0x130
[ 75.934916] el0t_64_sync+0x17c/0x180
[ 75.938656] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800)
[ 75.944885] --[ end trace 2b55dea9c9e19201 ]--
[ 75.952014] Kernel panic - not syncing: Oops: Fatal exception
[ 75.957888] SMP: stopping secondary CPUs
[ 75.961907] Kernel Offset: 0x202060700000 from 0xffff800008000000
[ 75.968133] PHYS_OFFSET: 0x80000000
[ 75.971693] CPU features: 0x0000000,034016d8,c867fe03
[ 75.976854] Memory Limit: none
[ 75.982347] --[ end Kernel panic - not syncing: Oops: Fatal exception ]--
```

It's not quite true that I said the 4k kernel didn't have this issue. For 2.15.4 I did manage to build and use the kmod lustre client rpm successfully, but the 2.15.4 dkms lustre client rpm generated a panic. I then rebuilt the tip of master against the 4k kernel and, with the same settings/options to avoid InfiniBand as above, installed the kmod lustre client rpm; `modprobe lnet` gave this oops:

```
[ 945.766464] libcfs: loading out-of-tree module taints kernel.
[ 945.766633] libcfs: module verification failed: signature and/or required key missing - tainting kernel
[ 945.767517] Unable to handle kernel paging request at virtual address 004666a0f0000144
[ 945.791118] Mem abort info:
[ 945.793968] ESR = 0x0000000096000004
[ 945.797798] EC = 0x25: DABT (current EL), IL = 32 bits
[ 945.803225] SET = 0, FnV = 0
[ 945.806344] EA = 0, S1PTW = 0
[ 945.809548] FSC = 0x04: level 0 translation fault
[ 945.814531] Data abort info:
[ 945.817471] ISV = 0, ISS = 0x00000004
[ 945.821387] CM = 0, WnR = 0
[ 945.824415] [004666a0f0000144] address between user and kernel address ranges
[ 945.831714] Internal error: Oops: 0000000096000004 1 SMP
[ 945.837412] Modules linked in: libcfs(OE+) 8021q garp mrp stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat drm_display_helper ast cec drm_shmem_helper drm_ttm_helper ses enclosure ttm acpi_ipmi i2c_smbus i2c_algo_bit ipmi_ssif ipmi_devintf drm_kms_helper syscopyarea ipmi_msghandler sysfillrect coresight_stm spi_nor sysimgblt coresight_tmc stm_core mtd coresight_funnel coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c sg crct10dif_ce ghash_ce sha2_ce mpt3sas sha256_arm64 sha1_ce sbsa_gwdt nvme nvme_core tls raid_class scsi_transport_sas nvme_common psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror dm_region_hash dm_log dm_mod
[ 945.912831] CPU: 51 PID: 112745 Comm: modprobe Kdump: loaded Tainted: G OE ------- --- 5.14.0-362.13.1.el9_3.aarch64 #1
[ 945.925281] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023
[ 945.936487] pstate: 23400009 (nzCv daif +PAN UAO +TCO +DIT -SSBS BTYPE=-)
[ 945.943605] pc : mod_sysfs_setup+0x1a4/0x290
[ 945.947974] lr : mod_sysfs_setup+0x174/0x290
[ 945.952336] sp : ffff800009053a30
[ 945.955718] x29: ffff800009053a30 x28: ffff800009053bb0 x27: ffffa0206a8f9948
[ 945.963014] x26: ffffa020639c7308 x25: ffff800009053bb0 x24: ffffa020639b8478
[ 945.970308] x23: ffffa020639c1578 x22: ffffa020639c6f98 x21: ffffa020639c6fd0
[ 945.977602] x20: 0000000000000000 x19: ffffa020639c6f80 x18: 0000000000000000
[ 945.984897] x17: 00000000000001a4 x16: ffffa020639bac58 x15: ffffa0206889a430
[ 945.992191] x14: ffffa02068899cd0 x13: 0073656761705f6f x12: 74707972635f636f
[ 945.999487] x11: 0000000000000000 x10: 00000000000236bc x9 : ffffa0206947348c
[ 946.006782] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872
[ 946.014078] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000
[ 946.021374] x2 : ffffa020639c6f98 x1 : ffffa020639c6fd0 x0 : f94666a0f0000074
[ 946.028669] Call trace:
[ 946.031163] mod_sysfs_setup+0x1a4/0x290
[ 946.035168] load_module+0xae8/0xc70
[ 946.038822] __do_sys_finit_module+0xa4/0x110
[ 946.043272] __arm64_sys_finit_module+0x24/0x30
[ 946.047901] invoke_syscall.constprop.0+0x7c/0xd0
[ 946.052712] el0_svc_common.constprop.0+0x140/0x150
[ 946.057696] do_el0_svc+0x38/0xa0
[ 946.061080] el0_svc+0x38/0x18c

Message from syslogd@gh003 at Jan 12 11:06:40 ...
kernel:Internal error: Oops: 0000000096000004 1 SMP

[ 946.074418] el0t_64_sync_handler+0xb4/0x130
[ 946.078779] el0t_64_sync+0x17c/0x180
[ 946.082520] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800)
[ 946.088751] SMP: stopping secondary CPUs
[ 946.093091] Starting crashdump kernel...
[ 946.097099] Bye!
```

I also tried rebuilding 2.15.4 on the 4k kernel, and now `modprobe lnet` with the kmod rpm installed generates the panic where it didn't before, so I guess I just got lucky initially.
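All three traces fault at the same place (mod_sysfs_setup+0x1a4) on the same instruction word, f9406800, which under the A64 encoding is a 64-bit load from [x0 + offset]. The sketch below works through the arithmetic linking x0 to the reported fault address. The function names are hypothetical, and clearing the top byte of the result is an assumption about how the address is reported; it happens to match all three traces.

```bash
#!/usr/bin/env bash
# Byte offset of an A64 "LDR Xt, [Xn, #imm]" (64-bit, unsigned offset):
# imm12 sits in bits [21:10] of the instruction word and is scaled by 8.
ldr_offset() {
    echo $(( (($1 >> 10) & 0xfff) * 8 ))
}

# Add the offset to the base register value (16 hex digits, no 0x) and
# report the result with the top byte cleared (assumption, see above).
fault_address() {
    local low="${1#??}" off="$2"    # drop the top byte up front
    printf '%016x\n' $(( (0x$low + off) & 0x00ffffffffffffff ))
}

off=$(ldr_offset 0xf9406800)            # 208, i.e. 0xd0
fault_address 90000196a9025bf5 "$off"   # 2.15.4, 64k: 00000196a9025cc5
fault_address f94666a090000174 "$off"   # master, 64k: 004666a090000244
fault_address f94666a0f0000074 "$off"   # master, 4k:  004666a0f0000144
```

In each case the result equals the address in the "Unable to handle kernel paging request" line, so the fault is consistent with the load going through whatever value was in x0, and x0 is not a valid kernel pointer in any of the three traces.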

Comment by Xinliang Liu [ 15/Jan/24 ]

Hi bodgerer , 

I could not reproduce the oops on my aarch64 test VM with either the 4K or the 64K kernel.

4K page size kernel, tried ~10 times:

```
...
[rocky@rocky9-test-01 lustre-release]$ sudo modprobe libcfs
[rocky@rocky9-test-01 lustre-release]$ sudo modprobe lnet
[rocky@rocky9-test-01 lustre-release]$ lsmod |grep -E 'libcfs|lnet'
lnet                  778240  0
libcfs                237568  1 lnet
sunrpc                626688  2 lnet
[rocky@rocky9-test-01 lustre-release]$ uname -r
5.14.0-362.13.1.el9_3.aarch64
```

 

64K page size kernel, tried ~10 times:

```
...
[rocky@rocky9-test-01 ~]$ sudo modprobe lnet && lsmod | grep -E "(libcfs|lnet)" && sudo modprobe -r lnet
lnet                  917504  0
libcfs                458752  1 lnet
sunrpc                851968  2 lnet
[rocky@rocky9-test-01 ~]$ uname -r
5.14.0-362.13.1.el9_3.aarch64+64k
```

 

I guess you are hitting a use-after-free, an out-of-bounds access, or some other memory corruption caused by something else (here, another kernel module).

You can try the debug kernels, i.e. kernel-debug and kernel-64k-debug [1], and see whether the kernel prints more information (hopefully KASAN and KFENCE [2] are enabled).

You can also try unloading the other kernel modules one by one [3] to find out which one causes the problem; usually it is a driver module.

It looks like a tough issue to troubleshoot, but there are ways to solve it; it just takes time.
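Whether a given debug kernel actually has KASAN or KFENCE built in can be checked from its config file. This is a minimal sketch, assuming the config is installed as /boot/config-<release>, which is the usual location on RHEL-like systems. The function names are hypothetical, and the check only looks for options set to =y.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a kernel config option is built in (=y)
# according to the given config file.
kconfig_enabled() {
    local option="$1" config="$2"
    grep -q "^${option}=y$" "$config"
}

# Report the two sanitizers named above for a config file, defaulting to
# the assumed location of the running kernel's config.
report_debug_options() {
    local config="${1:-/boot/config-$(uname -r)}" opt
    for opt in CONFIG_KASAN CONFIG_KFENCE; do
        if kconfig_enabled "$opt" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}

# Usage after booting a debug kernel (not run here): report_debug_options
```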

[1] https://download.rockylinux.org/pub/rocky/9/BaseOS/aarch64/debug/tree/Packages/k/

[2] https://www.kernel.org/doc/html/latest/dev-tools/

[3] https://access.redhat.com/solutions/41278
