<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:35:16 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17417] (Durham University) Grace Hopper + Rocky 9 aarch64 + kernel-64k + Lustre 2.15.4 client = kernel panic</title>
                <link>https://jira.whamcloud.com/browse/LU-17417</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi there!&lt;/p&gt;

&lt;p&gt;We are lucky enough to have a few 1 socket Grace Hopper servers and we would like them to mount our Lustre filesystem. Unfortunately, starting up lnet causes the client to panic, for example:&lt;br/&gt;
```&lt;br/&gt;
[ 8919.610649] libcfs: loading out-of-tree module taints kernel.&lt;br/&gt;
[ 8919.610870] libcfs: module verification failed: signature and/or required key missing - tainting kernel&lt;br/&gt;
[ 8919.627075] Unable to handle kernel paging request at virtual address 00000196a9025cc5&lt;br/&gt;
[ 8919.635176] Mem abort info:&lt;br/&gt;
[ 8919.638025]   ESR = 0x0000000096000005&lt;br/&gt;
[ 8919.641855]   EC = 0x25: DABT (current EL), IL = 32 bits&lt;br/&gt;
[ 8919.647282]   SET = 0, FnV = 0&lt;br/&gt;
[ 8919.650399]   EA = 0, S1PTW = 0&lt;br/&gt;
[ 8919.653606]   FSC = 0x05: level 1 translation fault&lt;br/&gt;
[ 8919.658589] Data abort info:&lt;br/&gt;
[ 8919.661531]   ISV = 0, ISS = 0x00000005&lt;br/&gt;
[ 8919.665447]   CM = 0, WnR = 0&lt;br/&gt;
[ 8919.668473] user pgtable: 64k pages, 48-bit VAs, pgdp=0000000155cd0400&lt;br/&gt;
[ 8919.675150] [00000196a9025cc5] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000&lt;br/&gt;
[ 8919.684050] Internal error: Oops: 0000000096000005 [#1] SMP&lt;br/&gt;
[ 8919.689746] Modules linked in: libcfs(OE+) 8021q garp mrp stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm iw_cm ib_ipoib ib_cm vfat fat drm_display_helper ast acpi_ipmi drm_shmem_helper ses ipmi_ssif enclosure cec i2c_smbus drm_ttm_helper spi_nor ttm i2c_algo_bit ipmi_devintf drm_kms_helper mtd syscopyarea sysfillrect sysimgblt ipmi_msghandler mlx5_ib ib_uverbs coresight_stm coresight_tmc coresight_funnel stm_core ib_core coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c mlx5_core sg crct10dif_ce ghash_ce sha2_ce sha256_arm64 mpt3sas sha1_ce sbsa_gwdt nvme nvme_core mlxfw tls raid_class scsi_transport_sas nvme_common psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror&lt;br/&gt;
[ 8919.689783]  dm_region_hash dm_log dm_mod&lt;br/&gt;
[ 8919.783038] CPU: 38 PID: 105046 Comm: modprobe Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-362.13.1.el9_3.aarch64+64k #1&lt;br/&gt;
[ 8919.795846] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023&lt;br/&gt;
[ 8919.807054] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)&lt;br/&gt;
[ 8919.814173] pc : mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[ 8919.818542] lr : mod_sysfs_setup+0x174/0x290&lt;br/&gt;
[ 8919.822903] sp : ffff80009682fa70&lt;br/&gt;
[ 8919.826286] x29: ffff80009682fa70 x28: ffff80009682fbf0 x27: ffffa0608ae23948&lt;br/&gt;
[ 8919.833580] x26: ffffa06042663b88 x25: ffff80009682fbf0 x24: ffffa06042630cf8&lt;br/&gt;
[ 8919.840874] x23: ffffa06042648890 x22: ffffa06042663818 x21: ffffa06042663850&lt;br/&gt;
[ 8919.848168] x20: 0000000000000000 x19: ffffa06042663800 x18: 0000000000000000&lt;br/&gt;
[ 8919.855462] x17: 00000000000001a4 x16: ffffa06042640d58 x15: ffffa06088c1a560&lt;br/&gt;
[ 8919.862757] x14: ffffa06088c19e00 x13: 0073656761705f6f x12: 74707972635f636f&lt;br/&gt;
[ 8919.870050] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa060897f2e6c&lt;br/&gt;
[ 8919.877344] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872&lt;br/&gt;
[ 8919.884639] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000&lt;br/&gt;
[ 8919.891933] x2 : ffffa06042663818 x1 : ffffa06042663850 x0 : 90000196a9025bf5&lt;br/&gt;
[ 8919.899229] Call trace:&lt;br/&gt;
[ 8919.901723]  mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[ 8919.905728]  load_module+0xaec/0xc6c&lt;br/&gt;
[ 8919.909382]  __do_sys_finit_module+0xa4/0x110&lt;br/&gt;
[ 8919.913832]  __arm64_sys_finit_module+0x24/0x30&lt;br/&gt;
[ 8919.918461]  invoke_syscall.constprop.0+0x7c/0xd0&lt;br/&gt;
[ 8919.923276]  el0_svc_common.constprop.0+0x140/0x150&lt;br/&gt;
[ 8919.928259]  do_el0_svc+0x38/0xa0&lt;br/&gt;
[ 8919.931642]  el0_svc+0x38/0x18c&lt;br/&gt;
[ 8919.934853]  el0t_64_sync_handler+0xb4/0x130&lt;br/&gt;
[ 8919.939216]  el0t_64_sync+0x17c/0x180&lt;br/&gt;
[ 8919.942958] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800) &lt;br/&gt;
[ 8919.949189] SMP: stopping secondary CPUs&lt;br/&gt;
[ 8919.955258] Starting crashdump kernel...&lt;br/&gt;
[ 8919.959265] Bye!&lt;br/&gt;
```&lt;/p&gt;

&lt;p&gt;We prefer a dkms build but, as we are in testing, the client was built with the more usual:&lt;br/&gt;
```&lt;br/&gt;
git clone git://git.whamcloud.com/fs/lustre-release.git&lt;br/&gt;
cd lustre-release&lt;br/&gt;
git checkout 2.15.4&lt;br/&gt;
kernel=`uname -r`&lt;br/&gt;
sh autogen.sh&lt;br/&gt;
./configure --with-linux=/usr/src/kernels/$kernel&lt;br/&gt;
make rpms&lt;br/&gt;
```&lt;/p&gt;

&lt;p&gt;Tried backing off to the more usual 4k kernel using the same method and successfully mounted our lustre filesystem, although attempting to move to a dkms build for that 4k kernel strangely results in the panic returning.&lt;/p&gt;

&lt;p&gt;Can you help, please?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Mark&lt;/p&gt;</description>
                <environment>Rocky 9.3 aarch64&lt;br/&gt;
Lustre 2.15.4 client (2.12.x server)&lt;br/&gt;
NVIDIA Grace Hopper seed unit (integrated arm cpu + gpu socket)&lt;br/&gt;
InfiniBand (in tree modules)&lt;br/&gt;
No gpu modules loaded</environment>
        <key id="79944">LU-17417</key>
            <summary>(Durham University) Grace Hopper + Rocky 9 aarch64 + kernel-64k + Lustre 2.15.4 client = kernel panic</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="bodgerer">Mark Dixon</reporter>
                        <labels>
                            <label>arm</label>
                    </labels>
                <created>Thu, 11 Jan 2024 13:32:03 +0000</created>
                <updated>Mon, 15 Jan 2024 09:00:55 +0000</updated>
                                            <version>Lustre 2.15.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="399299" author="adilger" created="Thu, 11 Jan 2024 15:24:59 +0000"  >&lt;p&gt;Have you tried any other kernels or Lustre versions?  Was the lustre code built on this node?  Just wondering if there is a chance of kernel module version mismatch?&lt;/p&gt;</comment>
                            <comment id="399300" author="adilger" created="Thu, 11 Jan 2024 15:28:15 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=kevin.zhao&quot; class=&quot;user-hover&quot; rel=&quot;kevin.zhao&quot;&gt;kevin.zhao&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=xinliang&quot; class=&quot;user-hover&quot; rel=&quot;xinliang&quot;&gt;xinliang&lt;/a&gt; any thoughts on this?  It looks very early in module loading, and hasn&apos;t even called the module init function AFAICS. &lt;/p&gt;</comment>
                            <comment id="399320" author="bodgerer" created="Thu, 11 Jan 2024 16:36:18 +0000"  >&lt;p&gt;Hi Andreas, thanks for taking a look at this.&lt;/p&gt;

&lt;p&gt;For lustre, the code was built on the same node. I first attempted 2.12.9 but it wouldn&apos;t complete configure and so quickly switched to 2.15.4.&lt;/p&gt;

&lt;p&gt;The kernel versions I&apos;ve played with are the latest Rocky 9.3, so 5.14.0-362.13.1.el9_3.aarch64+64k and its 4k page equivalent, 5.14.0-362.13.1.el9_3.aarch64.&lt;/p&gt;
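
&lt;p&gt;A minimal sketch of one way to rule out a kernel/module version mismatch, assuming the first field of a module&apos;s vermagic string is the kernel release it was built for. The helper name vermagic_matches is hypothetical; modinfo -F vermagic and uname -r are the standard tools.&lt;/p&gt;

```shell
# Hypothetical helper: succeeds when a module's vermagic string was
# built for the given kernel release.
#   $1: vermagic string, e.g. the output of `modinfo -F vermagic libcfs`
#   $2: kernel release, e.g. the output of `uname -r`
vermagic_matches() {
    # The kernel release is the first whitespace-separated field of vermagic.
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}

# Example use on the client (not run here):
#   vermagic_matches "$(modinfo -F vermagic libcfs)" "$(uname -r)" || echo "mismatch"
```

&lt;p&gt;A mismatch here would point at a stale or wrongly built module rather than at the kernel itself.&lt;/p&gt;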

&lt;p&gt;I had played with in-tree InfiniBand modules vs. MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.3-aarch64, but uninstalled MLNX_OFED.&lt;/p&gt;

&lt;p&gt;With the 64k kernel booted I&apos;ve just removed the lustre rpms, checked that &quot;find /lib/modules | grep libcfs&quot; didn&apos;t return anything, built a fresh set of 2.15.4 rpms, installed as above, checked &quot;find /lib/modules | grep libcfs&quot; reported a real file under /extra and not a symlink under /weak-modules, ran &quot;modprobe libcfs&quot; -  then get a similar kernel oops.&lt;/p&gt;</comment>
                            <comment id="399323" author="pjones" created="Thu, 11 Jan 2024 16:48:30 +0000"  >&lt;p&gt;Mark&lt;/p&gt;

&lt;p&gt;Given that this is still a bit experimental at this stage, I wonder if it is worth seeing if you have better success with the tip of master. It could well be that some recent useful landings have not been back ported to the LTS branch yet...&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="399326" author="bodgerer" created="Thu, 11 Jan 2024 17:10:13 +0000"  >&lt;p&gt;Thanks Peter. For some reason I never think of trying the bleeding edge - but sadly the same result : (&lt;/p&gt;</comment>
                            <comment id="399400" author="xinliang" created="Fri, 12 Jan 2024 02:18:26 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;The faulting address 00000196a9025cc5 is not a valid kernel address (those start with fff...), so the access triggers a data abort oops.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&quot;https://www.kernel.org/doc/html/latest/arch/arm64/memory.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.kernel.org/doc/html/latest/arch/arm64/memory.html&lt;/a&gt;&lt;/p&gt;
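
&lt;p&gt;To make the address check concrete, here is a small sketch that classifies a virtual address, assuming the 48-bit VA layout reported in the oops (user addresses have the top 16 bits all zero, kernel addresses have them all one). The function name classify_va is hypothetical.&lt;/p&gt;

```shell
# Hypothetical helper: classify an arm64 virtual address, assuming 48-bit VAs.
#   $1: address in hex, with or without a leading 0x
classify_va() {
    addr=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    # Left-pad to 16 hex digits so the top 16 bits are the first 4 digits.
    while [ "${#addr}" -lt 16 ]; do
        addr="0$addr"
    done
    case "$addr" in
        0000*) echo "user" ;;
        ffff*) echo "kernel" ;;
        *)     echo "invalid" ;;
    esac
}

classify_va 00000196a9025cc5   # faulting address from the oops: prints "user"
classify_va ffffa06042663818   # x2 from the same register dump: prints "kernel"
```

&lt;p&gt;The faulting address lands in the user range, which is consistent with the oops printing a user pgtable line for it.&lt;/p&gt;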

&lt;p&gt;Our arm64 master CI is still running on Rocky 9.2. We will try Rocky 9.3, as there is now an &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/53394&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;ldiskfs patch&lt;/a&gt; for Rocky 9.3.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&quot;http://213.146.155.72:8080/job/test-periodically-lustre-master-rhel9/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://213.146.155.72:8080/job/test-periodically-lustre-master-rhel9/&lt;/a&gt; and &lt;a href=&quot;http://213.146.155.72:8080/job/build-lustre-master-rhel9/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://213.146.155.72:8080/job/build-lustre-master-rhel9/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=bodgerer&quot; class=&quot;user-hover&quot; rel=&quot;bodgerer&quot;&gt;bodgerer&lt;/a&gt;, you said in the bug description that the 4K page size kernel does not have this issue. We are also testing 2.15 and master on Rocky 8, which uses a 64K page size, and do not see this issue there either. Could you try with only TCP, no RDMA?&lt;/p&gt;</comment>
                            <comment id="399438" author="bodgerer" created="Fri, 12 Jan 2024 09:27:32 +0000"  >&lt;p&gt;Hi!&lt;/p&gt;

&lt;p&gt;At the moment we just have the default /etc/lnet.conf which only contains comments, so it shouldn&apos;t be trying to setup any o2ib or tcp devices. We do have InfiniBand devices, but if we get rid of the drivers we won&apos;t have any ethernet: the unit also has a Mellanox SFP28+ ethernet card.&lt;/p&gt;</comment>
                            <comment id="399451" author="bodgerer" created="Fri, 12 Jan 2024 11:32:49 +0000"  >&lt;p&gt;I think that 4k vs. 64k page size and InfiniBand are red herrings.&lt;/p&gt;


&lt;p&gt;Rebuilt the tip of master with --with-o2ib=no and added a modprobe.d file to block the kernel from loading ib_core, mlx5_core and mlxfw, so there shouldn&apos;t be any rdma business going on. /etc/lnet.conf is filled with comment lines only. Unfortunately, `modprobe lnet` still panics the 64k kernel:&lt;/p&gt;

&lt;p&gt;[   75.635984] libcfs: loading out-of-tree module taints kernel.&lt;br/&gt;
[   75.636150] libcfs: module verification failed: signature and/or required key missing - tainting kernel&lt;br/&gt;
[   75.636917] Unable to handle kernel paging request at virtual address 004666a090000244&lt;br/&gt;
[   75.660509] Mem abort info:&lt;br/&gt;
[   75.663358]   ESR = 0x0000000096000004&lt;br/&gt;
[   75.667187]   EC = 0x25: DABT (current EL), IL = 32 bits&lt;br/&gt;
[   75.672616]   SET = 0, FnV = 0&lt;br/&gt;
[   75.675734]   EA = 0, S1PTW = 0&lt;br/&gt;
[   75.678938]   FSC = 0x04: level 0 translation fault&lt;br/&gt;
[   75.683921] Data abort info:&lt;br/&gt;
[   75.686861]   ISV = 0, ISS = 0x00000004&lt;br/&gt;
[   75.690776]   CM = 0, WnR = 0&lt;br/&gt;
[   75.693803] [004666a090000244] address between user and kernel address ranges&lt;br/&gt;
[   75.701100] Internal error: Oops: 0000000096000004 [#1] SMP&lt;br/&gt;
[   75.706797] Modules linked in: libcfs(OE+) rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat drm_display_helper cec drm_ttm_helper ast ttm drm_shmem_helper ses i2c_algo_bit enclosure i2c_smbus acpi_ipmi drm_kms_helper syscopyarea ipmi_ssif sysfillrect sysimgblt spi_nor mtd ipmi_devintf coresight_stm coresight_tmc stm_core coresight_funnel ipmi_msghandler coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c sg crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce mpt3sas sbsa_gwdt nvme nvme_core raid_class scsi_transport_sas nvme_common tls psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror dm_region_hash dm_log dm_mod&lt;br/&gt;
[   75.780175] CPU: 12 PID: 1836 Comm: modprobe Tainted: G           OE     -------  ---  5.14.0-362.13.1.el9_3.aarch64+64k #1&lt;br/&gt;
[   75.791559] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023&lt;br/&gt;
[   75.802765] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)&lt;br/&gt;
[   75.809883] pc : mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[   75.814249] lr : mod_sysfs_setup+0x174/0x290&lt;br/&gt;
[   75.818610] sp : ffff80003236fbc0&lt;br/&gt;
[   75.821992] x29: ffff80003236fbc0 x28: ffff80003236fd40 x27: ffffa0206a9c3948&lt;br/&gt;
[   75.829287] x26: ffffa01ff5d13308 x25: ffff80003236fd40 x24: ffffa01ff5ce7478&lt;br/&gt;
[   75.836582] x23: ffffa01ff5cf7578 x22: ffffa01ff5d12f98 x21: ffffa01ff5d12fd0&lt;br/&gt;
[   75.843876] x20: 0000000000000000 x19: ffffa01ff5d12f80 x18: 0000000000000001&lt;br/&gt;
[   75.851171] x17: 00000000000001a4 x16: ffffa01ff5cf0c58 x15: ffffa020687ba560&lt;br/&gt;
[   75.858465] x14: ffffa020687b9e00 x13: 0073656761705f6f x12: 74707972635f636f&lt;br/&gt;
[   75.865759] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa02069392e6c&lt;br/&gt;
[   75.873053] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872&lt;br/&gt;
[   75.880347] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000&lt;br/&gt;
[   75.887642] x2 : ffffa01ff5d12f98 x1 : ffffa01ff5d12fd0 x0 : f94666a090000174&lt;br/&gt;
[   75.894935] Call trace:&lt;br/&gt;
[   75.897430]  mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[   75.901434]  load_module+0xaec/0xc6c&lt;br/&gt;
[   75.905086]  __do_sys_finit_module+0xa4/0x110&lt;br/&gt;
[   75.909536]  __arm64_sys_finit_module+0x24/0x30&lt;br/&gt;
[   75.914163]  invoke_syscall.constprop.0+0x7c/0xd0&lt;br/&gt;
[   75.918974]  el0_svc_common.constprop.0+0x140/0x150&lt;br/&gt;
[   75.923957]  do_el0_svc+0x38/0xa0&lt;br/&gt;
[   75.927341]  el0_svc+0x38/0x18c&lt;br/&gt;
[   75.930555]  el0t_64_sync_handler+0xb4/0x130&lt;br/&gt;
[   75.934916]  el0t_64_sync+0x17c/0x180&lt;br/&gt;
[   75.938656] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800)&lt;br/&gt;
[   75.944885] ---[ end trace 2b55dea9c9e19201 ]---&lt;br/&gt;
[   75.952014] Kernel panic - not syncing: Oops: Fatal exception&lt;br/&gt;
[   75.957888] SMP: stopping secondary CPUs&lt;br/&gt;
[   75.961907] Kernel Offset: 0x202060700000 from 0xffff800008000000&lt;br/&gt;
[   75.968133] PHYS_OFFSET: 0x80000000&lt;br/&gt;
[   75.971693] CPU features: 0x0000000,034016d8,c867fe03&lt;br/&gt;
[   75.976854] Memory Limit: none&lt;br/&gt;
[   75.982347] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---&lt;/p&gt;
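
&lt;p&gt;For reference, the modprobe.d blocking mentioned above can be sketched as follows. The exact file used is not shown in this report, so the file name block-rdma.conf and the helper name block_modules are hypothetical; blacklist and install are standard modprobe.d directives.&lt;/p&gt;

```shell
# Hypothetical helper: print modprobe.d directives that stop the named
# modules from being loaded, either automatically or via modprobe.
block_modules() {
    for mod in "$@"; do
        # "blacklist" makes modprobe ignore the module's internal aliases,
        # which prevents automatic loading.
        echo "blacklist $mod"
        # "install" replaces the load command, so an explicit modprobe fails too.
        echo "install $mod /bin/false"
    done
}

block_modules ib_core mlx5_core mlxfw
# To apply (assumed file name, not run here):
#   block_modules ib_core mlx5_core mlxfw | sudo tee /etc/modprobe.d/block-rdma.conf
```

&lt;p&gt;Both directives are emitted because blacklist alone does not stop a module from being pulled in as a dependency or loaded explicitly.&lt;/p&gt;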


&lt;p&gt;It&apos;s not quite true that I said the 4k kernel didn&apos;t have this issue. For 2.15.4 I did manage to build and use the kmod lustre client rpm successfully, but the 2.15.4 dkms lustre client rpm generated a panic. Rebuilt the tip of master against the 4k kernel and, with the same settings/options to avoid InfiniBand as above, installing the kmod lustre client rpm and running `modprobe lnet` gave this oops:&lt;/p&gt;

&lt;p&gt;[  945.766464] libcfs: loading out-of-tree module taints kernel.&lt;br/&gt;
[  945.766633] libcfs: module verification failed: signature and/or required key missing - tainting kernel&lt;br/&gt;
[  945.767517] Unable to handle kernel paging request at virtual address 004666a0f0000144&lt;br/&gt;
[  945.791118] Mem abort info:&lt;br/&gt;
[  945.793968]   ESR = 0x0000000096000004&lt;br/&gt;
[  945.797798]   EC = 0x25: DABT (current EL), IL = 32 bits&lt;br/&gt;
[  945.803225]   SET = 0, FnV = 0&lt;br/&gt;
[  945.806344]   EA = 0, S1PTW = 0&lt;br/&gt;
[  945.809548]   FSC = 0x04: level 0 translation fault&lt;br/&gt;
[  945.814531] Data abort info:&lt;br/&gt;
[  945.817471]   ISV = 0, ISS = 0x00000004&lt;br/&gt;
[  945.821387]   CM = 0, WnR = 0&lt;br/&gt;
[  945.824415] [004666a0f0000144] address between user and kernel address ranges&lt;br/&gt;
[  945.831714] Internal error: Oops: 0000000096000004 [#1] SMP&lt;br/&gt;
[  945.837412] Modules linked in: libcfs(OE+) 8021q garp mrp stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink vfat fat drm_display_helper ast cec drm_shmem_helper drm_ttm_helper ses enclosure ttm acpi_ipmi i2c_smbus i2c_algo_bit ipmi_ssif ipmi_devintf drm_kms_helper syscopyarea ipmi_msghandler sysfillrect coresight_stm spi_nor sysimgblt coresight_tmc stm_core mtd coresight_funnel coresight cppc_cpufreq auth_rpcgss drm sunrpc fuse xfs libcrc32c sg crct10dif_ce ghash_ce sha2_ce mpt3sas sha256_arm64 sha1_ce sbsa_gwdt nvme nvme_core tls raid_class scsi_transport_sas nvme_common psample pci_hyperv_intf spi_tegra210_quad acpi_power_meter dm_mirror dm_region_hash dm_log dm_mod&lt;br/&gt;
[  945.912831] CPU: 51 PID: 112745 Comm: modprobe Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-362.13.1.el9_3.aarch64 #1&lt;br/&gt;
[  945.925281] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0000/S7G MB (CG1), BIOS 3A06 10/05/2023&lt;br/&gt;
[  945.936487] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)&lt;br/&gt;
[  945.943605] pc : mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[  945.947974] lr : mod_sysfs_setup+0x174/0x290&lt;br/&gt;
[  945.952336] sp : ffff800009053a30&lt;br/&gt;
[  945.955718] x29: ffff800009053a30 x28: ffff800009053bb0 x27: ffffa0206a8f9948&lt;br/&gt;
[  945.963014] x26: ffffa020639c7308 x25: ffff800009053bb0 x24: ffffa020639b8478&lt;br/&gt;
[  945.970308] x23: ffffa020639c1578 x22: ffffa020639c6f98 x21: ffffa020639c6fd0&lt;br/&gt;
[  945.977602] x20: 0000000000000000 x19: ffffa020639c6f80 x18: 0000000000000000&lt;br/&gt;
[  945.984897] x17: 00000000000001a4 x16: ffffa020639bac58 x15: ffffa0206889a430&lt;br/&gt;
[  945.992191] x14: ffffa02068899cd0 x13: 0073656761705f6f x12: 74707972635f636f&lt;br/&gt;
[  945.999487] x11: 0000000000000000 x10: 00000000000236bc x9 : ffffa0206947348c&lt;br/&gt;
[  946.006782] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872&lt;br/&gt;
[  946.014078] x5 : 0000000000000000 x4 : 0000000000000030 x3 : 0000000000000000&lt;br/&gt;
[  946.021374] x2 : ffffa020639c6f98 x1 : ffffa020639c6fd0 x0 : f94666a0f0000074&lt;br/&gt;
[  946.028669] Call trace:&lt;br/&gt;
[  946.031163]  mod_sysfs_setup+0x1a4/0x290&lt;br/&gt;
[  946.035168]  load_module+0xae8/0xc70&lt;br/&gt;
[  946.038822]  __do_sys_finit_module+0xa4/0x110&lt;br/&gt;
[  946.043272]  __arm64_sys_finit_module+0x24/0x30&lt;br/&gt;
[  946.047901]  invoke_syscall.constprop.0+0x7c/0xd0&lt;br/&gt;
[  946.052712]  el0_svc_common.constprop.0+0x140/0x150&lt;br/&gt;
[  946.057696]  do_el0_svc+0x38/0xa0&lt;br/&gt;
[  946.061080]  el0_svc+0x38/0x18c&lt;br/&gt;
Message from syslogd@gh003 at Jan 12 11:06:40 ...&lt;br/&gt;
 kernel:Internal error: Oops: 0000000096000004 [#1] SMP&lt;br/&gt;
[  946.074418]  el0t_64_sync_handler+0xb4/0x130&lt;br/&gt;
[  946.078779]  el0t_64_sync+0x17c/0x180&lt;br/&gt;
[  946.082520] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406800)&lt;br/&gt;
[  946.088751] SMP: stopping secondary CPUs&lt;br/&gt;
[  946.093091] Starting crashdump kernel...&lt;br/&gt;
[  946.097099] Bye!&lt;/p&gt;

&lt;p&gt;I also tried rebuilding 2.15.4 on the 4k kernel, and now a modprobe lnet with the kmod rpm installed generates the panic when it didn&apos;t before, so I guess I just got lucky initially.&lt;/p&gt;</comment>
                            <comment id="399681" author="xinliang" created="Mon, 15 Jan 2024 08:51:40 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=bodgerer&quot; class=&quot;user-hover&quot; rel=&quot;bodgerer&quot;&gt;bodgerer&lt;/a&gt;&#160;,&#160;&lt;/p&gt;

&lt;p&gt;I could not reproduce the oops on my aarch64 test VM with either the 4K or the 64K kernel.&lt;/p&gt;

&lt;p&gt;4K page size kernel, tried ~10 times:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
...
[rocky@rocky9-test-01 lustre-release]$ sudo modprobe libcfs
[rocky@rocky9-test-01 lustre-release]$ sudo modprobe lnet
[rocky@rocky9-test-01 lustre-release]$ lsmod |grep -E &lt;span class=&quot;code-quote&quot;&gt;&apos;libcfs|lnet&apos;&lt;/span&gt;
lnet &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;778240 &#160;0
libcfs &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;237568 &#160;1 lnet
sunrpc &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;626688 &#160;2 lnet
[rocky@rocky9-test-01 lustre-release]$ uname -r
5.14.0-362.13.1.el9_3.aarch64&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;64K page size kernel, tried ~10 times:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
...
[rocky@rocky9-test-01 ~]$ sudo modprobe lnet &amp;amp;&amp;amp; lsmod | grep -E &lt;span class=&quot;code-quote&quot;&gt;&quot;(libcfs|lnet)&quot;&lt;/span&gt; &amp;amp;&amp;amp; sudo modprobe -r lnet
lnet &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;917504 &#160;0
libcfs &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;458752 &#160;1 lnet
sunrpc &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;851968 &#160;2 lnet
[rocky@rocky9-test-01 ~]$ uname -r
5.14.0-362.13.1.el9_3.aarch64+64k &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;I guess you are encountering a use-after-free, an out-of-bounds access, or some other memory corruption caused by something else (here, another kernel module).&lt;/p&gt;

&lt;p&gt;You can try the debug kernels, i.e. kernel-debug and kernel-64k-debug [1], and see whether the kernel reports more (hopefully KASAN and KFENCE [2] are enabled).&lt;/p&gt;

&lt;p&gt;Also, you can try unloading the other kernel modules one by one [3] to find out which one causes the problem; usually it is a driver module.&lt;/p&gt;
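
&lt;p&gt;As a starting point for the one-by-one unloading, here is a small sketch that lists loaded modules whose use count is zero, i.e. those that can be removed without first removing a dependent module. It assumes the usual three-column lsmod output (Module, Size, Used by); the function name unload_candidates is hypothetical.&lt;/p&gt;

```shell
# Hypothetical helper: read lsmod output on stdin and print the modules
# whose use count (third column) is 0.
unload_candidates() {
    # The header line has "Used" in column 3, so it never matches "0".
    awk '$3 == "0" { print $1 }'
}

# Example use (not run here):
#   lsmod | unload_candidates
#   sudo modprobe -r MODULE   # then retry loading libcfs
```

&lt;p&gt;Removing one candidate at a time and retrying the libcfs load after each keeps the search attributable to a single module.&lt;/p&gt;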

&lt;p&gt;It seems a tough issue to troubleshoot, but there are ways to solve it; it just takes time.&lt;/p&gt;

&lt;p&gt;[1] &lt;a href=&quot;https://download.rockylinux.org/pub/rocky/9/BaseOS/aarch64/debug/tree/Packages/k/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://download.rockylinux.org/pub/rocky/9/BaseOS/aarch64/debug/tree/Packages/k/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&quot;https://www.kernel.org/doc/html/latest/dev-tools/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.kernel.org/doc/html/latest/dev-tools/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&quot;https://access.redhat.com/solutions/41278&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/solutions/41278&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0475r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>