[LU-16079] kernel BUG at mm/slub.c:4134!
Created: 05/Aug/22
Updated: 10/Aug/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Mahmoud Hanafi
Assignee: Peter Jones
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

redhat8 (4.18.0-348.2.1.el8_lustre.x86_64)
zfs-2.0.1

kmod-lustre-osd-ldiskfs-2.15.0-1.el8.x86_64
lustre-osd-ldiskfs-mount-2.15.0-1.el8.x86_64
lustre-osd-zfs-mount-2.15.0-1.el8.x86_64
lustre-iokit-2.15.0-1.el8.x86_64
kmod-lustre-2.15.0-1.el8.x86_64
kmod-lustre-osd-zfs-2.15.0-1.el8.x86_64
lustre-2.15.0-1.el8.x86_64
libzfs4-devel-2.0.1-1.el8.x86_64
kmod-zfs-2.0.1-1.el8.x86_64
zfs-2.0.1-1.el8.x86_64
kmod-zfs-devel-2.0.1-1.el8.x86_64
libzfs4-2.0.1-1.el8.x86_64
zfs-dracut-2.0.1-1.el8.noarch
python3-pyzfs-2.0.1-1.el8.noarch


Issue Links:
Related
is related to LU-16075 kernel update [RHEL8.6 4.18.0-372.19.... Resolved
Severity: 3

 Description   

The OSS crashed while running obdfilter against a zpool built from a single raidz1 vdev of five NVMe devices. I can upload the crash dump if needed.

zpool list -v
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zpool        69.8T  5.83T  64.0T        -         -     0%     8%  1.00x    ONLINE  -
  raidz1     69.8T  5.83T  64.0T        -         -     0%  8.34%      -  ONLINE
    nvme2n1      -      -      -        -         -      -      -      -  ONLINE
    nvme3n1      -      -      -        -         -      -      -      -  ONLINE
    nvme4n1      -      -      -        -         -      -      -      -  ONLINE
    nvme5n1      -      -      -        -         -      -      -      -  ONLINE
    nvme6n1      -      -      -        -         -      -      -      -  ONLINE

PID: 82863  TASK: ffff95fbce823000  CPU: 20  COMMAND: "lctl"
 #0 [ffffb69d9fa5f800] machine_kexec at ffffffff888641ce
 #1 [ffffb69d9fa5f858] __crash_kexec at ffffffff8899df1d
 #2 [ffffb69d9fa5f920] crash_kexec at ffffffff8899ee0d
 #3 [ffffb69d9fa5f938] oops_end at ffffffff8882613d
 #4 [ffffb69d9fa5f958] do_trap at ffffffff888228c3
 #5 [ffffb69d9fa5f9a0] do_invalid_op at ffffffff88823256
 #6 [ffffb69d9fa5f9c0] invalid_op at ffffffff89200d64
    [exception RIP: kfree+717]
    RIP: ffffffff88afa98d  RSP: ffffb69d9fa5fa78  RFLAGS: 00010246
    RAX: fffff5a3047c2008  RBX: ffff95f191e40fb0  RCX: 0000000000000100
    RDX: 0000000000000000  RSI: 0000000000000008  RDI: ffff95f191e40fb0
    RBP: fffff5a31a479000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000200000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff95e067b17700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb69d9fa5fab0] osd_bufs_get at ffffffffc19975e9 [osd_zfs]
 #8 [ffffb69d9fa5fb78] ofd_preprw_read at ffffffffc1f8b267 [ofd]
 #9 [ffffb69d9fa5fc20] ofd_preprw at ffffffffc1f8def7 [ofd]
#10 [ffffb69d9fa5fc90] echo_client_prep_commit at ffffffffc09d113e [obdecho]
#11 [ffffb69d9fa5fd70] echo_client_iocontrol at ffffffffc09dddfe [obdecho]
#12 [ffffb69d9fa5fdf0] class_handle_ioctl at ffffffffc1430776 [obdclass]
#13 [ffffb69d9fa5fe68] obd_class_ioctl at ffffffffc14310c7 [obdclass]
#14 [ffffb69d9fa5fe80] do_vfs_ioctl at ffffffff88b436a4
#15 [ffffb69d9fa5fef8] ksys_ioctl at ffffffff88b43ce0
#16 [ffffb69d9fa5ff30] __x64_sys_ioctl at ffffffff88b43d26
#17 [ffffb69d9fa5ff38] do_syscall_64 at ffffffff888042bb
#18 [ffffb69d9fa5ff50] entry_SYSCALL_64_after_hwframe at ffffffff892000ad
    RIP: 00007f18a99d372b  RSP: 00007fff25b5b1d8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f18aafc1af0  RCX: 00007f18a99d372b
    RDX: 00007fff25b5b500  RSI: 00000000c008667d  RDI: 0000000000000003
    RBP: 00000000c008667d   R8: 0000000000000240   R9: 00000000024826c2
    R10: 0000000000000000  R11: 0000000000000246  R12: 000055aff2bce353
    R13: 00007fff25b5b500  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b


 Comments   
Comment by Mahmoud Hanafi [ 10/Aug/22 ]

I think this is the known issue "Kernel bug on mm/slub.c:314" (BZ#2102251), which is fixed in kernel-4.18.0-372.19.1.
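
For anyone checking other servers, here is a minimal sketch that compares a kernel release string against the fixed build named above. The helper names `kernel_tuple` and `has_fix` are hypothetical. The comparison looks only at the leading numeric fields, so it is a simplification and not RPM's own version ordering. It also assumes every numerically later build carries the fix, which this ticket does not confirm.

```python
import platform
import re
from typing import Tuple

# Build named in the comment above as containing the fix.
FIXED_BUILD = "4.18.0-372.19.1"


def kernel_tuple(release: str) -> Tuple[int, ...]:
    """Return the leading numeric fields of a kernel release string.

    Parsing stops at the first non-numeric field (for example "el8_lustre"),
    so distro and architecture suffixes are ignored.
    """
    fields = []
    for part in re.split(r"[.-]", release):
        if not part.isdigit():
            break
        fields.append(int(part))
    return tuple(fields)


def has_fix(release: str, fixed: str = FIXED_BUILD) -> bool:
    """True if `release` is numerically at or beyond `fixed` (simplified check)."""
    return kernel_tuple(release) >= kernel_tuple(fixed)


if __name__ == "__main__":
    running = platform.release()
    print(running, "has fix" if has_fix(running) else "predates fix")
```

Run against the kernel listed under Environment, `has_fix("4.18.0-348.2.1.el8_lustre.x86_64")` evaluates to False, which is consistent with the crash being seen on that build.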

Comment by Peter Jones [ 10/Aug/22 ]

So are you able to just move to the latest RHEL 8.6 kernel update, using weak updates, to avoid this issue?

Generated at Sat Feb 10 03:23:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.