LU-17745: Regression in the patch for LU-16954 on older RHEL kernels

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      It seems that the patch for LU-16954 introduced a regression on older RHEL kernels. When Lustre is unmounted, the client crashes with the trace below. This happened on the RHEL 8.2 kernel, while the RHEL 8.6 kernel worked fine. Reverting patch https://review.whamcloud.com/#/c/fs/lustre-release/+/51955/ avoided the crash.

      [ 529.525487] Lustre: Unmounted scratch-client
      [ 529.525921] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
      [ 529.534682] PGD 0 
      [ 529.536921] Oops: 0000 [#1] SMP NOPTI
      [ 529.541013] CPU: 11 PID: 1547 Comm: kworker/u289:8 Kdump: loaded Tainted: P OE --------- -t - 4.18.0-193.el8.x86_64 #1
      [ 529.554234] Hardware name: Supermicro SYS-420GP-TNAR-LC6-FL02T/X12DGO-6, BIOS 1.4 10/28/2022
      [ 529.563669] Workqueue: writeback wb_workfn
      [ 529.568247] RIP: 0010:wb_workfn+0x3b/0x400
      [ 529.572823] Code: 55 41 54 55 53 48 89 fb 48 83 ec 58 65 48 8b 04 25 28 00 00 00 48 89 44 24 50 31 c0 48 8b 87 70 fe ff ff 48 8b 80 18 04 00 00 <48> 8b 70 70 48 85 f6 75 04 48 8b 70 10 48 c7 c7 63 7b 8b a6 e8 4c
      [ 529.593814] RSP: 0018:ff5d96a64ff6fe10 EFLAGS: 00010246 
      [ 529.599654] RAX: 0000000000000000 RBX: ff426a0e091ee1e8 RCX: ff5d96a64e35bd68
      [ 529.607631] RDX: 0000000000000001 RSI: ff426a0e091ee1f0 RDI: ff426a0e091ee1e8
      [ 529.615608] RBP: ff42698fc7c1fc00 R08: 0000000000000008 R09: 000000000000006b
      [ 529.623584] R10: 8080808080808080 R11: 0000000000000010 R12: ff4269c83e451700
      [ 529.631560] R13: 0000000000000000 R14: ff426a0e091ee058 R15: ff426a0e091ee1f0
      [ 529.639535] FS: 0000000000000000(0000) GS:ff4269ce3f0c0000(0000) knlGS:0000000000000000
      [ 529.648579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 529.654999] CR2: 0000000000000070 CR3: 000000397d40a003 CR4: 0000000000761ee0
      [ 529.662975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 529.670951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 529.678926] PKRU: 55555554
      [ 529.681938] Call Trace:
      [ 529.684665] ? __switch_to_asm+0x41/0x70 
      [ 529.689049] ? __switch_to_asm+0x35/0x70
      [ 529.693432] ? __switch_to_asm+0x41/0x70
      [ 529.697817] ? __switch_to+0x7a/0x3f0
      [ 529.701909] process_one_work+0x1a7/0x3b0
      [ 529.706390] worker_thread+0x1cf/0x390
      [ 529.710580] ? create_worker+0x1a0/0x1a0
      [ 529.714962] kthread+0x112/0x130
      [ 529.718567] ? kthread_flush_work_fn+0x10/0x10
      [ 529.723534] ret_from_fork+0x1f/0x40 
      [ 529.727528] Modules linked in: mgc(OE) lustre(OE) mdc(OE) fid(OE) lov(OE) osc(OE) lmv(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) nft_chain_route_ipv4 xt_CHECKSUM nft_chain_nat_ipv4 ipt_MASQUERADE nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 tun bridge stp llc nvidia_peermem(POE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) socwatch2_15(OE) vtsspp(OE) sep5(OE) socperf3(OE) nvidia_drm(POE) pax(OE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload esp6 esp4_offload esp4 intel_rapl_msr intel_rapl_common bonding nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nf_log_ipv4 nf_log_common kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass iTCO_wdt iTCO_vendor_support nft_limit ipmi_ssif ses crct10dif_pclmul enclosure crc32_pclmul ghash_clmulni_intel joydev pcspkr nft_counter xt_LOG mei_me xt_limit i2c_i801 mei xt_state ioatdma xt_conntrack nf_conntrack ipmi_si ipmi_devintf nft_compat ipmi_msghandler nf_tables nfnetlink
      [ 529.727555] acpi_power_meter acpi_pad acpi_cpufreq sunrpc vfat fat knem(OE) ip_tables xfs libcrc32c mlx5_ib(OE) rndis_host cdc_ether usbnet mii ib_uverbs(OE) ib_core(OE) sg ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) crc32c_intel fb_sys_fops ttm ahci libahci nvme mlxfw(OE) drm mpt3sas igb libata tls(t) nvme_core raid_class dca psample scsi_transport_sas i2c_algo_bit mlx_compat(OE)
      [ 529.868285] CR2: 0000000000000070
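
For triage, the key fields of an oops like the one above can be pulled out of a saved console log. In this trace the marked instruction bytes `<48> 8b 70 70` decode to `mov 0x70(%rax),%rsi`, and RAX is 0, which matches the fault address 0x70 reported in CR2; this is consistent with wb_workfn reading a structure member at offset 0x70 through a NULL pointer. The following is only a minimal sketch: the function name `oops_summary` is hypothetical and not part of Lustre or the kernel tooling.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an existing tool): read a saved
# dmesg/console log on stdin and print the fields needed for a first look.
oops_summary() {
    awk '
        # Fault address is the last field of the BUG line.
        /NULL pointer dereference at/ { addr = $NF }
        # Keep the first RIP line only and drop the code segment prefix.
        /RIP: [0-9a-f]+:/ && rip == "" { rip = $NF; sub(/^[0-9a-f]+:/, "", rip) }
        # RAX value is the field that follows the "RAX:" label.
        / RAX: / { for (i = 1; i < NF; i++) if ($i == "RAX:") rax = $(i + 1) }
        END {
            print "fault_addr=" addr
            print "rip=" rip
            print "rax=" rax
            # An all-zero RAX with a small fault address suggests a member
            # read through a NULL base pointer.
            print "null_base=" (rax ~ /^0+$/ ? "yes" : "no")
        }'
}
```

Usage, assuming the console output was saved to a file of your choosing: `oops_summary < saved-console.log`.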
      

        Activity

          pjones Peter Jones added a comment -

          Merged for 2.16


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54850/
          Subject: LU-17745 llite: fix the umount panic due to BDI unregister
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: facff17860ff9a577bad0bf8fb932e869475e011
          sihara Shuichi Ihara added a comment -

          [root@rl82 ~]# uname -r
          4.18.0-193.el8.x86_64
          [root@rl82 ~]# modprobe lustre
          [root@rl82 ~]# lctl get_param version
          version=2.15.62_23_gbf21cf0
          [root@rl82 ~]# mount -t lustre 10.0.11.238@o2ib12:10.0.11.239@o2ib12:/exafs /exafs
          [root@rl82 ~]# umount -t lustre -a
          [root@rl82 ~]# lustre_rmmod

          qian_wc, the patch (patch set 3) worked and at least avoided the crash on the RHEL 8.2 kernel when unmounting the filesystem.
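
The manual steps in the session above can be wrapped in a small script so the mount/unmount cycle is easy to repeat on other kernels. This is only a sketch: the names `run`, `repro_umount_crash`, `DEVICE`, `MNT` and `DRY_RUN` are hypothetical, and the MGS NIDs, filesystem name and mount point are simply copied from the session. With `DRY_RUN=1` the commands are printed but not executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps; not part of Lustre.
# Defaults are taken from the session above and can be overridden.
DEVICE="${DEVICE:-10.0.11.238@o2ib12:10.0.11.239@o2ib12:/exafs}"
MNT="${MNT:-/exafs}"

run() {
    # Print the command, then execute it unless DRY_RUN is set to 1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

repro_umount_crash() {
    # Same order as the manual session; stop at the first failing step.
    run uname -r &&
    run modprobe lustre &&
    run lctl get_param version &&
    run mount -t lustre "$DEVICE" "$MNT" &&
    run umount -t lustre -a &&
    run lustre_rmmod
}
```

Running `DRY_RUN=1; repro_umount_crash` first is a cheap way to confirm the device string before touching a real client. Any crash would show up in the console log rather than in the script's exit status.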

          gerrit Gerrit Updater added a comment -

          "Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54850
          Subject: LU-17745 llite: fix the umount panic due to BDI unregister
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 46deee76526a100eb64f4ea5fb06763fd7b1fcdf

          sihara Shuichi Ihara added a comment -

          qian_wc, at least we know that https://review.whamcloud.com/#/c/fs/lustre-release/+/51955/ is not needed for the RHEL 8.2 kernel. Why not have autoconf check LC_HAVE_BDI_DEBUG_STATS against the RHEL 8.2 kernel to see whether it is workable?

          People

            qian_wc Qian Yingjin
            sihara Shuichi Ihara
            Votes: 0
            Watchers: 6
