Lustre / LU-16129

on umount: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.0, Lustre 2.15.1
    • Labels: None
    • Environment: Kernel: 4.18.0-372.19.1.el8_6.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      On some clients we started to see crashes like this one:

      [ 3245.563036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [ 3245.563067] PGD 0 P4D 0 
      [ 3245.563075] Oops: 0000 [#1] SMP NOPTI
      [ 3245.563085] CPU: 0 PID: 21272 Comm: ldlm_bl_05 Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.19.1.el8_6.x86_64 #1
      [ 3245.563110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      [ 3245.563130] RIP: 0010:ll_lock_cancel_bits+0x34f/0x920 [lustre]
      [ 3245.563167] Code: af d8 48 89 c5 48 85 c0 74 10 48 89 c7 e8 59 fa ff ff 48 89 ef e8 f1 a3 af d8 48 8b 04 24 a8 11 74 24 48 8b 43 28 48 8b 40 68 <48> 3b 58 30 74 0e 48 89 df e8 93 8e fb ff f6 04 24 11 74 08 48 89
      [ 3245.563201] RSP: 0018:ffffb1cb07e5fd20 EFLAGS: 00010202
      [ 3245.563213] RAX: 0000000000000000 RBX: ffff970add7f5ca0 RCX: 0000000000000000
      [ 3245.563227] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff970add7f5d28
      [ 3245.563240] RBP: ffff970add7f5c00 R08: ffffb1cb07e5faa0 R09: 0000000000000000
      [ 3245.563253] R10: 0000000000000000 R11: ffff970a8602a800 R12: 0000000000000012
      [ 3245.563266] R13: 0000000000000000 R14: ffff970d7445a400 R15: ffff970d74458cf8
      [ 3245.563281] FS:  0000000000000000(0000) GS:ffff970dafc00000(0000) knlGS:0000000000000000
      [ 3245.563296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3245.563308] CR2: 0000000000000030 CR3: 0000000091410003 CR4: 0000000000770ef0
      [ 3245.563324] PKRU: 55555554
      [ 3245.563331] Call Trace:
      [ 3245.563342]  ? __wake_up_common_lock+0x89/0xc0
      [ 3245.563354]  ll_md_blocking_ast+0x198/0x2f0 [lustre]
      [ 3245.563384]  ldlm_cancel_callback+0x7b/0x250 [ptlrpc]
      [ 3245.563446]  ldlm_cli_cancel_local+0xcb/0x440 [ptlrpc]
      [ 3245.563506]  ldlm_cli_cancel_list_local+0x108/0x300 [ptlrpc]
      [ 3245.563575]  ldlm_bl_thread_main+0x832/0x920 [ptlrpc]
      [ 3245.563636]  ? finish_wait+0x80/0x80
      [ 3245.563645]  ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc]
      [ 3245.563704]  kthread+0x10a/0x120
      [ 3245.563733]  ? set_kthread_struct+0x40/0x40
      [ 3245.563744]  ret_from_fork+0x35/0x40
      [ 3245.563755] Modules linked in: binfmt_misc mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ptlrpc(OE) ksocklnd(OE) obdclass(OE) lnet(OE) libcfs(OE) sunrpc intel_rapl_msr intel_rapl_common amd_energy kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 ext4 mbcache jbd2 xfs libcrc32c sr_mod cdrom ata_generic bochs_drm drm_vram_helper sd_mod drm_kms_helper t10_pi sg syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm drm ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_console failover virtio_scsi dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE)
      [ 3245.563904] CR2: 0000000000000030 

      This seems to happen when umount is executed, but I'm not 100% sure about that.
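
For triage, the key fields of the trace above can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the helper name `parse_oops` and its regular expressions are illustrative assumptions, and the three input lines are copied verbatim from the log. It extracts the faulting symbol, offset, and module from the RIP line, plus the register values. In this trace RAX is 0 and CR2 is 0x30, which is consistent with a read at offset 0x30 through a NULL pointer (the marked bytes `<48> 3b 58 30` in the Code line decode to `cmp 0x30(%rax),%rbx`). That is only an interpretation of the dump; it does not identify which pointer was NULL or why.

```python
import re

# Three lines copied verbatim from the trace in the description.
OOPS = """\
[ 3245.563130] RIP: 0010:ll_lock_cancel_bits+0x34f/0x920 [lustre]
[ 3245.563213] RAX: 0000000000000000 RBX: ffff970add7f5ca0 RCX: 0000000000000000
[ 3245.563308] CR2: 0000000000000030 CR3: 0000000091410003 CR4: 0000000000770ef0
"""

# "RIP: <segment>:<symbol>+0x<offset>/0x<function size> [<module>]"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+) \[(?P<mod>\w+)\]"
)
# "<REGISTER>: <16 hex digits>", e.g. "RAX: 0000000000000000"
REG_RE = re.compile(r"\b(?P<name>[A-Z][A-Z0-9]{1,3}): (?P<val>[0-9a-f]{16})\b")


def parse_oops(text):
    """Return the faulting symbol, offset, module, and register values."""
    rip = RIP_RE.search(text)
    if rip is None:
        raise ValueError("no RIP line found")
    regs = {m["name"]: int(m["val"], 16) for m in REG_RE.finditer(text)}
    return {
        "symbol": rip["sym"],
        "offset": int(rip["off"], 16),
        "size": int(rip["size"], 16),
        "module": rip["mod"],
        "regs": regs,
    }


if __name__ == "__main__":
    info = parse_oops(OOPS)
    print(f"{info['symbol']}+{info['offset']:#x} [{info['module']}]")
    print(f"fault address (CR2): {info['regs']['CR2']:#x}")
    print(f"RAX: {info['regs']['RAX']:#x}")
```

The symbol and offset from the RIP line are what a debugger needs to map the crash back to a source line, provided debug symbols for the exact `lustre` module build are available.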

          People

            Assignee: wc-triage WC Triage
            Reporter: rredl Robert Redl
            Votes: 0
            Watchers: 4
