[LU-16129] on umount: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
Created: 31/Aug/22 Updated: 25/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0, Lustre 2.15.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Robert Redl | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | Kernel: 4.18.0-372.19.1.el8_6.x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
On some clients we started to see crashes like this one:

[ 3245.563036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 3245.563067] PGD 0 P4D 0
[ 3245.563075] Oops: 0000 [#1] SMP NOPTI
[ 3245.563085] CPU: 0 PID: 21272 Comm: ldlm_bl_05 Kdump: loaded Tainted: P OE --------- - - 4.18.0-372.19.1.el8_6.x86_64 #1
[ 3245.563110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[ 3245.563130] RIP: 0010:ll_lock_cancel_bits+0x34f/0x920 [lustre]
[ 3245.563167] Code: af d8 48 89 c5 48 85 c0 74 10 48 89 c7 e8 59 fa ff ff 48 89 ef e8 f1 a3 af d8 48 8b 04 24 a8 11 74 24 48 8b 43 28 48 8b 40 68 <48> 3b 58 30 74 0e 48 89 df e8 93 8e fb ff f6 04 24 11 74 08 48 89
[ 3245.563201] RSP: 0018:ffffb1cb07e5fd20 EFLAGS: 00010202
[ 3245.563213] RAX: 0000000000000000 RBX: ffff970add7f5ca0 RCX: 0000000000000000
[ 3245.563227] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff970add7f5d28
[ 3245.563240] RBP: ffff970add7f5c00 R08: ffffb1cb07e5faa0 R09: 0000000000000000
[ 3245.563253] R10: 0000000000000000 R11: ffff970a8602a800 R12: 0000000000000012
[ 3245.563266] R13: 0000000000000000 R14: ffff970d7445a400 R15: ffff970d74458cf8
[ 3245.563281] FS: 0000000000000000(0000) GS:ffff970dafc00000(0000) knlGS:0000000000000000
[ 3245.563296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3245.563308] CR2: 0000000000000030 CR3: 0000000091410003 CR4: 0000000000770ef0
[ 3245.563324] PKRU: 55555554
[ 3245.563331] Call Trace:
[ 3245.563342] ? __wake_up_common_lock+0x89/0xc0
[ 3245.563354] ll_md_blocking_ast+0x198/0x2f0 [lustre]
[ 3245.563384] ldlm_cancel_callback+0x7b/0x250 [ptlrpc]
[ 3245.563446] ldlm_cli_cancel_local+0xcb/0x440 [ptlrpc]
[ 3245.563506] ldlm_cli_cancel_list_local+0x108/0x300 [ptlrpc]
[ 3245.563575] ldlm_bl_thread_main+0x832/0x920 [ptlrpc]
[ 3245.563636] ? finish_wait+0x80/0x80
[ 3245.563645] ? ldlm_handle_bl_callback+0x3f0/0x3f0 [ptlrpc]
[ 3245.563704] kthread+0x10a/0x120
[ 3245.563733] ? set_kthread_struct+0x40/0x40
[ 3245.563744] ret_from_fork+0x35/0x40
[ 3245.563755] Modules linked in: binfmt_misc mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ptlrpc(OE) ksocklnd(OE) obdclass(OE) lnet(OE) libcfs(OE) sunrpc intel_rapl_msr intel_rapl_common amd_energy kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr joydev i2c_piix4 ext4 mbcache jbd2 xfs libcrc32c sr_mod cdrom ata_generic bochs_drm drm_vram_helper sd_mod drm_kms_helper t10_pi sg syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm drm ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_console failover virtio_scsi dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE)
[ 3245.563904] CR2: 0000000000000030

This seems to happen when umount is executed, but I'm not 100% sure about that. |
| Comments |
| Comment by Robert Redl [ 31/Aug/22 ] |
|
I just saw that this looks a lot like |
| Comment by Etienne Aujames [ 01/Sep/22 ] |
|
Hello, The patch https://review.whamcloud.com/47086 (" |
| Comment by Robert Redl [ 01/Sep/22 ] |
|
Dear Etienne, thanks for pointing out that the patch for |
| Comment by James A Simmons [ 25/Aug/23 ] |
|
|