[LU-15085] racer: BUG: unable to handle kernel NULL pointer dereference at 000000000000 RIP: 0010:lod_striping_load+0x3d3/0x740 [lod] Created: 12/Oct/21  Updated: 23/Jun/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Elena Gryaznova Assignee: WC Triage
Resolution: Unresolved Votes: 1
Labels: None

Attachments: Zip Archive 1633982628-racer-FOFB-dectet_L300-1.zip.zip    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[ 4536.955717] LustreError: 14849:0:(qsd_handler.c:900:qsd_op_begin()) lustre-MDT0000: more than 8 qids enforced for a transaction?
[ 4536.961710] LustreError: 14849:0:(qsd_handler.c:900:qsd_op_begin()) Skipped 1 previous similar message
[ 4537.027189] LustreError: 14836:0:(osp_object.c:629:osp_attr_get()) lustre-MDT0002-osp-MDT0000: osp_attr_get update error [0x280000bd8:0x4b:0x0]: rc = -5
[ 4537.036021] LustreError: 14836:0:(osp_object.c:629:osp_attr_get()) Skipped 3 previous similar messages
[ 4537.107275] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 4537.110734] PGD 0 P4D 0 
[ 4537.112695] Oops: 0000 [#1] SMP PTI
[ 4537.114872] CPU: 0 PID: 14843 Comm: mdt00_010 Tainted: G           OE    ---------r-t - 4.18.0-193.19.1.el8_2.x86_64 #1
[ 4537.118612] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 4537.121327] RIP: 0010:lod_striping_load+0x3d3/0x740 [lod]
[ 4537.123810] Code: c7 c6 98 b8 90 c1 48 c7 c7 80 f5 91 c1 c7 05 78 1d 05 00 01 00 00 00 49 89 c9 49 89 c8 e8 75 92 26 ff e9 53 fd ff ff 49 8b 06 <81> 38 d0 0c d5 0c 0f 84 8b 01 00 00 83 fb 37 0f 8e 3d 01 00 00 49
[ 4537.130556] RSP: 0018:ffffa9ed40b37aa0 EFLAGS: 00010286
[ 4537.132983] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 0000000000028306
[ 4537.136141] RDX: 0000000000028305 RSI: 0000000000000002 RDI: ffff932b47c03200
[ 4537.138996] RBP: ffff932b43e58da8 R08: 000000000002e0e0 R09: ffffffffc19f5f45
[ 4537.141823] R10: fffff1afc4248a00 R11: ffff932b45bd6dd2 R12: ffff932b7a7edb00
[ 4537.144620] R13: ffff932b43e58de8 R14: ffff932b4297c000 R15: ffff932b4297c010
[ 4537.147470] FS:  0000000000000000(0000) GS:ffff932b7ba00000(0000) knlGS:0000000000000000
[ 4537.150482] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4537.152909] CR2: 0000000000000000 CR3: 0000000109118000 CR4: 00000000000006f0
[ 4537.155682] Call Trace:
[ 4537.157420]  lod_index_try+0x93/0x300 [lod]
[ 4537.159689]  dt_try_as_dir+0x33/0x50 [obdclass]
[ 4537.161872]  __mdd_lookup.isra.19+0x267/0x3e0 [mdd]
[ 4537.164067]  mdd_parent_fid+0x120/0x450 [mdd]
[ 4537.166095]  mdd_is_subdir+0x298/0x3e0 [mdd]
[ 4537.168415]  mdt_reint_rename+0x765/0x1ff0 [mdt]
[ 4537.171185]  ? tgt_check_lookup_req+0xf0/0x240 [ptlrpc]
[ 4537.173665]  mdt_reint_rec+0x117/0x270 [mdt]
[ 4537.175836]  mdt_reint_internal+0x4bc/0x7d0 [mdt]
[ 4537.178119]  mdt_reint+0x5d/0x110 [mdt]
[ 4537.180164]  tgt_request_handle+0xc93/0x1a00 [ptlrpc]
[ 4537.182550]  ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
[ 4537.184125]  ptlrpc_main+0xc06/0x1550 [ptlrpc]
[ 4537.185406]  ? __schedule+0x257/0x650
[ 4537.186567]  ? ptlrpc_wait_event+0x500/0x500 [ptlrpc]
[ 4537.187918]  kthread+0x112/0x130
[ 4537.188953]  ? kthread_flush_work_fn+0x10/0x10
[ 4537.190177]  ret_from_fork+0x35/0x40
[ 4537.191319] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) crc32_generic rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma(OE) xprtrdma(OE) ib_isert(OE) ib_iser(OE) ib_srpt(OE) ib_srp(OE) rdma_ucm(OE) ib_ipoib(OE) ib_ucm(OE) rdma_cm(OE) ib_cm(OE) iw_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) tls(t) mlx_compat(OE) mlxfw cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev pcspkr virtio_balloon i2c_piix4 sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 ata_generic ata_piix libata e1000 serio_raw virtio_blk [last unloaded: libcfs]
[ 4537.208022] CR2: 0000000000000000
[ 4537.209102] ---[ end trace 1998d379c48b58a5 ]---
[ 4537.210372] RIP: 0010:lod_striping_load+0x3d3/0x740 [lod]
[ 4537.211836] Code: c7 c6 98 b8 90 c1 48 c7 c7 80 f5 91 c1 c7 05 78 1d 05 00 01 00 00 00 49 89 c9 49 89 c8 e8 75 92 26 ff e9 53 fd ff ff 49 8b 06 <81> 38 d0 0c d5 0c 0f 84 8b 01 00 00 83 fb 37 0f 8e 3d 01 00 00 49
[ 4537.215933] RSP: 0018:ffffa9ed40b37aa0 EFLAGS: 00010286
[ 4537.217276] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 0000000000028306
[ 4537.218932] RDX: 0000000000028305 RSI: 0000000000000002 RDI: ffff932b47c03200
[ 4537.220606] RBP: ffff932b43e58da8 R08: 000000000002e0e0 R09: ffffffffc19f5f45
[ 4537.222259] R10: fffff1afc4248a00 R11: ffff932b45bd6dd2 R12: ffff932b7a7edb00
[ 4537.223915] R13: ffff932b43e58de8 R14: ffff932b4297c000 R15: ffff932b4297c010
[ 4537.225558] FS:  0000000000000000(0000) GS:ffff932b7ba00000(0000) knlGS:0000000000000000
[ 4537.227333] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4537.228727] CR2: 0000000000000000 CR3: 0000000109118000 CR4: 00000000000006f0
[ 4537.230353] Kernel panic - not syncing: Fatal exception
[ 4537.232028] Kernel Offset: 0x6600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4537.234256] ---[ end Kernel panic - not syncing: Fatal exception ]---



 Comments   
Comment by Robert Redl [ 23/Jun/22 ]

I get the error

LustreError: 14849:0:(qsd_handler.c:900:qsd_op_begin()) lustre-MDT0000: more than 8 qids enforced for a transaction?

when I try to delete some files. Only a small number of files are affected, and which ones seems to be random, but the affected files stay affected permanently. The MDT panic that follows in the report above does not happen for me.

System: Lustre 2.15.0 with ZFS 2.0.7 backend.

The client sees the following error:

rm affected-file
rm: remove regular file 'affected-file'? y
rm: cannot remove 'affected-file': Invalid argument