Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.14.0
3
9223372036854775807
Description
We have seen this crash only twice, both times in parallel-scale-nfsv4 test compilebench:
2021-01-22: x86_64 clients - https://testing.whamcloud.com/test_sets/68ed07f9-eb1d-459c-b327-269bd996d449
2021-02-11: ARM clients - https://testing.whamcloud.com/test_sets/6aae8467-c5e3-4547-aefa-04f220cf4042
Looking at the first failure above, the kernel crash log shows:
[47446.646161] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench ============================================== 19:11:27 (1611342687)
[47447.194275] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.1394887 -i 2 -r 2 --makej
[47447.620295] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.1394887 -i 2 -r 2 --makej
[48153.651976] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[48153.653435] PGD 0 P4D 0
[48153.653864] Oops: 0000 [#1] SMP PTI
[48153.654527] CPU: 0 PID: 1485996 Comm: qmt_reba_lustre Kdump: loaded Tainted: G OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1
[48153.656584] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[48153.657600] RIP: 0010:qmt_id_lock_cb+0x69/0x100 [lquota]
[48153.658462] Code: 48 8b 53 20 8b 4a 0c 85 c9 74 74 89 c1 48 8b 42 18 83 78 10 02 75 0a 83 e1 01 b8 01 00 00 00 74 17 48 63 44 24 04 48 c1 e0 04 <48> 03 45 00 f6 40 08 0c 0f 95 c0 0f b6 c0 48 8b 4c 24 08 65 48 33
[48153.661475] RSP: 0018:ffffbf43c0c5bde8 EFLAGS: 00010246
[48153.662317] RAX: 0000000000000000 RBX: ffff9fbe4b55e000 RCX: 0000000000000000
[48153.663454] RDX: ffff9fbe71e8f7a0 RSI: 0000000000000000 RDI: ffff9fbe47c2e862
[48153.664587] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
[48153.665717] R10: 0000000000000010 R11: f000000000000000 R12: ffff9fbe4b55e000
[48153.666855] R13: ffff9fbe3133be60 R14: ffff9fbe4ebacb98 R15: ffff9fbe4ebacb40
[48153.667999] FS: 0000000000000000(0000) GS:ffff9fbe7fc00000(0000) knlGS:0000000000000000
[48153.669283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48153.670208] CR2: 0000000000000000 CR3: 0000000059c0a002 CR4: 00000000000606f0
[48153.671348] Call Trace:
[48153.671787] ? cfs_cdebug_show.part.2.constprop.22+0x20/0x20 [lquota]
[48153.672831] qmt_glimpse_lock.isra.19+0x27e/0xfb0 [lquota]
[48153.673726] qmt_reba_thread+0x5da/0x9b0 [lquota]
[48153.674503] ? qmt_glimpse_lock.isra.19+0xfb0/0xfb0 [lquota]
[48153.675454] kthread+0x112/0x130
[48153.676009] ? kthread_flush_work_fn+0x10/0x10
[48153.676745] ret_from_fork+0x35/0x40
[48153.677349] Modules linked in: nfsd nfs_acl lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic 8139too ata_piix crc32c_intel libata serio_raw 8139cp mii virtio_blk [last unloaded: dm_flakey]
[48153.688461] CR2: 0000000000000000
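For comparing this trace against other reports, a short script can pull out the faulting symbol and the opcode bytes that the kernel marks with angle brackets in the Code: line. The sketch below is illustrative only and is not part of any Lustre or kernel tooling; the function name parse_oops and the returned dictionary keys are hypothetical, and the sample string is an abridged excerpt of the trace above.

```python
import re

def parse_oops(text):
    """Extract the faulting symbol, module, and marked opcode bytes from an x86 oops.

    Hypothetical helper for illustration; it only understands the
    'RIP: seg:symbol+off/size [module]' and 'Code: ..' line formats.
    """
    rip = re.search(
        r"RIP: \w+:(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
        r"(?: \[(?P<mod>\w+)\])?",
        text,
    )
    code = re.search(r"Code: (?P<bytes>(?:<?[0-9a-f]{2}>? ?)+)", text)

    result = {}
    if rip:
        result["symbol"] = rip.group("sym")
        result["offset"] = int(rip.group("off"), 16)
        result["size"] = int(rip.group("size"), 16)
        result["module"] = rip.group("mod")
    if code:
        tokens = code.group("bytes").split()
        # The first byte of the faulting instruction is wrapped in <>.
        idx = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
        if idx is not None:
            # Keep the marked byte and the three bytes after it.
            result["faulting_bytes"] = [t.strip("<>") for t in tokens[idx:idx + 4]]
    return result

# Abridged excerpt of the trace above (Code: line shortened).
sample = (
    "[48153.657600] RIP: 0010:qmt_id_lock_cb+0x69/0x100 [lquota] "
    "[48153.658462] Code: 48 c1 e0 04 <48> 03 45 00 f6 40 08 0c "
    "[48153.661475] RSP: 0018:ffffbf43c0c5bde8 EFLAGS: 00010246"
)
info = parse_oops(sample)
print(info)
```

On the excerpt, the marked bytes are 48 03 45 00, which appear to decode as add 0x0(%rbp),%rax. With RBP zero in the register dump, that would be consistent with the reported NULL pointer dereference at address 0, although it does not by itself identify which pointer in qmt_id_lock_cb was unset.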