[LU-16448] sanity-quota test_3b: crashes in qsd_reint_qpool Created: 06/Jan/23  Updated: 25/Oct/23  Resolved: 25/Oct/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.2, Lustre 2.15.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Sergey Cheremencev
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-16341 unable to handle kernel NULL in qmt_s... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4d6919df-da79-4983-9c74-f50438167572

test_3b failed with the following error:

onyx-198vm4 crashed during sanity-quota test_3b

2.15.2 RC2 https://build.whamcloud.com/job/lustre-b2_15/47/

[ 2340.837056] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[ 2340.838387] PGD 0 P4D 0 
[ 2340.838845] Oops: 0000 [#1] SMP PTI
[ 2340.839477] CPU: 1 PID: 99680 Comm: qsd_reint_qpool Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-348.23.1.el8_lustre.x86_64 #1
[ 2340.841484] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2340.842614] RIP: 0010:qmt_site_recalc_cb+0x2f9/0x7a0 [lquota]
[ 2340.843553] Code: 8d a8 60 04 00 00 48 c7 c6 20 27 52 c1 48 89 ef e8 cc e3 7a ff 48 85 c0 0f 84 72 02 00 00 48 63 80 58 04 00 00 49 8b 44 c5 00 <48> 83 b8 f8 00 00 00 00 0f 84 d5 fd ff ff 48 c7 c6 20 27 52 c1 48
[ 2340.846433] RSP: 0018:ffffb57cc31bfd80 EFLAGS: 00010286
[ 2340.847299] RAX: 0000000000000000 RBX: ffff8d58318b9d00 RCX: 0000000000000000
[ 2340.848439] RDX: ffff8d58241b7600 RSI: ffffffffc1522720 RDI: ffffb57cc31bfe20
[ 2340.849569] RBP: ffffb57cc31bfe20 R08: 000000000000039f R09: ffff8d58321c4500
[ 2340.850702] R10: ffffb57cc31bfd30 R11: ffff8d582433139d R12: ffff8d5832926800
[ 2340.851841] R13: ffff8d5832ecec60 R14: 0000000000000001 R15: 0000000000000002
[ 2340.852976] FS:  0000000000000000(0000) GS:ffff8d58bfd00000(0000) knlGS:0000000000000000
[ 2340.854254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2340.855186] CR2: 00000000000000f8 CR3: 000000005d410006 CR4: 00000000001706e0
[ 2340.856323] Call Trace:
[ 2340.856855]  ? qmt_pool_lqes_lookup_spec+0x340/0x340 [lquota]
[ 2340.857951]  cfs_hash_for_each_tight+0x121/0x310 [libcfs]
[ 2340.858866]  qmt_pool_recalc+0x375/0xa70 [lquota]
[ 2340.859729]  ? __schedule+0x2c5/0x760
[ 2340.860363]  ? qmt_sarr_get_idx+0x80/0x80 [lquota]
[ 2340.861161]  ? qmt_sarr_get_idx+0x80/0x80 [lquota]
[ 2340.862023]  kthread+0x116/0x130
[ 2340.862612]  ? kthread_flush_work_fn+0x10/0x10
[ 2340.863370]  ret_from_fork+0x35/0x40
[ 2340.864024] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) intel_rapl_msr intel_rapl_common spl(OE) crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix virtio_net libata crc32c_intel virtio_blk serio_raw net_failover failover
[ 2340.872881] CR2: 00000000000000f8
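
A minimal sketch for cross-checking the oops above, assuming only the lines quoted in this report. The helper name `parse_oops` and its regular expressions are illustrative and not part of Lustre or any kernel tool. The displacement 0xf8 was decoded by hand from the bytes marked `<48> 83 b8 f8 00 00 00 00` on the Code: line, which read as `cmpq $0x0,0xf8(%rax)`. With RAX equal to 0, that access lands exactly on the CR2 address, which suggests a NULL pointer was loaded just before the compare.

```python
import re

# Lines copied verbatim from the oops in this report.
OOPS_EXCERPT = """\
[ 2340.837056] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[ 2340.842614] RIP: 0010:qmt_site_recalc_cb+0x2f9/0x7a0 [lquota]
[ 2340.847299] RAX: 0000000000000000 RBX: ffff8d58318b9d00 RCX: 0000000000000000
[ 2340.855186] CR2: 00000000000000f8 CR3: 000000005d410006 CR4: 00000000001706e0
"""

# Displacement of the faulting instruction, decoded by hand from the
# "Code:" line (assumption: cmpq $0x0,0xf8(%rax)).
FAULT_DISPLACEMENT = 0xF8


def parse_oops(text):
    """Extract the faulting symbol, module, CR2 and RAX from an oops excerpt.

    Hypothetical helper for illustration only.
    """
    rip = re.search(
        r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]", text
    )
    cr2 = re.search(r"CR2: ([0-9a-f]{16})", text)
    rax = re.search(r"RAX: ([0-9a-f]{16})", text)
    if not (rip and cr2 and rax):
        raise ValueError("excerpt is missing RIP, CR2 or RAX")
    return {
        "symbol": rip.group(1),
        "offset": int(rip.group(2), 16),
        "size": int(rip.group(3), 16),
        "module": rip.group(4),
        "cr2": int(cr2.group(1), 16),
        "rax": int(rax.group(1), 16),
    }


info = parse_oops(OOPS_EXCERPT)
# True when the fault address equals the NULL base register plus the displacement.
null_base_matches = info["rax"] + FAULT_DISPLACEMENT == info["cr2"]

if __name__ == "__main__":
    print(info["symbol"], info["module"], hex(info["cr2"]), null_base_matches)
```

The check only confirms that the fault address is consistent with a NULL base pointer in `qmt_site_recalc_cb`; it does not identify which structure was NULL.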

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-quota test_3b - onyx-198vm4 crashed during sanity-quota test_3b



 Comments   
Comment by Peter Jones [ 06/Jan/23 ]

Sergei

Could you please investigate?

Thanks

Peter

Comment by Sergey Cheremencev [ 10/Jan/23 ]

I looked at the crash dump. It is definitely LU-16341, which already has a fix. There is a -1 from maloo. I'll see what I can do to move it forward.

Comment by Peter Jones [ 25/Oct/23 ]

Duplicate of LU-16341

Generated at Sat Feb 10 03:27:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.