[LU-13587] sanity-quota test_68: Oops: RIP: qpi_state_seq_show+0x86/0xe0 [lquota] Created: 19/May/20  Updated: 22/Apr/22  Resolved: 06/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11023 OST Pool Quotas Resolved
is related to LU-15461 sanity-quota test_79: 'test_79 failed... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/e4263a98-dc1d-45bd-a14b-079369daac21

test_68 failed with the following error:

MDS crashed during sanity-quota test_68:

Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n qmt.lustre-QMT0000.dt-qpool1.info
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffc14b5216>] qpi_state_seq_show+0x86/0xe0 [lquota]
PGD 800000005ec8d067 PUD 79c98067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zlua(POE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt rpcrdma rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx4_ib sunrpc ib_uverbs ib_core dm_mod iosf_mbi crc32_pclmul ppdev ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev i2c_piix4 virtio_balloon pcspkr parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi mlx4_en
 ptp pps_core mlx4_core virtio_blk ata_piix 8139too crct10dif_pclmul crct10dif_common libata crc32c_intel devlink serio_raw virtio_pci virtio_ring virtio 8139cp mii floppy
CPU: 1 PID: 28101 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-1062.9.1.el7_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffff9f3b9a8ea0e0 ti: ffff9f3b9f36c000 task.ti: ffff9f3b9f36c000
RIP: 0010:[<ffffffffc14b5216>]  [<ffffffffc14b5216>] qpi_state_seq_show+0x86/0xe0 [lquota]
RSP: 0018:ffff9f3b9f36fe28  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
RDX: ffffffffc14c0b2d RSI: ffffffffc14c5e08 RDI: ffff9f3b8b4ef180
RBP: ffff9f3b9f36fe40 R08: 000000000000000a R09: 000000000000fffe
R10: 0000000000000000 R11: ffff9f3b9f36fcbe R12: ffff9f3b91286800
R13: ffff9f3b8b4ef180 R14: ffff9f3b9f36ff18 R15: ffff9f3b8b4ef180
FS:  00007f86d2e0e740(0000) GS:ffff9f3bbfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000004bb4e000 CR4: 00000000000606e0
Call Trace:
 [<ffffffff8de72780>] seq_read+0x130/0x440
 [<ffffffff8dec2d00>] proc_reg_read+0x40/0x80
 [<ffffffff8de4a65f>] vfs_read+0x9f/0x170
 [<ffffffff8de4b51f>] SyS_read+0x7f/0xf0
 [<ffffffff8e38de21>] ? system_call_after_swapgs+0xae/0x146
 [<ffffffff8e38dede>] system_call_fastpath+0x25/0x2a
 [<ffffffff8e38de21>] ? system_call_after_swapgs+0xae/0x146
Code: 5c a8 01 00 00 41 8b 8c 1c c0 01 00 00 48 c7 c6 08 5e 4c c1 41 03 8c 1c cc 01 00 00 48 8b 94 1b e0 eb 4b c1 4c 89 ef 48 83 c3 04 <48> 8b 00 44 8b 40 40 31 c0 e8 1c cc 9b cc 48 83 fb 0c 75 be 5b
RIP  [<ffffffffc14b5216>] qpi_state_seq_show+0x86/0xe0 [lquota]
 RSP <ffff9f3b9f36fe28>
CR2: 0000000000000000


sanity-quota test_68 - trevis-25vm4 crashed during sanity-quota test_68



 Comments   
Comment by Jian Yu [ 19/May/20 ]

One more instance on master branch:
https://testing.whamcloud.com/test_sets/d220a3d4-9f74-4ef9-9b8a-ccbf2d9657cb

Comment by John Hammond [ 04/Feb/21 ]

This is very easy to reproduce by reading the pool info proc files in a loop while creating a new pool.

When the info proc file is registered (in qmt_pool_alloc()), the qpi_site pointers are still NULL; they are not set until qmt_pool_prepare() runs later. A reader that hits the proc file in that window dereferences NULL.

The test was added by LU-11023 ("quota: quota pools for OSTs").

Comment by Gerrit Updater [ 11/Jun/21 ]

Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/43986
Subject: LU-13587 tests: create pool vs access its info
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 236d4cd8336b995f8a24f5c2a1240b79353fb7b6

Comment by Gerrit Updater [ 11/Jun/21 ]

Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/43987
Subject: LU-13587 quota: protect qpi in proc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cdb742ff6383171cc23a003af2f2ccfdfd7a3528

Comment by Gerrit Updater [ 06/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43987/
Subject: LU-13587 quota: protect qpi in proc
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c9901b68b44c9c6b8713a74c28f78137dca314ce

Comment by Cory Spitz [ 06/Jan/22 ]

Fixed for 2.15.0.

https://review.whamcloud.com/#/c/43986/ remains open, but it is to be abandoned since the test it added was folded into https://review.whamcloud.com/#/c/43987/.
