[LU-13845] Kernel crash on: lfs quota -u $(( (1<<32) -1)) Created: 31/Jul/20  Updated: 13/Jul/21  Resolved: 19/Oct/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.5
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Etienne Aujames Assignee: Etienne Aujames
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-13956 crash - kernel NULL pointer deference... Resolved
Related
is related to LU-12549 Lustre project PID 32-bit overflow Resolved
is related to LU-14740 LustreError: 76942:0:(qsd_entry.c:243... Resolved
Epic/Theme: quotas
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The following commands systematically crash the MDS:


lfs quota -u|-g|-p $(( (1<<32) - 1 )) ...
lfs setquota -u|-g|-p $(( (1<<32) - 1 )) ...

The UID/GID value "(uid_t) -1" is a special value. chown(), for example, uses it to mean "leave this ID unchanged". Valid UIDs and GIDs range from 0 to 0xFFFFFFFE.
Ref: https://systemd.io/UIDS-GIDS/
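To make the arithmetic concrete: bash evaluates $(( (1<<32) - 1 )) with 64-bit integers, which gives 4294967295 = 0xFFFFFFFF. That is the same bit pattern as (uid_t)-1. The C sketch below checks this, assuming a Linux system where uid_t is a 32-bit unsigned type. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: Linux/glibc, where uid_t is a 32-bit unsigned type. */
static_assert(sizeof(uid_t) == 4, "uid_t assumed to be 32 bits");

/* Illustrative: the value the shell expression $(( (1<<32) - 1 ))
 * produces. Bash arithmetic is 64-bit, so 1<<32 does not overflow. */
static uint64_t shell_max_u32(void)
{
	return (UINT64_C(1) << 32) - 1;
}

/* Illustrative: the "leave unchanged" marker accepted by chown(2). */
static uid_t chown_unchanged_uid(void)
{
	return (uid_t)-1;
}
```

Both helpers yield 4294967295, so the command line above passes exactly the reserved ID to lfs.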


So "lfs setquota" and "lfs quota" should treat 0xFFFFFFFF as an invalid quota ID.
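The following is a minimal sketch of the kind of check this implies. It assumes the ID arrives as a decimal string, as it does after the shell expands $(( ... )). parse_quota_id() and QUOTA_ID_INVALID are hypothetical names. This is not the code from the merged patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name for the reserved ID, i.e. (uid_t)-1 / (gid_t)-1. */
#define QUOTA_ID_INVALID 0xFFFFFFFFUL

static_assert(QUOTA_ID_INVALID == UINT32_MAX, "reserved ID is (uint32_t)-1");

/*
 * Hypothetical helper: parse a decimal quota ID and accept only
 * 0..0xFFFFFFFE. Returns 0 and stores the ID on success, -EINVAL
 * for empty, negative, non-numeric, reserved or out-of-range input.
 */
static int parse_quota_id(const char *str, uint32_t *id)
{
	char *end;
	unsigned long long val;

	/* strtoull() silently wraps negative input, so reject '-' up front */
	if (str == NULL || *str == '\0' || *str == '-')
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	/* 0xFFFFFFFF itself and anything wider than 32 bits */
	if (val >= QUOTA_ID_INVALID)
		return -EINVAL;

	*id = (uint32_t)val;
	return 0;
}
```

A check in the utility only covers requests sent through lfs. Whether the server side should validate the ID as well is outside the scope of this sketch.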

Here is the console output from the crash:

[18075.268361] LustreError: Skipped 7 previous similar messages
[19623.238824] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
[19623.239412] IP: [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.240130] PGD 80000000d6a70067 PUD d6a71067 PMD 0
[19623.240633] Oops: 0000 [#1] SMP 
[19623.241189] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) mbcache jbd2 libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq aesni_intel snd_seq_device lrw snd_pcm gf128mul glue_helper ablk_helper cryptd snd_timer sg snd pcspkr vboxguest(OE) soundcore i2c_piix4 parport_pc video parport ip_tables xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci crct10dif_pclmul crct10dif_common libata e1000 crc32c_intel
[19623.245675] serio_raw drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dm_flakey]
[19623.246811] CPU: 1 PID: 11998 Comm: lquota_wb_lustr Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.10.1.el7_lustre_ajmes.x86_64 #1
[19623.248020] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[19623.248674] task: ffff90bc594541c0 ti: ffff90bbae1ac000 task.ti: ffff90bbae1ac000
[19623.249400] RIP: 0010:[<ffffffffba4bdb8d>] [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.250018] RSP: 0018:ffff90bbae1afcd0 EFLAGS: 00010246
[19623.250558] RAX: 0000000000000000 RBX: ffff90bb44b0dac8 RCX: 0000000000000000
[19623.251190] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff90bb44b0db38
[19623.251719] RBP: ffff90bbae1afce0 R08: ffff90bb9fc54000 R09: 0000000000000001
[19623.252364] R10: ffff90bc5fc9ff50 R11: ffffe6a5846571c0 R12: ffffffffffffffea
[19623.252942] R13: ffff90bb4103e200 R14: 00000000ffffffff R15: ffff90bb9fc5b000
[19623.253460] FS: 0000000000000000(0000) GS:ffff90bc5fc80000(0000) knlGS:0000000000000000
[19623.254049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19623.254555] CR2: 000000000000007e CR3: 00000000d8290000 CR4: 00000000000606e0
[19623.255146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19623.255638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19623.256120] Call Trace:
[19623.256597] [<ffffffffc108a945>] osd_acct_index_lookup+0x235/0x480 [osd_ldiskfs]
[19623.257148] [<ffffffffc0fe071f>] lquota_disk_read+0x12f/0x390 [lquota]
[19623.257621] [<ffffffffc0fe852a>] qsd_refresh_usage+0x6a/0x2f0 [lquota]
[19623.258142] [<ffffffffc0fe8970>] qsd_lqe_read+0x1c0/0x5f0 [lquota]
[19623.258592] [<ffffffffc0fe31e3>] lqe_locate_find+0x243/0x840 [lquota]
[19623.259087] [<ffffffffc0ff0563>] qsd_upd_thread+0x853/0xc50 [lquota]
[19623.259529] [<ffffffffc0fefd10>] ? qsd_upd_add+0xe0/0xe0 [lquota]
[19623.260011] [<ffffffffba2c6691>] kthread+0xd1/0xe0
[19623.260439] [<ffffffffba486ac0>] ? end_buffer_async_read+0x130/0x130
[19623.260919] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
[19623.261415] [<ffffffffba992d37>] ret_from_fork_nospec_begin+0x21/0x21
[19623.261885] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
[19623.262292] Code: 01 00 00 89 d1 31 c0 c1 e9 03 f6 c2 04 f3 48 ab 0f 85 20 01 00 00 f6 c2 02 0f 85 ff 00 00 00 83 e2 01 0f 85 e6 00 00 00 c6 03 01 <41> 83 bc 24 94 00 00 00 01 19 c0 83 e0 fd 83 c0 04 88 43 01 49
[19623.263625] RIP [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.264099] RSP <ffff90bbae1afcd0>
[19623.264485] CR2: 000000000000007e
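
The following reading of the Oops is an interpretation, not a confirmed root cause. RSI and R14 both hold 0xffffffff, presumably the quota ID. R12 holds 0xffffffffffffffea, which is -22 (-EINVAL) as a signed 64-bit value. The faulting bytes "41 83 bc 24 94 00 00 00 01" decode to cmpl $0x1,0x94(%r12). Adding 0x94 to R12 wraps to exactly CR2 = 0x7e. This is consistent with dquot_get_dqblk() dereferencing an error pointer (ERR_PTR(-EINVAL)) instead of a valid struct dquot. The C sketch below only checks that arithmetic. The macro and function names are made up for the example, and looks_like_err_ptr() re-implements the idea behind the kernel's IS_ERR() (with MAX_ERRNO = 4095) rather than calling it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the Oops above. */
#define OOPS_R12	UINT64_C(0xffffffffffffffea)
#define OOPS_CR2	UINT64_C(0x000000000000007e)

/* Displacement in the faulting bytes "41 83 bc 24 94 00 00 00 01",
 * which decode to: cmpl $0x1,0x94(%r12) */
#define FAULT_DISP	UINT64_C(0x94)

/* Mirrors the kernel's MAX_ERRNO; re-declared here for user space. */
#define SKETCH_MAX_ERRNO	4095

/* This reading relies on the Linux value of EINVAL. */
static_assert(EINVAL == 22, "Linux errno numbering assumed");

/* Same idea as IS_ERR(): the top 4095 values encode error codes. */
static bool looks_like_err_ptr(uint64_t reg)
{
	return reg >= (uint64_t)-SKETCH_MAX_ERRNO;
}

/* Reinterpret the register as a signed value, like PTR_ERR(). */
static long long reg_as_errno(uint64_t reg)
{
	return (long long)(int64_t)reg;
}

/* Effective address of disp(%reg); unsigned addition wraps mod 2^64. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}
```

This only confirms that the numbers line up. It does not show which code path produced the error value.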


I don't think I saw this crash when I developed the patch for LU-12549, but I was working in a different environment then.



 Comments   
Comment by Gerrit Updater [ 31/Jul/20 ]

Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/39559
Subject: LU-13845 utils: Quota id 0xFFFFFFFF is invalid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 194c0fc37dced0ae4c17dabf958180a9aa577c71

Comment by Etienne Aujames [ 16/Sep/20 ]

Set the priority to 'Major' to match the priority of LU-13956.

Comment by Gerrit Updater [ 19/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39559/
Subject: LU-13845 utils: Quota id 0xFFFFFFFF is invalid
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7b5c1f1404c32a922915742287371f2d137c6392

Comment by Peter Jones [ 19/Oct/20 ]

Landed for 2.14

Generated at Sat Feb 10 03:04:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.