Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.13.0, Lustre 2.12.5
- None
- 3
- 9223372036854775807
Description
The following commands systematically crash the MDS:
lfs quota -u|g|p $(( (1<<32) -1)) ...
lfs setquota -u|g|p $(( (1<<32) -1)) ...
The uid/gid value "(uid_t) -1" is a special value, used for example by chown() to mean "leave this ID unchanged". Valid UIDs/GIDs range from 0 to 0xFFFFFFFE.
Ref: https://systemd.io/UIDS-GIDS/
So "lfs setquota" and "lfs quota" should treat 0xFFFFFFFF as an invalid quota ID.
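The check described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual Lustre patch: the helper name parse_quota_id() and the QUOTA_ID_INVALID macro are hypothetical, and the choice of error codes is arbitrary.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name: 0xFFFFFFFF, i.e. (uid_t)-1, is reserved and never a valid ID. */
#define QUOTA_ID_INVALID UINT32_MAX

/*
 * Hypothetical helper (not a Lustre function): parse a decimal quota ID
 * given on the command line and accept only 0 .. 0xFFFFFFFE.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE for
 * 0xFFFFFFFF or anything larger.
 */
static int parse_quota_id(const char *arg, uint32_t *id)
{
    char *end = NULL;
    unsigned long long val;

    /* Require a leading digit: rejects NULL, "", "-1", "+1" and leading spaces. */
    if (arg == NULL || !isdigit((unsigned char)arg[0]))
        return -EINVAL;

    errno = 0;
    val = strtoull(arg, &end, 10);
    if (errno != 0 || *end != '\0')
        return -EINVAL;

    /* Reject the reserved value and anything that does not fit in 32 bits. */
    if (val >= QUOTA_ID_INVALID)
        return -ERANGE;

    *id = (uint32_t)val;
    return 0;
}
```

With such a check in the client tool, an ID of 4294967295 would be refused before any request reaches the MDS. A server-side check would still be needed to protect against other clients.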
Here is the crash console message:
[18075.268361] LustreError: Skipped 7 previous similar messages
[19623.238824] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
[19623.239412] IP: [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.240130] PGD 80000000d6a70067 PUD d6a71067 PMD 0
[19623.240633] Oops: 0000 [#1] SMP
[19623.241189] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) mbcache jbd2 libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel snd_intel8x0 snd_ac97_codec ac97_bus snd_seq aesni_intel snd_seq_device lrw snd_pcm gf128mul glue_helper ablk_helper cryptd snd_timer sg snd pcspkr vboxguest(OE) soundcore i2c_piix4 parport_pc video parport ip_tables xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci crct10dif_pclmul crct10dif_common libata e1000 crc32c_intel
[19623.245675] serio_raw drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dm_flakey]
[19623.246811] CPU: 1 PID: 11998 Comm: lquota_wb_lustr Kdump: loaded Tainted: G OE ------------ 3.10.0-1127.10.1.el7_lustre_ajmes.x86_64 #1
[19623.248020] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[19623.248674] task: ffff90bc594541c0 ti: ffff90bbae1ac000 task.ti: ffff90bbae1ac000
[19623.249400] RIP: 0010:[<ffffffffba4bdb8d>] [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.250018] RSP: 0018:ffff90bbae1afcd0 EFLAGS: 00010246
[19623.250558] RAX: 0000000000000000 RBX: ffff90bb44b0dac8 RCX: 0000000000000000
[19623.251190] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffff90bb44b0db38
[19623.251719] RBP: ffff90bbae1afce0 R08: ffff90bb9fc54000 R09: 0000000000000001
[19623.252364] R10: ffff90bc5fc9ff50 R11: ffffe6a5846571c0 R12: ffffffffffffffea
[19623.252942] R13: ffff90bb4103e200 R14: 00000000ffffffff R15: ffff90bb9fc5b000
[19623.253460] FS: 0000000000000000(0000) GS:ffff90bc5fc80000(0000) knlGS:0000000000000000
[19623.254049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19623.254555] CR2: 000000000000007e CR3: 00000000d8290000 CR4: 00000000000606e0
[19623.255146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19623.255638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19623.256120] Call Trace:
[19623.256597] [<ffffffffc108a945>] osd_acct_index_lookup+0x235/0x480 [osd_ldiskfs]
[19623.257148] [<ffffffffc0fe071f>] lquota_disk_read+0x12f/0x390 [lquota]
[19623.257621] [<ffffffffc0fe852a>] qsd_refresh_usage+0x6a/0x2f0 [lquota]
[19623.258142] [<ffffffffc0fe8970>] qsd_lqe_read+0x1c0/0x5f0 [lquota]
[19623.258592] [<ffffffffc0fe31e3>] lqe_locate_find+0x243/0x840 [lquota]
[19623.259087] [<ffffffffc0ff0563>] qsd_upd_thread+0x853/0xc50 [lquota]
[19623.259529] [<ffffffffc0fefd10>] ? qsd_upd_add+0xe0/0xe0 [lquota]
[19623.260011] [<ffffffffba2c6691>] kthread+0xd1/0xe0
[19623.260439] [<ffffffffba486ac0>] ? end_buffer_async_read+0x130/0x130
[19623.260919] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
[19623.261415] [<ffffffffba992d37>] ret_from_fork_nospec_begin+0x21/0x21
[19623.261885] [<ffffffffba2c65c0>] ? insert_kthread_work+0x40/0x40
[19623.262292] Code: 01 00 00 89 d1 31 c0 c1 e9 03 f6 c2 04 f3 48 ab 0f 85 20 01 00 00 f6 c2 02 0f 85 ff 00 00 00 83 e2 01 0f 85 e6 00 00 00 c6 03 01 <41> 83 bc 24 94 00 00 00 01 19 c0 83 e0 fd 83 c0 04 88 43 01 49
[19623.263625] RIP [<ffffffffba4bdb8d>] dquot_get_dqblk+0x6d/0x1f0
[19623.264099] RSP <ffff90bbae1afcd0>
[19623.264485] CR2: 000000000000007e
I don't think I saw this crash when I developed the patch for LU-12549; I was working in a different environment at the time.
Issue Links
- duplicates
  LU-13956 crash - kernel NULL pointer deference when setting project id to 4294967295 (Resolved)
- is related to
  LU-14740 LustreError: 76942:0:(qsd_entry.c:243:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lustre-MDT0000 qtype:prj id:4294967295 enforced:0 granted: 0 pending:0 waiting:0 req:0 usage: 0 qunit:0 qtune:0 edquot:0 default:yes (Resolved)
- is related to
  LU-12549 Lustre project PID 32-bit overflow (Resolved)