[LU-13956] crash - kernel NULL pointer dereference when setting project id to 4294967295 Created: 11/Sep/20 Updated: 16/Sep/20 Resolved: 15/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Matt Rásó-Barnett (Inactive) | Assignee: | Wang Shilong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
kernel: 3.10.0-1127.8.2.el7_lustre |
||
| Description |
|
Hello, I only stumbled on this because I was curious what the upper limit of project IDs would be, so I tried this value and got an MDS crash. I have attached the vmcore-dmesg.txt file, and I can supply a vmcore file too if requested. Is this a kernel issue rather than a Lustre issue? Obviously this isn't a major issue, but I thought I'd raise the bug report in case it's a simple fix. Cheers, |
| Comments |
| Comment by Peter Jones [ 11/Sep/20 ] |
|
Shilong, as Matt suggests, this is relatively low priority but is likely a simple thing to tidy up. Peter |
| Comment by Wang Shilong (Inactive) [ 11/Sep/20 ] |
|
Would you mind sharing steps to reproduce the problem? It seems to work for me: |
| Comment by Matt Rásó-Barnett (Inactive) [ 11/Sep/20 ] |
|
Interesting, I didn't know about 'lfs project' - I get the same issue with that command as well though, just running what you showed exactly. Perhaps there is something with my setup then, I'm using RHEL 7.8, 3.10.0-1127.8.2.el7_lustre, Lustre 2.12.5. I'll redeploy this filesystem and see if the issue goes away. Thanks for checking it for me. |
| Comment by Andreas Dilger [ 12/Sep/20 ] |
|
Matt, there was a patch landed recently that may have hidden this?
commit 3d9900e78e180a211c50ea1030fa147c5a330f22
Author: Etienne AUJAMES <eaujames@ddn.com>
LU-12549 utils: Check range of quota ID for "lfs" arguments
strtoul function return a 64bits value on a 64bits system, so an
overflow occurs when we store user value into a quota/project
structure.
This commit apply the same 32 bits verification for "lfs" project,
quota,setquota and find commands on uid, gid and project id arguments.
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I809e9ac55d4bc676c20b18c6c198a69eaba9cff6
Reviewed-on: https://review.whamcloud.com/38938
However, that only affects the user tools. If the MDS crashes due to bad input, that should be fixed as well. Could you please attach reproducer steps and a stack trace, so that the MDS can be suitably hardened. |
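The client-side check described in the commit message above can be sketched roughly as follows. This is a minimal illustration only, not the code from the LU-12549 patch: `parse_id32()` and `truncate_to_u32()` are hypothetical helper names, and `strtoull` is used instead of `strtoul` so that the parsed value is at least 64 bits wide on every platform.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* The value from this ticket is the largest 32-bit unsigned integer. */
static_assert(UINT32_MAX == 4294967295u, "4294967295 is 2^32 - 1");

/*
 * Hypothetical helper: shows what happens WITHOUT a range check.
 * Converting to uint32_t keeps only the low 32 bits of the value.
 */
static uint32_t truncate_to_u32(unsigned long long val)
{
	return (uint32_t)val;
}

/*
 * Hypothetical helper, not a function from the Lustre tree.
 * Parses a decimal uid/gid/project ID argument and verifies that it
 * fits in 32 bits before storing it.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_id32(const char *arg, uint32_t *id)
{
	char *end = NULL;
	unsigned long long val;

	/* Require a leading digit: rejects NULL, "", signs and whitespace. */
	if (arg == NULL || !isdigit((unsigned char)*arg))
		return -1;

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (errno != 0 || *end != '\0')
		return -1;

	/* The 32-bit verification: refuse anything that would be truncated. */
	if (val > UINT32_MAX)
		return -1;

	*id = (uint32_t)val;
	return 0;
}
```

Note that 4294967295 itself fits in 32 bits, so a range check of this kind still accepts it. That is consistent with the point above that tightening the user tools does not remove the need to harden the MDS against this value.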
| Comment by Wang Shilong (Inactive) [ 12/Sep/20 ] |
|
Maybe I should try b2_12; at least I tried to revert " |
| Comment by Andreas Dilger [ 13/Sep/20 ] |
|
Sorry, I didn't see the vmcore file, it already has the stack:
[ 1160.328702] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
[ 1160.336576] IP: [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
[ 1160.343128] PGD 0
[ 1160.345166] Oops: 0000 [#1] SMP
[ 1160.460913] CPU: 21 PID: 5663 Comm: mdt01_012 3.10.0-1127.8.2.el7_lustre.x86_64 #1
[ 1160.488092] RIP: 0010:[<ffffffffa14bbe26>] [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
[ 1160.575912] Call Trace:
[ 1160.578361] [<ffffffffa14bf1ac>] __dquot_transfer+0x32c/0x510
[ 1160.630492] [<ffffffffc16e354f>] osd_transfer_project+0x14f/0x1a0 [osd_ldiskfs]
[ 1160.638470] [<ffffffffc16e3630>] osd_quota_transfer+0x90/0x230 [osd_ldiskfs]
[ 1160.653446] [<ffffffffc16f0d3f>] osd_attr_set+0x11f/0xb90 [osd_ldiskfs]
[ 1160.660718] [<ffffffffc198ab68>] lod_sub_attr_set+0x1c8/0x460 [lod]
[ 1160.675739] [<ffffffffc197370a>] lod_attr_set+0xba/0x9e0 [lod]
[ 1160.689502] [<ffffffffc19f24d0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
[ 1160.696819] [<ffffffffc19f4f08>] mdd_attr_set+0x928/0xda0 [mdd]
[ 1160.711153] [<ffffffffc18a4bcb>] mdt_reint_setattr+0x9db/0x1290 [mdt]
[ 1160.718202] [<ffffffffc18a6963>] mdt_reint_rec+0x83/0x210 [mdt]
[ 1160.724713] [<ffffffffc1883273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[ 1160.731738] [<ffffffffc188e6e7>] mdt_reint+0x67/0x140 [mdt]
[ 1160.737905] [<ffffffffc14799da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[ 1160.760792] [<ffffffffc141e48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 1160.781758] [<ffffffffc1421df4>] ptlrpc_main+0xb34/0x1470 [ptlrpc] |
| Comment by Etienne Aujames [ 15/Sep/20 ] |
|
Hello, I have already created a ticket on the subject: (patch: https://review.whamcloud.com/39559) |
| Comment by Matt Rásó-Barnett (Inactive) [ 15/Sep/20 ] |
|
Ah thanks for pointing that out. Happy for this to be closed as dupe of that. |