[LU-13956] crash - kernel NULL pointer dereference when setting project id to 4294967295 Created: 11/Sep/20  Updated: 16/Sep/20  Resolved: 15/Sep/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Matt Rásó-Barnett (Inactive) Assignee: Wang Shilong (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

kernel: 3.10.0-1127.8.2.el7_lustre
e2fsprogs:


Attachments: Text File vmcore-dmesg.txt    
Issue Links:
Duplicate
is duplicated by LU-13845 Kernel crash on: lfs quota -u $(( (1<... Resolved

 Description   

Hello,
This is likely not very important as it's a contrived situation, but I can reliably crash an MDS running 2.12.5 by setting the project ID on a file to '4294967295'.

I only stumbled on it because I was curious what the upper limit for project IDs would be, so I tried this value and got an MDS crash.

I have attached the vmcore-dmesg.txt file, and I can supply a vmcore file too if requested. Is this a kernel issue rather than a Lustre issue?

Obviously this isn't a major issue, but I just thought I'd raise the bug report in case it's a simple fix.

Cheers,
Matt



 Comments   
Comment by Peter Jones [ 11/Sep/20 ]

Shilong

As Matt suggests, this is relatively low priority but is likely a simple thing to tidy up

Peter

Comment by Wang Shilong (Inactive) [ 11/Sep/20 ]

Would you mind sharing the steps to reproduce the problem?

It seems to work for me:
[root@server_el7_vm1 lustre]# lfs project -p 4294967295 file
[root@server_el7_vm1 lustre]# lfs project file
4294967295 - file

Comment by Matt Rásó-Barnett (Inactive) [ 11/Sep/20 ]

Interesting, I didn't know about 'lfs project' - I get the same issue with that command as well though, just running what you showed exactly.

Perhaps there is something wrong with my setup, then. I'm using RHEL 7.8, 3.10.0-1127.8.2.el7_lustre, Lustre 2.12.5. I'll redeploy this filesystem and see if the issue goes away.

Thanks for checking it for me.

Comment by Andreas Dilger [ 12/Sep/20 ]

Matt, a patch landed recently that may have hidden this:

commit 3d9900e78e180a211c50ea1030fa147c5a330f22
Author:     Etienne AUJAMES <eaujames@ddn.com>

    LU-12549 utils: Check range of quota ID for "lfs" arguments
    
    strtoul function return a 64bits value on a 64bits system, so an
    overflow occurs when we store user value into a quota/project
    structure.
    
    This commit apply the same 32 bits verification for "lfs" project,
    quota,setquota and find commands on uid, gid and project id arguments.
    
    Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
    Change-Id: I809e9ac55d4bc676c20b18c6c198a69eaba9cff6
    Reviewed-on: https://review.whamcloud.com/38938
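
The 32-bit range check described in that commit message can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the patch: `parse_id32` is a hypothetical helper name, and the rejection of a leading minus sign is an extra precaution added here. Note that 4294967295 still fits in 32 bits, so a check of this kind lets it through.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the actual lfs code): parse a decimal ID string
 * and accept it only if it fits in 32 bits.
 * Returns 0 on success and stores the value in *out, -1 otherwise.
 */
static int parse_id32(const char *str, uint32_t *out)
{
	char *end = NULL;
	unsigned long val;

	assert(str != NULL && out != NULL);

	/* strtoul() silently negates "-1", so refuse a minus sign up front. */
	if (strchr(str, '-') != NULL)
		return -1;

	errno = 0;
	val = strtoul(str, &end, 10);

	/* No digits consumed, or trailing garbage after the number. */
	if (end == str || *end != '\0')
		return -1;

	/* Out of range for unsigned long, or wider than 32 bits. */
	if (errno == ERANGE || val > UINT32_MAX)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

With this sketch, "4294967296" is rejected while "4294967295" is accepted, which is consistent with the value reaching the MDS in the report above.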

However, that only affects the user tools. If the MDS crashes due to bad input, that should be fixed as well.

Could you please attach reproducer steps and a stack trace, so that the MDS can be suitably hardened?

Comment by Wang Shilong (Inactive) [ 12/Sep/20 ]

Maybe I should try b2_12. I tried reverting "LU-12549 utils: Check range of quota ID for "lfs" arguments" on master, but could not reproduce the problem.

Comment by Andreas Dilger [ 13/Sep/20 ]

Sorry, I didn't see the attached vmcore-dmesg.txt file. It already has the stack:

[ 1160.328702] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
[ 1160.336576] IP: [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
[ 1160.343128] PGD 0 
[ 1160.345166] Oops: 0000 [#1] SMP 
[ 1160.460913] CPU: 21 PID: 5663 Comm: mdt01_012  3.10.0-1127.8.2.el7_lustre.x86_64 #1
[ 1160.488092] RIP: 0010:[<ffffffffa14bbe26>]  [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
[ 1160.575912] Call Trace:
[ 1160.578361]  [<ffffffffa14bf1ac>] __dquot_transfer+0x32c/0x510
[ 1160.630492]  [<ffffffffc16e354f>] osd_transfer_project+0x14f/0x1a0 [osd_ldiskfs]
[ 1160.638470]  [<ffffffffc16e3630>] osd_quota_transfer+0x90/0x230 [osd_ldiskfs]
[ 1160.653446]  [<ffffffffc16f0d3f>] osd_attr_set+0x11f/0xb90 [osd_ldiskfs]
[ 1160.660718]  [<ffffffffc198ab68>] lod_sub_attr_set+0x1c8/0x460 [lod]
[ 1160.675739]  [<ffffffffc197370a>] lod_attr_set+0xba/0x9e0 [lod]
[ 1160.689502]  [<ffffffffc19f24d0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
[ 1160.696819]  [<ffffffffc19f4f08>] mdd_attr_set+0x928/0xda0 [mdd]
[ 1160.711153]  [<ffffffffc18a4bcb>] mdt_reint_setattr+0x9db/0x1290 [mdt]
[ 1160.718202]  [<ffffffffc18a6963>] mdt_reint_rec+0x83/0x210 [mdt]
[ 1160.724713]  [<ffffffffc1883273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[ 1160.731738]  [<ffffffffc188e6e7>] mdt_reint+0x67/0x140 [mdt]
[ 1160.737905]  [<ffffffffc14799da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[ 1160.760792]  [<ffffffffc141e48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 1160.781758]  [<ffffffffc1421df4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]

Comment by Etienne Aujames [ 15/Sep/20 ]

Hello,

I have already created a ticket on the subject: LU-13845

(patch: https://review.whamcloud.com/39559)

Comment by Matt Rásó-Barnett (Inactive) [ 15/Sep/20 ]

Ah, thanks for pointing that out. Happy for this to be closed as a dupe of that.
Cheers,
Matt
