
crash - kernel NULL pointer dereference when setting project id to 4294967295

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.5
    • Component/s: None
    • Environment: kernel: 3.10.0-1127.8.2.el7_lustre
      e2fsprogs:

    Description

      Hello,
      This is likely not very important as it's a contrived situation, but I am reliably able to crash an MDS running 2.12.5, by setting the project id on a file to '4294967295'.

      I only stumbled on it because I was curious what the upper limit of project IDs would be, so I tried this value and got an MDS crash.

      I attach the vmcore-dmesg.txt file - I can supply a vmcore file too if requested. Is this a kernel issue rather than a lustre issue?

      Obviously this isn't a major issue, but I just thought I'd raise the bug report in case it's a simple fix.

      Cheers,
      Matt

      Attachments

        Issue Links

          Activity

            [LU-13956] crash - kernel NULL pointer dereference when setting project id to 4294967295

            Ah thanks for pointing that out. Happy for this to be closed as dupe of that.
            Cheers,
            Matt

            mrb Matt Rásó-Barnett (Inactive) added a comment
            eaujames Etienne Aujames added a comment (edited)

            Hello,

            I have already created a ticket on the subject: LU-13845

            (patch: https://review.whamcloud.com/39559)


            Sorry, I didn't see the vmcore file; it already has the stack:

            [ 1160.328702] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
            [ 1160.336576] IP: [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
            [ 1160.343128] PGD 0 
            [ 1160.345166] Oops: 0000 [#1] SMP 
            [ 1160.460913] CPU: 21 PID: 5663 Comm: mdt01_012  3.10.0-1127.8.2.el7_lustre.x86_64 #1
            [ 1160.488092] RIP: 0010:[<ffffffffa14bbe26>]  [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
            [ 1160.575912] Call Trace:
            [ 1160.578361]  [<ffffffffa14bf1ac>] __dquot_transfer+0x32c/0x510
            [ 1160.630492]  [<ffffffffc16e354f>] osd_transfer_project+0x14f/0x1a0 [osd_ldiskfs]
            [ 1160.638470]  [<ffffffffc16e3630>] osd_quota_transfer+0x90/0x230 [osd_ldiskfs]
            [ 1160.653446]  [<ffffffffc16f0d3f>] osd_attr_set+0x11f/0xb90 [osd_ldiskfs]
            [ 1160.660718]  [<ffffffffc198ab68>] lod_sub_attr_set+0x1c8/0x460 [lod]
            [ 1160.675739]  [<ffffffffc197370a>] lod_attr_set+0xba/0x9e0 [lod]
            [ 1160.689502]  [<ffffffffc19f24d0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
            [ 1160.696819]  [<ffffffffc19f4f08>] mdd_attr_set+0x928/0xda0 [mdd]
            [ 1160.711153]  [<ffffffffc18a4bcb>] mdt_reint_setattr+0x9db/0x1290 [mdt]
            [ 1160.718202]  [<ffffffffc18a6963>] mdt_reint_rec+0x83/0x210 [mdt]
            [ 1160.724713]  [<ffffffffc1883273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
            [ 1160.731738]  [<ffffffffc188e6e7>] mdt_reint+0x67/0x140 [mdt]
            [ 1160.737905]  [<ffffffffc14799da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
            [ 1160.760792]  [<ffffffffc141e48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            [ 1160.781758]  [<ffffffffc1421df4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
            
            adilger Andreas Dilger added a comment

            Maybe I should try b2_12; at least, I tried to revert "LU-12549 utils: Check range of quota ID for "lfs" arguments" on master and could not reproduce the problem.

            wshilong Wang Shilong (Inactive) added a comment

            Matt, a patch landed recently that may have hidden this:

            commit 3d9900e78e180a211c50ea1030fa147c5a330f22
            Author:     Etienne AUJAMES <eaujames@ddn.com>
            
                LU-12549 utils: Check range of quota ID for "lfs" arguments
                
                strtoul function return a 64bits value on a 64bits system, so an
                overflow occurs when we store user value into a quota/project
                structure.
                
                This commit apply the same 32 bits verification for "lfs" project,
                quota,setquota and find commands on uid, gid and project id arguments.
                
                Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
                Change-Id: I809e9ac55d4bc676c20b18c6c198a69eaba9cff6
                Reviewed-on: https://review.whamcloud.com/38938
            

            However, that only affects the user tools. If the MDS crashes due to bad input, that should be fixed as well.

            Could you please attach reproducer steps and a stack trace, so that the MDS can be suitably hardened?

            adilger Andreas Dilger added a comment

            Interesting, I didn't know about 'lfs project' - I get the same issue with that command as well, running exactly what you showed.

            Perhaps there is something wrong with my setup then. I'm using RHEL 7.8, kernel 3.10.0-1127.8.2.el7_lustre, and Lustre 2.12.5. I'll redeploy this filesystem and see if the issue goes away.

            Thanks for checking it for me.

            mrb Matt Rásó-Barnett (Inactive) added a comment

            Would you mind sharing steps to reproduce the problem?

            It seems to work for me:
            [root@server_el7_vm1 lustre]# lfs project -p 4294967295 file
            [root@server_el7_vm1 lustre]# lfs project file
            4294967295 - file

            wshilong Wang Shilong (Inactive) added a comment
            pjones Peter Jones added a comment -

            Shilong

            As Matt suggests, this is relatively low priority but is likely a simple thing to tidy up.

            Peter


            People

              Assignee: wshilong Wang Shilong (Inactive)
              Reporter: mrb Matt Rásó-Barnett (Inactive)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: