crash - kernel NULL pointer dereference when setting project id to 4294967295

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.5
    • Labels: None
    • Environment: kernel: 3.10.0-1127.8.2.el7_lustre
      e2fsprogs:

    Description

      Hello,
      This is likely not very important as it's a contrived situation, but I am reliably able to crash an MDS running 2.12.5, by setting the project id on a file to '4294967295'.

      I only stumbled on it because I was curious what the upper limit of project IDs would be, so I tried this value and got an MDS crash.

      I have attached the vmcore-dmesg.txt file, and I can supply a vmcore file too if requested. Is this a kernel issue rather than a Lustre issue?

      Obviously this isn't a major issue, but I just thought I'd raise the bug report in case it's a simple fix.

      Cheers,
      Matt

          Activity

            [LU-13956] crash - kernel NULL pointer dereference when setting project id to 4294967295
            eaujames Etienne Aujames made changes -
            Link New: This issue is duplicated by LU-13845 [ LU-13845 ]
            pjones Peter Jones made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Ah thanks for pointing that out. Happy for this to be closed as dupe of that.
            Cheers,
            Matt

            mrb Matt Rásó-Barnett (Inactive) added a comment
            Hello,

            I have already created a ticket on the subject: LU-13845

            (patch: https://review.whamcloud.com/39559)

            eaujames Etienne Aujames added a comment (edited)

            Sorry, I didn't see the vmcore file; it already has the stack:

            [ 1160.328702] BUG: unable to handle kernel NULL pointer dereference at 000000000000007e
            [ 1160.336576] IP: [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
            [ 1160.343128] PGD 0 
            [ 1160.345166] Oops: 0000 [#1] SMP 
            [ 1160.460913] CPU: 21 PID: 5663 Comm: mdt01_012  3.10.0-1127.8.2.el7_lustre.x86_64 #1
            [ 1160.488092] RIP: 0010:[<ffffffffa14bbe26>]  [<ffffffffa14bbe26>] check_idq.constprop.23+0x16/0x1c0
            [ 1160.575912] Call Trace:
            [ 1160.578361]  [<ffffffffa14bf1ac>] __dquot_transfer+0x32c/0x510
            [ 1160.630492]  [<ffffffffc16e354f>] osd_transfer_project+0x14f/0x1a0 [osd_ldiskfs]
            [ 1160.638470]  [<ffffffffc16e3630>] osd_quota_transfer+0x90/0x230 [osd_ldiskfs]
            [ 1160.653446]  [<ffffffffc16f0d3f>] osd_attr_set+0x11f/0xb90 [osd_ldiskfs]
            [ 1160.660718]  [<ffffffffc198ab68>] lod_sub_attr_set+0x1c8/0x460 [lod]
            [ 1160.675739]  [<ffffffffc197370a>] lod_attr_set+0xba/0x9e0 [lod]
            [ 1160.689502]  [<ffffffffc19f24d0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
            [ 1160.696819]  [<ffffffffc19f4f08>] mdd_attr_set+0x928/0xda0 [mdd]
            [ 1160.711153]  [<ffffffffc18a4bcb>] mdt_reint_setattr+0x9db/0x1290 [mdt]
            [ 1160.718202]  [<ffffffffc18a6963>] mdt_reint_rec+0x83/0x210 [mdt]
            [ 1160.724713]  [<ffffffffc1883273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
            [ 1160.731738]  [<ffffffffc188e6e7>] mdt_reint+0x67/0x140 [mdt]
            [ 1160.737905]  [<ffffffffc14799da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
            [ 1160.760792]  [<ffffffffc141e48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            [ 1160.781758]  [<ffffffffc1421df4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
            
            adilger Andreas Dilger added a comment

            Maybe I should try b2_12. I tried reverting "LU-12549 utils: Check range of quota ID for "lfs" arguments" on master, but could not reproduce the problem.

            wshilong Wang Shilong (Inactive) added a comment
            adilger Andreas Dilger made changes -
            Priority Original: Minor [ 4 ] New: Major [ 3 ]

            Matt, a patch landed recently that may have hidden this:

            commit 3d9900e78e180a211c50ea1030fa147c5a330f22
            Author:     Etienne AUJAMES <eaujames@ddn.com>
            
                LU-12549 utils: Check range of quota ID for "lfs" arguments
                
                strtoul function return a 64bits value on a 64bits system, so an
                overflow occurs when we store user value into a quota/project
                structure.
                
                This commit apply the same 32 bits verification for "lfs" project,
                quota,setquota and find commands on uid, gid and project id arguments.
                
                Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
                Change-Id: I809e9ac55d4bc676c20b18c6c198a69eaba9cff6
                Reviewed-on: https://review.whamcloud.com/38938
            

            However, that only affects the user tools. If the MDS crashes due to bad input, that should be fixed as well.

            Could you please attach reproducer steps and a stack trace, so that the MDS can be suitably hardened?

            adilger Andreas Dilger added a comment
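
The LU-12549 commit message quoted above describes a 32-bit range check on IDs parsed with strtoul. The following is a minimal sketch of that idea only, not the actual lfs code: the helper name parse_id32 and its return convention are hypothetical, and it assumes the ID is given as a decimal string.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not taken from the Lustre sources): parse a decimal
 * ID string and accept it only if the value fits in 32 bits.
 * Returns 0 on success and stores the value in *id, or -1 on any error.
 */
int parse_id32(const char *str, uint32_t *id)
{
	char *end = NULL;
	unsigned long val;

	/* Reject empty input and an explicit leading minus sign. */
	if (str == NULL || *str == '\0' || *str == '-')
		return -1;

	errno = 0;
	val = strtoul(str, &end, 10);	/* unsigned long is 64 bits on LP64 */
	if (errno != 0 || end == str || *end != '\0')
		return -1;	/* out of range for unsigned long, or not a clean number */

	/* Without this check, storing val in a 32-bit field silently truncates. */
	if (val > UINT32_MAX)
		return -1;

	*id = (uint32_t)val;
	return 0;
}
```

With this sketch, parse_id32("4294967296", &id) fails, while parse_id32("4294967295", &id) succeeds because 4294967295 is exactly UINT32_MAX and still fits in 32 bits. A client-side range check of this kind therefore does not by itself exclude the value from this ticket, which is consistent with the point above that server-side hardening is still needed.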

            Interesting, I didn't know about 'lfs project' - I get the same issue with that command as well though, just running what you showed exactly.

            Perhaps there is something with my setup then, I'm using RHEL 7.8, 3.10.0-1127.8.2.el7_lustre, Lustre 2.12.5. I'll redeploy this filesystem and see if the issue goes away.

            Thanks for checking it for me.

            mrb Matt Rásó-Barnett (Inactive) added a comment

            Would you mind sharing the steps to reproduce the problem?

            It works for me:
            [root@server_el7_vm1 lustre]# lfs project -p 4294967295 file
            [root@server_el7_vm1 lustre]# lfs project file
            4294967295 - file

            wshilong Wang Shilong (Inactive) added a comment

            People

              Assignee: wshilong Wang Shilong (Inactive)
              Reporter: mrb Matt Rásó-Barnett (Inactive)