Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.13.0
    • None
    • 3
    • 9223372036854775807

    Description

      In the latest version of lustre file system, the mdt module has a null pointer dereference bug due to the lack of validation for specific fields of packets sent by client.

      The kernel panic:

       

      [93830.742518] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [93830.745373] IP: [<ffffffffc101cc2a>] mdt_object_lock_internal+0x4a/0x360 [mdt]
      [93830.748121] PGD 0
      [93830.750311] Oops: 0000 [#1] SMP
      [93830.752631] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) loop lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crc_t10dif crct10dif_generic ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ppdev ablk_helper parport_pc cryptd parport virtio_balloon joydev i2c_piix4 pcspkr ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_console virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix crct10dif_pclmul drm crct10dif_common crc32c_intel libata serio_raw virtio_pci virtio_ring virtio drm_panel_orientation_quirks floppy [last unloaded: stap_f794b15c5a4eaa0a0f39cf777c459f09_5532]
      [93830.772472]
      [93830.774428] CPU: 0 PID: 5222 Comm: mdt00_004 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
      [93830.779447] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
      [93830.782131] task: ffff9e93fb009040 ti: ffff9e90369d8000 task.ti: ffff9e90369d8000
      [93830.784762] RIP: 0010:[<ffffffffc101cc2a>]  [<ffffffffc101cc2a>] mdt_object_lock_internal+0x4a/0x360 [mdt]
      [93830.787609] RSP: 0018:ffff9e90369dba80  EFLAGS: 00010246
      [93830.789999] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9e90369dbae8
      [93830.792546] RDX: ffff9e93ef99a0f0 RSI: 0000000000000000 RDI: ffff9e93ef99a000
      [93830.795054] RBP: ffff9e90369dbad0 R08: 0000000000000000 R09: 0000000000000000
      [93830.797536] R10: ffff9e9400aa3c00 R11: 0000000000000020 R12: ffff9e93ef99a0f0
      [93830.799980] R13: ffff9e93ef99a000 R14: ffff9e90369dbae8 R15: 0000000000000000
      [93830.802402] FS:  0000000000000000(0000) GS:ffff9e943fc00000(0000) knlGS:0000000000000000
      [93830.804901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [93830.807152] CR2: 0000000000000050 CR3: 0000000422b0e000 CR4: 00000000003606f0
      [93830.809521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [93830.811846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [93830.814138] Call Trace:
      [93830.816037]  [<ffffffffc094c2dc>] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc]
      [93830.818297]  [<ffffffffc1025f33>] mdt_intent_getxattr+0xb3/0x2c0 [mdt]
      [93830.820529]  [<ffffffffc10229e4>] mdt_intent_policy+0x2d4/0xdd0 [mdt]
      [93830.822702]  [<ffffffffc1025e80>] ? mdt_intent_getattr+0x480/0x480 [mdt]
      [93830.824902]  [<ffffffffc08ffc66>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
      [93830.827040]  [<ffffffffc05da3d3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
      [93830.829184]  [<ffffffffc05dd96e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [93830.831278]  [<ffffffffc0928587>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [93830.833470]  [<ffffffffc0950520>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [93830.835699]  [<ffffffffc09b1082>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [93830.837786]  [<ffffffffc09b72ca>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
      [93830.839875]  [<ffffffffc05d6fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [93830.841956]  [<ffffffffc095a88e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [93830.844066]  [<ffffffffb1ccbadb>] ? __wake_up_common+0x5b/0x90
      [93830.846014]  [<ffffffffc095e384>] ptlrpc_main+0xbb4/0x20f0 [ptlrpc]
      [93830.847946]  [<ffffffffb1cd08c0>] ? finish_task_switch+0x50/0x1c0
      [93830.849866]  [<ffffffffc095d7d0>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
      [93830.851842]  [<ffffffffb1cc1c71>] kthread+0xd1/0xe0
      [93830.853559]  [<ffffffffb1cc1ba0>] ? insert_kthread_work+0x40/0x40
      [93830.855362]  [<ffffffffb2375c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [93830.857178]  [<ffffffffb1cc1ba0>] ? insert_kthread_work+0x40/0x40
      [93830.858951] Code: 89 f3 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 f6 05 bd a3 5d ff 01 74 0d f6 05 b8 a3 5d ff 04 0f 85 7e 00 00 00 <48> 8b 43 50 f6 40 1c 02 0f 85 d6 00 00 00 45 0f b6 c9 4d 89 f8 
      [93830.864078] RIP  [<ffffffffc101cc2a>] mdt_object_lock_internal+0x4a/0x360 [mdt]
      [93830.866025]  RSP <ffff9e90369dba80>
      [93830.867566] CR2: 0000000000000050
      

      In function mdt_object_remote(), there is no check about the pointer o, if the value of o is null, a null pointer dereference bug will happen.

      static inline int mdt_object_remote(const struct mdt_object *o){ 
      return lu_object_remote(&o->mot_obj);
      }
      

      Attachments

        Issue Links

          Activity

            [LU-12615] Lustre mdt_object_remote() bug

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35869/
            Subject: LU-12615 mdt: check mdt_object
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 41e2bd9752851a7f3989c6edfe2134742d0ab59f

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35869/ Subject: LU-12615 mdt: check mdt_object Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 41e2bd9752851a7f3989c6edfe2134742d0ab59f

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35869
            Subject: LU-12615 mdt: check mdt_object
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: f562e6cea802aa408bd94c84427e983eeae55730

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35869 Subject: LU-12615 mdt: check mdt_object Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: f562e6cea802aa408bd94c84427e983eeae55730
            pjones Peter Jones added a comment -

            Landed for 2.13

            pjones Peter Jones added a comment - Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35764/
            Subject: LU-12615 mdt: check mdt_object
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e5e0bdb7a5c2d47ceaa2d1c190806d1be4999129

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35764/ Subject: LU-12615 mdt: check mdt_object Project: fs/lustre-release Branch: master Current Patch Set: Commit: e5e0bdb7a5c2d47ceaa2d1c190806d1be4999129

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35764
            Subject: LU-12615 mdt: check mdt_object
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9ad3ae7465eaedcc86772c2518e88def4b18b8e8

            gerrit Gerrit Updater added a comment - Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35764 Subject: LU-12615 mdt: check mdt_object Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 9ad3ae7465eaedcc86772c2518e88def4b18b8e8

            Hongchao, this is being induced by specially-crafted RPC requests to test error handling on the server.

            It looks like this is via:

            mdt_intent_getxattr()
              ->mdt_object_lock(info->mti_object)
                ->mdt_object_lock_internal()
                  ->mdt_object_remote()
                    ->lu_object_remote(&o->mot_obj)
            

            The info->mti_object field is assigned in mdt_body_unpack(), but that is checking !IS_ERR(obj) before assigning it. It looks like the problem is caused by the request not containing the OBD_MD_FLID flag to indicate that the mbo_fid1 field is valid, which causes mdt_body_unpack() to return without setting mti_object, without returning an error. For some RPCs (e.g. mdt_sync()) there is an MDT_BODY field, but is not a requirement for OBD_MD_FLFID to be set because it can be used for both whole-device sync or single-file sync, so this is correct behavior for mdt_body_unpack(). The other place that mti_object is assigned is tsi2mdt_info() using tsi_corpus, but that is itself set in mdt_body_unpack() so is the same root cause.

            I checked all of the mdt_reint_* routines and they appear to handle the error from mdt_object_find() properly, so you should focus on other places that are using mti_object directly. Looking at mdt_sync(), the handling is a bit strange because it is checking if mbo_fid1 is unset for the whole-device sync, then assumes mti_object is OK otherwise (which it may not be if OBD_MD_FLFID is unset).

            I think there are two potential ways to handle this:

            • add checks for mti_object == NULL at the start of high-level RPC handling routines mdt_intent_getxattr(), mdt_getattr(), mdt_getxattr(), mdt_swap_layouts(), and mdt_sync() (some of which LASSERT(obj != NULL) or equivalent, and should instead just return -EPROTO in that case)
            • add checks in low-level routines like mdt_object_remote(), mdt_object_exists(), mdt_object_child(), mdt_object_fid() that are accessing mot_obj, but those functions are called in a lot of places and if mti_object is checked once at the top level then we don't need to check it repeatedly, so I'd rather not do this.
            adilger Andreas Dilger added a comment - Hongchao, this is being induced by specially-crafted RPC requests to test error handling on the server. It looks like this is via: mdt_intent_getxattr() ->mdt_object_lock(info->mti_object) ->mdt_object_lock_internal() ->mdt_object_remote() ->lu_object_remote(&o->mot_obj) The info->mti_object field is assigned in mdt_body_unpack() , but that is checking !IS_ERR(obj) before assigning it. It looks like the problem is caused by the request not containing the OBD_MD_FLID flag to indicate that the mbo_fid1 field is valid, which causes mdt_body_unpack() to return without setting mti_object , without returning an error. For some RPCs (e.g. mdt_sync() ) there is an MDT_BODY field, but is not a requirement for OBD_MD_FLFID to be set because it can be used for both whole-device sync or single-file sync, so this is correct behavior for mdt_body_unpack() . The other place that mti_object is assigned is tsi2mdt_info() using tsi_corpus , but that is itself set in mdt_body_unpack() so is the same root cause. I checked all of the mdt_reint_* routines and they appear to handle the error from mdt_object_find() properly, so you should focus on other places that are using mti_object directly. Looking at mdt_sync() , the handling is a bit strange because it is checking if mbo_fid1 is unset for the whole-device sync, then assumes mti_object is OK otherwise (which it may not be if OBD_MD_FLFID is unset). I think there are two potential ways to handle this: add checks for mti_object == NULL at the start of high-level RPC handling routines mdt_intent_getxattr() , mdt_getattr() , mdt_getxattr() , mdt_swap_layouts() , and mdt_sync() (some of which LASSERT(obj != NULL) or equivalent, and should instead just return -EPROTO in that case) add checks in low-level routines like mdt_object_remote() , mdt_object_exists() , mdt_object_child() , mdt_object_fid() that are accessing mot_obj , but those functions are called in a lot of places and if mti_object is checked once at the top level then we don't need to check it repeatedly, so I'd rather not do this.

            Hi,
            What is the client version?
            if the mdt_object is NULL, there could be two cases, the first is there is no MDT_BODY field in the request,
            the second is no OBD_MD_FLID flag is set on "body->mbo_valid".

            btw, is the kernel dump available?
            Thanks!

            hongchao.zhang Hongchao Zhang added a comment - Hi, What is the client version? if the mdt_object is NULL, there could be two cases, the first is there is no MDT_BODY field in the request, the second is no OBD_MD_FLID flag is set on "body->mbo_valid". btw, is the kernel dump available? Thanks!
            pjones Peter Jones added a comment -

            Hongchao

            Could you please investigate

            Thanks

            Peter

            pjones Peter Jones added a comment - Hongchao Could you please investigate Thanks Peter

            People

              hongchao.zhang Hongchao Zhang
              yunye.ry Alibaba Cloud (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: