LU-12603

Lustre ldlm_request_cancel() bug

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.13.0
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels:
      None
    • Severity:
      3

      Description

      In the latest version of the Lustre file system, the ptlrpc module has an out-of-bounds read bug caused by missing validation of specific fields in packets sent by the client.

      The kernel panic:

      [ 7424.506777] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [ 7424.509002] IP: [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
      [ 7424.510926] PGD 0 
      [ 7424.512466] Oops: 0002 [#1] SMP 
      [ 7424.514098] Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) loop lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crc_t10dif crct10dif_generic ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_console virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common drm crc32c_intel ata_piix libata
      [ 7424.529892]  serio_raw virtio_pci virtio_ring virtio drm_panel_orientation_quirks floppy
      [ 7424.532037] CPU: 3 PID: 7206 Comm: mdt00_000 Kdump: loaded Tainted: G           OEL ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
      [ 7424.536069] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
      [ 7424.538254] task: ffff9447f62f8000 ti: ffff944babae8000 task.ti: ffff944babae8000
      [ 7424.540415] RIP: 0010:[<ffffffffa7b6b46c>]  [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
      [ 7424.542720] RSP: 0018:ffff944babaebb78  EFLAGS: 00010246
      [ 7424.544683] RAX: 0000000000000000 RBX: ffff944bdb285000 RCX: ffff944bdb2850b0
      [ 7424.546806] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000001c
      [ 7424.548912] RBP: ffff944babaebb88 R08: 000000000000000a R09: 000000000000fff3
      [ 7424.551009] R10: 0000000000000000 R11: ffff944babaeb91e R12: 0000000000000000
      [ 7424.553101] R13: ffff944be46c6b78 R14: 0000000000000000 R15: 0000000000000000
      [ 7424.555177] FS:  0000000000000000(0000) GS:ffff944bffd80000(0000) knlGS:0000000000000000
      [ 7424.557345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 7424.559285] CR2: 000000000000001c CR3: 0000000421bb4000 CR4: 00000000003606e0
      [ 7424.561342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 7424.563385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 7424.565400] Call Trace:
      [ 7424.567049]  [<ffffffffc094602c>] ? lock_res_and_lock+0x2c/0x50 [ptlrpc]
      [ 7424.569109]  [<ffffffffc09478ea>] __ldlm_handle2lock+0x7a/0x3f0 [ptlrpc]
      [ 7424.571135]  [<ffffffffc096fdf2>] ldlm_request_cancel+0x1c2/0x740 [ptlrpc]
      [ 7424.573145]  [<ffffffffc0975bc8>] ldlm_handle_enqueue0+0x88/0x15a0 [ptlrpc]
      [ 7424.575133]  [<ffffffffc099e520>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [ 7424.577175]  [<ffffffffc09ff082>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [ 7424.579070]  [<ffffffffc0a052ca>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
      [ 7424.581008]  [<ffffffffc05d5fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 7424.582942]  [<ffffffffc09a888e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [ 7424.584940]  [<ffffffffa74cbadb>] ? __wake_up_common+0x5b/0x90
      [ 7424.586760]  [<ffffffffc09ac384>] ptlrpc_main+0xbb4/0x20f0 [ptlrpc]
      [ 7424.588600]  [<ffffffffa74d08c0>] ? finish_task_switch+0x50/0x1c0
      [ 7424.590425]  [<ffffffffc09ab7d0>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
      [ 7424.592348]  [<ffffffffa74c1c71>] kthread+0xd1/0xe0
      [ 7424.594029]  [<ffffffffa74c1ba0>] ? insert_kthread_work+0x40/0x40
      [ 7424.595810]  [<ffffffffa7b75c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 7424.597620]  [<ffffffffa74c1ba0>] ? insert_kthread_work+0x40/0x40
      [ 7424.599397] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 40 1b ff ff 5d 
      [ 7424.604505] RIP  [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
      [ 7424.606300]  RSP <ffff944babaebb78>
      [ 7424.607857] CR2: 000000000000001c
      

      In the 'ldlm_request_cancel' function of the 'ptlrpc' module, the 'lock_count' field read from the 'dlm_req' structure is not bounds-checked before it is used as the upper limit when indexing the 'lock_handle' array, resulting in an out-of-bounds access of the array.

      The 'lock_count' value is taken directly from the Lustre request packet. An attacker can set the 'Lock Count' field in the 'ldlm request' section of an 'LDLM_ENQUEUE' request to a large value (such as 0x41414141), causing the server to crash.
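
      The kind of check that is missing can be sketched as follows. This is a minimal illustration, not the actual Lustre patch: 'demo_lock_handle', 'demo_ldlm_request' and 'demo_lock_count_fits' are hypothetical names, and the structure layout is simplified rather than the real wire format. The idea is to reject any 'lock_count' that claims more handles than the received buffer can hold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the wire structures.
 * The real Lustre layouts differ; these only illustrate the check. */
struct demo_lock_handle {
    uint64_t cookie;
};

struct demo_ldlm_request {
    uint32_t lock_flags;
    uint32_t lock_count;                    /* attacker-controlled */
    struct demo_lock_handle lock_handle[];  /* handles follow the header */
};

/* Return 1 if a request buffer of buf_len bytes can really contain
 * lock_count handles, 0 otherwise. A caller would refuse to walk
 * lock_handle[] when this returns 0. */
static int demo_lock_count_fits(uint32_t lock_count, size_t buf_len)
{
    size_t header = offsetof(struct demo_ldlm_request, lock_handle);
    size_t max_handles;

    /* Buffer too short to hold even the fixed header. */
    if (buf_len < header)
        return 0;

    /* Divide instead of multiplying lock_count, so a huge count
     * cannot overflow the size computation. */
    max_handles = (buf_len - header) / sizeof(struct demo_lock_handle);

    return lock_count <= max_handles;
}
```

      With a check of this shape, a 'lock_count' of 0x41414141 in a 64-byte buffer is rejected before any element of 'lock_handle' is read.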

      The backtrace:

      ptlrpc_main -> ptlrpc_server_handle_request -> tgt_request_handle -> tgt_enqueue -> ldlm_handle_enqueue0 -> ldlm_request_cancel

       

              People

              • Assignee:
                green Oleg Drokin
                Reporter:
                yunye.ry Alibaba Cloud