[LU-12603] Lustre ldlm_request_cancel() bug Created: 29/Jul/19  Updated: 12/Sep/19  Resolved: 07/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Critical
Reporter: Alibaba Cloud Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File image-2019-07-29-17-31-44-862.png    
Issue Links:
Related
is related to LU-12605 Lustre target_handle_connect() bug Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In the latest version of the Lustre file system, the ptlrpc module has an out-of-bounds read bug caused by missing validation of specific fields in packets sent by the client.

The kernel panic:

[ 7424.506777] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 7424.509002] IP: [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
[ 7424.510926] PGD 0 
[ 7424.512466] Oops: 0002 [#1] SMP 
[ 7424.514098] Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) loop lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crc_t10dif crct10dif_generic ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_console virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common drm crc32c_intel ata_piix libata
[ 7424.529892]  serio_raw virtio_pci virtio_ring virtio drm_panel_orientation_quirks floppy
[ 7424.532037] CPU: 3 PID: 7206 Comm: mdt00_000 Kdump: loaded Tainted: G           OEL ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
[ 7424.536069] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
[ 7424.538254] task: ffff9447f62f8000 ti: ffff944babae8000 task.ti: ffff944babae8000
[ 7424.540415] RIP: 0010:[<ffffffffa7b6b46c>]  [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
[ 7424.542720] RSP: 0018:ffff944babaebb78  EFLAGS: 00010246
[ 7424.544683] RAX: 0000000000000000 RBX: ffff944bdb285000 RCX: ffff944bdb2850b0
[ 7424.546806] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000001c
[ 7424.548912] RBP: ffff944babaebb88 R08: 000000000000000a R09: 000000000000fff3
[ 7424.551009] R10: 0000000000000000 R11: ffff944babaeb91e R12: 0000000000000000
[ 7424.553101] R13: ffff944be46c6b78 R14: 0000000000000000 R15: 0000000000000000
[ 7424.555177] FS:  0000000000000000(0000) GS:ffff944bffd80000(0000) knlGS:0000000000000000
[ 7424.557345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7424.559285] CR2: 000000000000001c CR3: 0000000421bb4000 CR4: 00000000003606e0
[ 7424.561342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7424.563385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7424.565400] Call Trace:
[ 7424.567049]  [<ffffffffc094602c>] ? lock_res_and_lock+0x2c/0x50 [ptlrpc]
[ 7424.569109]  [<ffffffffc09478ea>] __ldlm_handle2lock+0x7a/0x3f0 [ptlrpc]
[ 7424.571135]  [<ffffffffc096fdf2>] ldlm_request_cancel+0x1c2/0x740 [ptlrpc]
[ 7424.573145]  [<ffffffffc0975bc8>] ldlm_handle_enqueue0+0x88/0x15a0 [ptlrpc]
[ 7424.575133]  [<ffffffffc099e520>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[ 7424.577175]  [<ffffffffc09ff082>] tgt_enqueue+0x62/0x210 [ptlrpc]
[ 7424.579070]  [<ffffffffc0a052ca>] tgt_request_handle+0x91a/0x15c0 [ptlrpc]
[ 7424.581008]  [<ffffffffc05d5fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 7424.582942]  [<ffffffffc09a888e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[ 7424.584940]  [<ffffffffa74cbadb>] ? __wake_up_common+0x5b/0x90
[ 7424.586760]  [<ffffffffc09ac384>] ptlrpc_main+0xbb4/0x20f0 [ptlrpc]
[ 7424.588600]  [<ffffffffa74d08c0>] ? finish_task_switch+0x50/0x1c0
[ 7424.590425]  [<ffffffffc09ab7d0>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
[ 7424.592348]  [<ffffffffa74c1c71>] kthread+0xd1/0xe0
[ 7424.594029]  [<ffffffffa74c1ba0>] ? insert_kthread_work+0x40/0x40
[ 7424.595810]  [<ffffffffa7b75c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 7424.597620]  [<ffffffffa74c1ba0>] ? insert_kthread_work+0x40/0x40
[ 7424.599397] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 40 1b ff ff 5d 
[ 7424.604505] RIP  [<ffffffffa7b6b46c>] _raw_spin_lock+0xc/0x30
[ 7424.606300]  RSP <ffff944babaebb78>
[ 7424.607857] CR2: 000000000000001c

In the 'ldlm_request_cancel' function of the 'ptlrpc' module, the 'lock_count' field obtained from the 'dlm_req' structure is not bounds-checked and is used directly to index the 'lock_handle' array, resulting in an out-of-bounds access of the array.
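The missing check can be illustrated with a minimal user-space sketch. The names `demo_handle`, `demo_request`, `demo_lock_count_valid` and `demo_request_cancel` are hypothetical stand-ins for `struct lustre_handle`, `struct ldlm_request` and the cancel loop; the real structures carry more fields, and this is not the merged patch. The idea is simply that `lock_count` must be compared against the number of handles that actually fit in the received request buffer before any `lock_handle[i]` is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Lustre's struct lustre_handle and
 * struct ldlm_request; the real structures carry more fields. */
struct demo_handle {
	uint64_t cookie;
};

struct demo_request {
	uint32_t lock_flags;
	uint32_t lock_count;              /* copied verbatim from the packet */
	struct demo_handle lock_handle[]; /* variable-length tail */
};

/* Return 1 if lock_count handles fit in a request buffer of buf_len
 * bytes, 0 otherwise. Dividing the free space by the handle size,
 * rather than multiplying lock_count by it, cannot overflow. */
static int demo_lock_count_valid(uint32_t lock_count, size_t buf_len)
{
	size_t hdr = offsetof(struct demo_request, lock_handle);

	if (buf_len < hdr)
		return 0;
	return lock_count <= (buf_len - hdr) / sizeof(struct demo_handle);
}

/* Hypothetical cancel loop: counts non-zero handles, but only after
 * lock_count has been checked against the received buffer size.
 * Returns -1 for a malformed request instead of reading past it. */
static int demo_request_cancel(const struct demo_request *req, size_t buf_len)
{
	int count = 0;
	uint32_t i;

	assert(req != NULL);
	if (!demo_lock_count_valid(req->lock_count, buf_len))
		return -1;
	for (i = 0; i < req->lock_count; i++)
		if (req->lock_handle[i].cookie != 0)
			count++; /* real code would look up and cancel the lock */
	return count;
}
```

Rejecting the whole request on a bad count, rather than clamping it, keeps the server from acting on a packet whose header and body disagree.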

The 'lock_count' value is taken directly from the Lustre request packet. An attacker can set the 'Lock Count' field in the 'ldlm request' section of an 'LDLM_ENQUEUE' request to a large value (such as 0x41414141), causing the server to crash.
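To show why a value such as 0x41414141 must be rejected on receipt, here is a hedged sketch of decoding the count from raw packet bytes. The byte layout (flags at offset 0, count at offset 4, 8-byte handles from offset 8), the little-endian encoding, and the helpers `demo_get_le32` and `demo_parse_lock_count` are all illustrative assumptions, not Lustre's actual on-wire format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout, loosely modeled on an LDLM request body:
 *   bytes 0..3  lock_flags (little-endian, assumed)
 *   bytes 4..7  lock_count (little-endian, assumed)
 *   bytes 8..   lock_count handles of 8 bytes each
 * Offsets are illustrative, not Lustre's actual on-wire layout. */
#define DEMO_HDR_LEN    8u
#define DEMO_HANDLE_LEN 8u

/* Assemble a 32-bit little-endian value from four bytes. */
static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode lock_count and reject it unless that many handles are present
 * in the len bytes actually received. Returns 0 and stores the count
 * on success, -1 on a short or inconsistent buffer. */
static int demo_parse_lock_count(const uint8_t *buf, size_t len, uint32_t *count)
{
	uint32_t n;

	assert(buf != NULL && count != NULL);
	if (len < DEMO_HDR_LEN)
		return -1;
	n = demo_get_le32(buf + 4);
	if (n > (len - DEMO_HDR_LEN) / DEMO_HANDLE_LEN)
		return -1;
	*count = n;
	return 0;
}
```

With a 24-byte body only two handles fit, so a count of 2 is accepted while bytes `41 41 41 41` in the count field decode to 0x41414141 and are refused before any handle is read.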

The backtrace:

ptlrpc_main -> ptlrpc_server_handle_request -> tgt_request_handle -> tgt_enqueue -> ldlm_handle_enqueue0 -> ldlm_request_cancel

 



 Comments   
Comment by Andreas Dilger [ 01/Aug/19 ]

Please add "Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>" to the patch commit message.

Comment by Gerrit Updater [ 17/Aug/19 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35806
Subject: LU-12603 ldlm: Check cancel lock count for correctness
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4236dc5437ecb194b47249fb5b38fe5c0199df20

Comment by Gerrit Updater [ 07/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35806/
Subject: LU-12603 ldlm: Check cancel lock count for correctness
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7cc43aef98f6a759cbc5ae572123b44803c0ccd2

Comment by Peter Jones [ 07/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 09/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36108
Subject: LU-12603 ldlm: Check cancel lock count for correctness
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: afd58cde602d0099c9ba96fd1ab8234fa7a07994

Comment by Gerrit Updater [ 12/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36108/
Subject: LU-12603 ldlm: Check cancel lock count for correctness
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 8dd7cf809c08ac8bdb93599bb3c4ea84693941a3

Generated at Sat Feb 10 02:54:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.