[LU-12614] Lustre ldlm_cancel_hpreq_check() bug Created: 30/Jul/19  Updated: 16/Mar/20  Resolved: 03/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Critical
Reporter: Alibaba Cloud Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In the latest version of the Lustre file system, the ptlrpc module has an out-of-bounds access bug caused by missing validation of specific fields in packets sent by the client.

The kernel panic:

[50920.983447] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[50920.986242] IP: [<ffffffffa6b6b46c>] _raw_spin_lock+0xc/0x30
[50920.988767] PGD 0 
[50920.990411] Oops: 0002 [#1] SMP 
[50920.992116] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) loop lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crc_t10dif crct10dif_generic ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc dm_mod nfit libnvdimm iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev joydev virtio_balloon parport_pc parport i2c_piix4 pcspkr ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_console virtio_net virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common ata_piix drm crc32c_intel libata
[50921.008352]  serio_raw virtio_pci virtio_ring virtio drm_panel_orientation_quirks floppy
[50921.010518] CPU: 0 PID: 15708 Comm: ldlm_cn00_000 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
[50921.014624] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
[50921.016833] task: ffff8bfdb0600000 ti: ffff8bfde97a4000 task.ti: ffff8bfde97a4000
[50921.019010] RIP: 0010:[<ffffffffa6b6b46c>]  [<ffffffffa6b6b46c>] _raw_spin_lock+0xc/0x30
[50921.021257] RSP: 0018:ffff8bfde97a7d88  EFLAGS: 00010246
[50921.023228] RAX: 0000000000000000 RBX: ffff8bfdd238cc00 RCX: ffff8bfdd238ccb0
[50921.025367] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000001c
[50921.027495] RBP: ffff8bfde97a7d98 R08: 00000000000000b8 R09: 00000000000000e0
[50921.029607] R10: ffffffffc0756100 R11: 0000000000000000 R12: 0000000000000000
[50921.031710] R13: ffff8bfde92da0e0 R14: ffff8bfde9e87500 R15: ffff8bfde63c5c00
[50921.033793] FS:  0000000000000000(0000) GS:ffff8bfdffc00000(0000) knlGS:0000000000000000
[50921.035965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50921.037907] CR2: 000000000000001c CR3: 0000000427206000 CR4: 00000000003606f0
[50921.039980] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[50921.042023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[50921.044034] Call Trace:
[50921.045680]  [<ffffffffc092002c>] ? lock_res_and_lock+0x2c/0x50 [ptlrpc]
[50921.047707]  [<ffffffffc09218ea>] __ldlm_handle2lock+0x7a/0x3f0 [ptlrpc]
[50921.049728]  [<ffffffffc0949aa7>] ldlm_cancel_hpreq_check+0x97/0x220 [ptlrpc]
[50921.051774]  [<ffffffffc0986f79>] ptlrpc_main+0x17a9/0x20f0 [ptlrpc]
[50921.053690]  [<ffffffffa64d08c0>] ? finish_task_switch+0x50/0x1c0
[50921.055619]  [<ffffffffc09857d0>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc]
[50921.057616]  [<ffffffffa64c1c71>] kthread+0xd1/0xe0
[50921.059378]  [<ffffffffa64c1ba0>] ? insert_kthread_work+0x40/0x40
[50921.061220]  [<ffffffffa6b75c1d>] ret_from_fork_nospec_begin+0x7/0x21
[50921.063094]  [<ffffffffa64c1ba0>] ? insert_kthread_work+0x40/0x40
[50921.064927] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 40 1b ff ff 5d 
[50921.070141] RIP  [<ffffffffa6b6b46c>] _raw_spin_lock+0xc/0x30
[50921.071981]  RSP <ffff8bfde97a7d88>
[50921.073587] CR2: 000000000000001c

In ldlm_cancel_hpreq_check(), the 'lock_count' field read from the 'dlm_req' structure is never bounds-checked. It is used as the loop limit when indexing the 'lock_handle' array, so a client-supplied 'lock_count' larger than the number of handles actually present in the request results in an out-of-bounds access.

 

 dlm_req = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ);
 if (dlm_req == NULL)
         RETURN(-EFAULT);
 for (i = 0; i < dlm_req->lock_count; i++) {
         struct ldlm_lock *lock;

         lock = ldlm_handle2lock(&dlm_req->lock_handle[i]);
...
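
One way to guard a loop like the one above is to compare 'lock_count' against the number of handles the received buffer can actually hold before iterating. The user-space sketch below only illustrates that idea. The struct layouts and the names demo_handle, demo_dlm_req and demo_lock_count_fits are hypothetical simplifications, not the Lustre definitions and not necessarily the check that landed in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the wire structures.
 * The real Lustre structures carry more fields. */
struct demo_handle {
	uint64_t cookie;
};

struct demo_dlm_req {
	uint32_t lock_flags;
	uint32_t lock_count;              /* supplied by the client, untrusted */
	struct demo_handle lock_handle[]; /* handles follow the fixed header */
};

/*
 * Return 1 if a buffer of buf_size bytes is large enough to hold the
 * fixed header plus req->lock_count handles, 0 otherwise.
 *
 * The comparison divides the remaining bytes by the handle size instead
 * of multiplying lock_count, so a huge lock_count cannot overflow.
 */
static int demo_lock_count_fits(const struct demo_dlm_req *req,
				size_t buf_size)
{
	size_t hdr = offsetof(struct demo_dlm_req, lock_handle);

	/* Buffer too short to even contain lock_count: reject. */
	if (buf_size < hdr)
		return 0;

	return req->lock_count <= (buf_size - hdr) / sizeof(struct demo_handle);
}
```

A caller would run such a check right after obtaining the request and return an error instead of entering the loop when it fails.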

 



 Comments   
Comment by Peter Jones [ 31/Jul/19 ]

Oleg

Could you please investigate?

Peter

Comment by Gerrit Updater [ 17/Aug/19 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35807
Subject: LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b61051a5287a33be879a7e6c8783af2d172918e2

Comment by Gerrit Updater [ 03/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35807/
Subject: LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2b7af478bdbf5c6701e0e49aefe34597bdee3126

Comment by Peter Jones [ 03/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 09/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36107
Subject: LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 338d53e2401d1a853082f6e377c1d39fdcd756a5

Comment by Gerrit Updater [ 18/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36107/
Subject: LU-12614 ldlm: ldlm_cancel_hpreq_check should check lock count
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 4c4c4ca3a6c1c1e62a74fc25f76dd1dfa81e5265

Generated at Sat Feb 10 02:54:08 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.