[LU-12590] Lustre lustre_msg_hdr_size_v2() bug Created: 26/Jul/19  Updated: 18/Sep/19  Resolved: 07/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Critical
Reporter: Alibaba Cloud Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None
Environment: Red Hat 7


Issue Links:
Related
is related to LU-12605 Lustre target_handle_connect() bug Resolved
Severity: 2

 Description   

In the latest version of the Lustre file system, the ptlrpc module has an out-of-bounds read bug caused by missing validation of specific fields in packets sent by the client.

 

The kernel panic:

CPU: 0 PID: 3002 Comm: ll_mgs_0002
Kdump: loaded
Tainted: G OE ------------ 3.10.0-957.10.1.el7_lustre.x86_64 #1
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 99a222b 04/01/2014
task: ffff986c19a85140 ti: ffff986c22a88000 task.ti: ffff986c22a88000
RIP: 0010:[<ffffffffc077a480>] [<ffffffffc077a480>] __lustre_unpack_msg+0x100/0x430 [ptlrpc]
RSP: 0018:ffff986c22a8bda0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff986c2ed0e000 RCX: 00000000786c35f8 
RDX: 00000000000000e0 RSI: 00000000044bc7f8 RDI: ffff986c2ed0e000 
RBP: ffff986c22a8bdb8 R08: 00000000044bc7f8 R09: 0000000000000008 
R10: 00000000ffffff10 R11: 0000000000000005 R12: ffff986c2ed0e000
R13: ffff986c19bf77c0 R14: ffff986c2aa24700 R15: ffff986c19ea9000
FS: 0000000000000000(0000) GS:ffff986c3fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff986c40000000 CR3: 000000042277c000 CR4: 00000000003606f0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
Call Trace:
[<ffffffffc07ad7d3>] sptlrpc_svc_unwrap_request+0x73/0x600 [ptlrpc] 
[<ffffffffc078e236>] ptlrpc_main+0xa66/0x20f0 [ptlrpc]
[<ffffffff9e2c1c71>] kthread+0xd1/0xe0 
[<ffffffff9e975c1d>] ret_from_fork_nospec_begin+0x7/0x21
Code: 
RIP [<ffffffffc077a480>] __lustre_unpack_msg+0x100/0x430 [ptlrpc]
RSP <ffff986c22a8bda0>
CR2: ffff986c40000000

 

In the sptlrpc_svc_unwrap_request() path of the ptlrpc module, lustre_msg_hdr_size_v2() computes the header size from the lustre_msg sent by the client but does not check the count value, which results in an out-of-bounds read.

static inline __u32 lustre_msg_hdr_size_v2(__u32 count)
{
	return cfs_size_round(offsetof(struct lustre_msg_v2, lm_buflens[count]));
}
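
To illustrate why an unchecked count matters, here is a minimal userspace sketch of the same arithmetic. The struct msg_v2_sketch layout (eight 32-bit fields followed by lm_buflens[]), the 8-byte rounding in size_round8(), and the helper names are assumptions made for illustration; they are not copied from the Lustre tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct lustre_msg_v2 (assumed layout:
 * eight 32-bit header fields, then the lm_buflens[] array). */
struct msg_v2_sketch {
	uint32_t lm_bufcount;
	uint32_t lm_secflvr;
	uint32_t lm_magic;
	uint32_t lm_repsize;
	uint32_t lm_cksum;
	uint32_t lm_flags;
	uint32_t lm_padding_2;
	uint32_t lm_padding_3;
	uint32_t lm_buflens[];
};

/* Round up to a multiple of 8, as cfs_size_round() is assumed to do. */
static uint64_t size_round8(uint64_t val)
{
	return (val + 7u) & ~(uint64_t)7u;
}

/* Header size implied by `count` buffers; 64-bit math avoids wraparound. */
static uint64_t hdr_size_sketch(uint32_t count)
{
	return size_round8(offsetof(struct msg_v2_sketch, lm_buflens) +
			   (uint64_t)count * sizeof(uint32_t));
}

/* Returns 1 if the header claimed by msg->lm_bufcount extends past the
 * `len` bytes actually received, 0 otherwise. */
static int hdr_overruns(const struct msg_v2_sketch *msg, size_t len)
{
	assert(msg != NULL);
	return hdr_size_sketch(msg->lm_bufcount) > len;
}
```

With this layout the header grows by 4 bytes per claimed buffer, so a client-controlled lm_bufcount can place the computed header end far beyond the received data unless something like hdr_overruns() is checked first.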

 

We can trigger this bug by sending a malformed Lustre packet whose lm_bufcount field has been modified.

 Comments   
Comment by Peter Jones [ 26/Jul/19 ]

Emoly

Could you please assist with this one?

Thanks

Peter

Comment by Andreas Dilger [ 26/Jul/19 ]

The simplest fix would be to check the count field in lustre_msg_hdr_size_v2(), but we would likely need to add similar checks all over the code. Instead, it makes sense to do a higher-level validation of the RPC format (lm_bufcount, etc.) in __lustre_unpack_msg() before any of these fields are used.
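
A rough sketch of what such a higher-level validation could look like in userspace C follows. It is not the patch that landed: validate_msg_sketch(), SKETCH_FIXED_HDR (32 bytes of fixed header with lm_bufcount as the first field), the 8-byte rounding, and host byte order are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FIXED_HDR 32u	/* assumed size of the fixed v2 header fields */

/* Read a 32-bit field without alignment assumptions (host byte order). */
static uint32_t read_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Round up to a multiple of 8 (assumed message padding rule). */
static uint64_t round8(uint64_t val)
{
	return (val + 7u) & ~(uint64_t)7u;
}

/* Hypothetical validator: returns 0 if the header and every buffer it
 * describes fit inside the `len` bytes received, -1 otherwise. */
static int validate_msg_sketch(const unsigned char *buf, size_t len)
{
	uint64_t hdr, need;
	uint32_t count, i;

	assert(buf != NULL);
	if (len < SKETCH_FIXED_HDR)
		return -1;		/* too short to hold the fixed fields */

	count = read_u32(buf);		/* lm_bufcount, assumed first field */
	hdr = round8(SKETCH_FIXED_HDR + (uint64_t)count * sizeof(uint32_t));
	if (hdr > len)
		return -1;		/* lm_buflens[] would run past the data */

	need = hdr;
	for (i = 0; i < count; i++) {
		/* Safe to read: lm_buflens[i] lies inside the checked header. */
		need += round8(read_u32(buf + SKETCH_FIXED_HDR +
					(size_t)i * sizeof(uint32_t)));
		if (need > len)
			return -1;	/* buffers claim more than was received */
	}
	return 0;
}
```

Checking the count before touching lm_buflens[], and only then summing the lengths, keeps every read inside the received buffer, so later callers can rely on both fields.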

Comment by Andreas Dilger [ 01/Aug/19 ]

Please add "Reported-by: Alibaba Cloud <yunye.ry@alibaba-inc.com>" to the patch commit message.

Comment by Gerrit Updater [ 13/Aug/19 ]

Emoly Liu (emoly@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35783
Subject: LU-12590 ptlrpc: check lm_bufcount for lustre_msg_hdr_size_v2()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fd0533b2f934ffc644f818ece41fc9349ea447fb

Comment by Andreas Dilger [ 27/Aug/19 ]

Alibaba Cloud, are you willing to share your testing tool? That would save us development effort.

Otherwise, my thought is to add a fail_loc that causes an outgoing RPC message buffer to be randomly corrupted in some small way, like a fuzzer tool. This would mean changing, e.g., one byte in the header or body of the message at some frequency below 100%; otherwise the client might simply be evicted if all of its messages are broken.
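
The corruption step described above could be sketched as follows. This is a standalone illustration, not Lustre's fail_loc mechanism: maybe_corrupt(), its one_in rate parameter, and the xorshift32 generator are hypothetical choices made so the behaviour is reproducible from a seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 PRNG so a given nonzero seed replays the same run. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* With probability about 1/one_in, flip one bit of one randomly chosen
 * byte in buf. Returns the corrupted index, or -1 if buf was left alone. */
static long maybe_corrupt(unsigned char *buf, size_t len, uint32_t one_in,
			  uint32_t *rng)
{
	size_t idx;
	unsigned char mask;

	assert(buf != NULL && rng != NULL && *rng != 0);
	if (len == 0 || one_in == 0)
		return -1;
	if (xorshift32(rng) % one_in != 0)
		return -1;		/* most messages go out untouched */

	idx = xorshift32(rng) % len;
	mask = (unsigned char)(1u << (xorshift32(rng) % 8));
	buf[idx] ^= mask;		/* nonzero mask, so the byte always changes */
	return (long)idx;
}
```

Keeping one_in well above 1 matches the concern in the comment: only a fraction of messages are damaged, so the client keeps making progress while the server's unpack paths still see malformed input.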

Comment by Gerrit Updater [ 07/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35783/
Subject: LU-12590 ptlrpc: check lm_bufcount and lm_buflen
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 268edb13d769994c4841864034d72f0bd7b36e12

Comment by Peter Jones [ 07/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 09/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36119
Subject: LU-12590 ptlrpc: check lm_bufcount and lm_buflen
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 4a346abc8cf45088e4b27dda0ce5100039b60eea

Comment by Gerrit Updater [ 18/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36119/
Subject: LU-12590 ptlrpc: check lm_bufcount and lm_buflen
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 2257e1ed7ce6a449fdc52ca7a492b8320289e9db
