[LU-16634] Null pointer dereference in lustre_set_wire_obdo Created: 12/Mar/23 Updated: 31/Jan/24 Resolved: 31/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Tao Lyu | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre commit: 9ddcdee2c8b9ec14986b93cf3180d946cd4869f7
Three server nodes and one client.
Kernel version: Ubuntu-5.4.0-90.101
|
||
| Severity: | 3 |
| Description |
[ 279.518552] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 279.520881] general protection fault: 0000 [#1] SMP KASAN NOPTI
[ 279.523366] CPU: 1 PID: 555 Comm: test Tainted: G O 5.4.148+ #7
[ 279.527232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 279.530776] RIP: 0010:lustre_set_wire_obdo+0x7e/0x570 [obdclass]
[ 279.556223] Call Trace:
[ 279.556666] osc_getattr+0x1eb/0x950 [osc]
[ 279.558046] osc_iocontrol+0x4f1/0xe80 [osc]
[ 279.559241] lov_iocontrol+0x4ba/0x5de0 [lov]
[ 279.567254] ll_dir_ioctl+0x2834/0x17cc0 [lustre]
[ 279.590353] do_vfs_ioctl+0x405/0x660
[ 279.590784] ksys_ioctl+0x5e/0x90 |
| Comments |
| Comment by Andreas Dilger [ 12/Mar/23 ] |
|
Hello tao.lyu, this bug needs to have more information in it before there is any chance someone could identify the issue. At a minimum, a stack trace and any console logs from before the crash. Preferably also "gdb> list *(lustre_set_wire_obdo+NNN)" that shows which line in that function the crash happened. |
| Comment by Tao Lyu [ 12/Mar/23 ] |
|
Hi, sorry for the incomplete information. The crash location is lustre/obdclass/obdo.c:182:
root@dfs:~/lustre-cserver# gdb /lib/modules/5.4.148+/updates/fs/obdclass.ko
Reading symbols from /lib/modules/5.4.148+/updates/fs/obdclass.ko...
(gdb) list *(lustre_set_wire_obdo+0x7e)
0xaa7ee is in lustre_set_wire_obdo (/root/lustre-cserver/debian/tmp/modules-deb/usr_src/modules/lustre/lustre/obdclass/obdo.c:182).
177 */
178 void lustre_set_wire_obdo(const struct obd_connect_data *ocd,
179 struct obdo *wobdo,
180 const struct obdo *lobdo)
181 {
182 *wobdo = *lobdo;
183 if (ocd == NULL)
184 return;
185
186 if (!(wobdo->o_valid & OBD_MD_FLUID))
The stack trace is shown below.
[ 279.516647] kasan: CONFIG_KASAN_INLINE enabled
[ 279.518552] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 279.520881] general protection fault: 0000 [#1] SMP KASAN NOPTI
[ 279.523366] CPU: 1 PID: 555 Comm: test Tainted: G O 5.4.148+ #7
[ 279.527232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 279.530776] RIP: 0010:lustre_set_wire_obdo+0x7e/0x570 [obdclass]
[ 279.533624] Code: 40 84 c6 0f 85 78 04 00 00 84 c9 0f 95 c2 0f 9e c0 84 c2 0f 85 68 04 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 e0 48 c1 e8 03 <0f> b6 0c 10 49 8d 84 24 cf 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03
[ 279.540387] RSP: 0018:ff110002456ff678 EFLAGS: 00010206
[ 279.541933] RAX: 0000000000000003 RBX: ff1100024a8462a0 RCX: 0000000000000000
[ 279.543988] RDX: dffffc0000000000 RSI: 1fe22000495ac501 RDI: ff1100024a8462a0
[ 279.545662] RBP: ff1100024ad628e8 R08: ff11000245455ac0 R09: ffe21c0049508c52
[ 279.547314] R10: ffe21c0049508c51 R11: ff1100024a84628b R12: 0000000000000018
[ 279.548570] R13: ff1100024964b000 R14: 0000000000000018 R15: ff110002509536d8
[ 279.549797] FS: 00007ffff7fc4540(0000) GS:ff11000257280000(0000) knlGS:0000000000000000
[ 279.551165] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 279.552129] CR2: 00007ffff7f4fffc CR3: 0000000247498001 CR4: 0000000000761ee0
[ 279.553346] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 279.554550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 279.555759] PKRU: 55555554
[ 279.556223] Call Trace:
[ 279.556666] osc_getattr+0x1eb/0x950 [osc]
[ 279.558046] osc_iocontrol+0x4f1/0xe80 [osc]
[ 279.559241] lov_iocontrol+0x4ba/0x5de0 [lov]
[ 279.567254] ll_dir_ioctl+0x2834/0x17cc0 [lustre]
[ 279.590353] do_vfs_ioctl+0x405/0x660
[ 279.590784] ksys_ioctl+0x5e/0x90
[ 279.591175] __x64_sys_ioctl+0x16/0x20
[ 279.591626] do_syscall_64+0x48/0x140
[ 279.592038] entry_SYSCALL_64_after_hwframe+0x44/0xa9 |
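Editorial note on reading the register dump above: the bytes just before the faulting instruction in the Code: line appear to be an inline KASAN shadow check (load 0xdffffc0000000000 into RDX, copy R12 into RAX, shift RAX right by 3, then read one byte at RAX+RDX). With R12 = 0x18 and RAX = 0x3, this suggests the pointer being checked was a small offset from NULL rather than a valid kernel address. The sketch below reproduces that arithmetic in userspace C. It assumes the usual x86_64 generic-KASAN shadow offset and scale; the helper names `kasan_shadow_addr` and `is_canonical` are made up for this illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumed x86_64 generic KASAN layout: one shadow byte per 8 bytes of memory.
 * The offset matches the RDX value in the register dump. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: shadow byte address an inline check reads for addr. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: an address is canonical when every bit from
 * (va_bits - 1) up to bit 63 has the same value. va_bits is 48 with
 * 4-level paging and 57 with 5-level paging. */
static inline int is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t top = addr >> (va_bits - 1);
	uint64_t all_ones = (1ULL << (64 - va_bits + 1)) - 1;

	return top == 0 || top == all_ones;
}
```

Under these assumptions, `kasan_shadow_addr(0x18)` evaluates to 0xdffffc0000000003, which is not canonical for either paging mode, so the shadow read itself raises the general protection fault. That would be consistent with the "GPF could be caused by NULL-ptr deref" message, but it is an interpretation of the dump, not something stated in the ticket.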
| Comment by Tao Lyu [ 12/Mar/23 ] |
|
After setting up the file system, running the binary compiled from the attached poc.c triggers this bug. |
| Comment by Andreas Dilger [ 13/Mar/23 ] |
|
Tao, thanks for filing your report.
Sorry, I don't see any poc.c attachment on this ticket. Alternately, if you consider this to be a security issue, you can email me the poc.c program at adilger@whamcloud.com and I can make it available internally for review until after we release a fix. Looking at the stack trace and command being run it looks like you are running some kind of ioctl/syscall fuzzer for a research project? It would be helpful to disclose this kind of information in the initial bug report. It will be interesting to see what the fuzzer is doing. |
| Comment by Tao Lyu [ 13/Mar/23 ] |
|
Hi Andreas, Oh, I'm sorry. I forgot to upload the POC. I've sent it to you. Yes, we are testing Lustre with a distributed fuzzing framework. It tests Lustre by feeding it many mutated syscall sequences. |
| Comment by Tao Lyu [ 13/Mar/23 ] |
|
By the way, does Lustre support running servers and clients on the same node? |
| Comment by Andreas Dilger [ 13/Mar/23 ] |
This is just a misunderstanding of how the packaging is done. If you build and install the "server" package it can also mount the exported filesystem as a client. The "client" package can only mount the exported filesystem and cannot be a server. The client packages have reduced functionality to make them more easily portable to different kernels/distros. |
| Comment by Tao Lyu [ 14/Mar/23 ] |
|
Okay, got it. Thanks a lot for your explanation. |
| Comment by Gerrit Updater [ 16/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50314 |
| Comment by Andreas Dilger [ 16/Mar/23 ] |
|
Hello tao.lyu, I've created a patch that should address the class of issues that you uncovered. The Gerrit link is above, and I will also attach the current version of the patch against the master branch to this ticket. There is one cosmetic defect in the current version that is causing regression test failures (returning -EINVAL instead of -ENOTTY in one test case), but it should be ok for your testing. Please let me know if this resolves the issue you hit. Once it has been reviewed and passes regression testing, the patch will be landed to master and backported to the maintenance branches. Please file a separate Jira ticket if you uncover other unrelated issues. |
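Editorial note: the two error codes mentioned in the comment above follow the usual ioctl convention, where -ENOTTY means the command is not recognized by the handler and -EINVAL means the command is recognized but its argument is invalid. The userspace sketch below illustrates that distinction together with a NULL check before a structure copy. It is not the content of the Gerrit patch; `demo_iocontrol`, `DEMO_IOC_GETATTR`, `struct demo_obdo`, and `struct demo_data` are hypothetical names invented for this example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command number and types, for illustration only. */
#define DEMO_IOC_GETATTR 1u

struct demo_obdo {
	unsigned long long o_valid;
};

struct demo_data {
	struct demo_obdo *oa;	/* caller-supplied, may be NULL */
};

/* Hypothetical handler: validate every pointer before the structure copy,
 * and keep "unknown command" distinct from "bad argument". */
static inline int demo_iocontrol(unsigned int cmd, const struct demo_data *data,
				 struct demo_obdo *out)
{
	switch (cmd) {
	case DEMO_IOC_GETATTR:
		if (data == NULL || data->oa == NULL || out == NULL)
			return -EINVAL;	/* known command, invalid argument */
		*out = *data->oa;	/* safe: both sides checked above */
		return 0;
	default:
		return -ENOTTY;		/* command not handled here */
	}
}
```

In this sketch a request with a NULL `oa` is rejected with -EINVAL before any dereference, while an unrecognized command number returns -ENOTTY regardless of its argument.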
| Comment by Andreas Dilger [ 16/Mar/23 ] |
|
PS: I will add "Reported-by: Tao Lyu <tao.lyu@epfl.ch>" in the updated version of the patch. |
| Comment by Tao Lyu [ 16/Mar/23 ] |
|
Thanks, I'll test it later and get back to you. Could you please take a look at the other two issues I reported: Thanks! |
| Comment by Gerrit Updater [ 18/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50331 |
| Comment by Gerrit Updater [ 19/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50333 |
| Comment by Gerrit Updater [ 19/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50334 |
| Comment by Gerrit Updater [ 20/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50335 |
| Comment by Gerrit Updater [ 23/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50390 |
| Comment by Gerrit Updater [ 04/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50331/ |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50333/ |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50334/ |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50335/ |
| Comment by Andreas Dilger [ 15/Apr/23 ] |
|
tao.lyu, have you been able to validate that the provided patch (or the latest patch https://review.whamcloud.com/50314 on master) fixes the issue reported here and in |
| Comment by Gerrit Updater [ 01/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50314/ |
| Comment by Gerrit Updater [ 31/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50390/ |
| Comment by Peter Jones [ 31/May/23 ] |
|
All patches seem to have landed for 2.16 |
| Comment by Gerrit Updater [ 06/Jul/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51596 |
| Comment by Gerrit Updater [ 14/Jul/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51596/ |