[LU-8968] Use after free in osp_precreate_thread() Created: 23/Dec/16  Updated: 12/Nov/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I am hitting this relatively frequently now:

[101711.727214] Lustre: DEBUG MARKER: == replay-dual test 19: resend of open request ======================================================= 15:24:11 (1482438251)
[101712.093258] Turning device loop0 (0x700000) read-only
[101712.111290] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[101712.116120] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
[101712.627330] LustreError: 4778:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880074df8c40 x1554448289689696/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[101712.638417] BUG: unable to handle kernel paging request at ffff88006dfc7954
[101712.638943] IP: [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
[101712.639385] PGD 2e75067 PUD bcc1a067 PMD bcaaa067 PTE 800000006dfc7060
[101712.639786] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[101712.640160] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper ata_generic pata_acpi drm i2c_piix4 ata_piix serio_raw pcspkr i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables [last unloaded: libcfs]
[101712.645165] CPU: 3 PID: 15844 Comm: osp-pre-1-0 Tainted: G           OE  ------------   3.10.0-debug #1
[101712.646327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[101712.646947] task: ffff88008f3f41c0 ti: ffff88007c3c4000 task.ti: ffff88007c3c4000
[101712.648175] RIP: 0010:[<ffffffff8138ea39>]  [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
[101712.649364] RSP: 0018:ffff88007c3c7cb0  EFLAGS: 00010096
[101712.649975] RAX: ffff88008f3f41c0 RBX: ffff88006dfc7950 RCX: 0000000000000000
[101712.651091] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006dfc7950
[101712.652193] RBP: ffff88007c3c7cc8 R08: 0000000000000001 R09: 0000000000000000
[101712.653453] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000296
[101712.654575] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
[101712.655989] FS:  0000000000000000(0000) GS:ffff8800bc6c0000(0000) knlGS:0000000000000000
[101712.657101] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[101712.657657] CR2: ffff88006dfc7954 CR3: 0000000001c0e000 CR4: 00000000000006e0
[101712.658676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[101712.659685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[101712.660691] Stack:
[101712.661165]  ffff88006dfc7950 0000000000000296 0000000000000003 ffff88007c3c7cf0
[101712.662216]  ffffffff81706b5c ffffffff810af503 ffff88006dfc7950 ffffffff00000000
[101712.663304]  ffff88007c3c7d28 ffffffff810af503 ffff88008f3f41c0 ffffffff00000000
[101712.664346] Call Trace:
[101712.664841]  [<ffffffff81706b5c>] _raw_spin_lock_irqsave+0x5c/0x70
[101712.665405]  [<ffffffff810af503>] ? __wake_up+0x23/0x50
[101712.665943]  [<ffffffff810af503>] __wake_up+0x23/0x50
[101712.666495]  [<ffffffffa0d00efe>] osp_precreate_thread+0x2be/0x1230 [osp]
[101712.667061]  [<ffffffff810af941>] ? finish_task_switch+0x81/0x180
[101712.667631]  [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
[101712.668177]  [<ffffffffa0d00c40>] ? osp_init_pre_fid+0x5f0/0x5f0 [osp]
[101712.668787]  [<ffffffff810a2eda>] kthread+0xea/0xf0
[101712.669342]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[101712.669915]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[101712.670530]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
[101712.671095] Code: 48 89 03 48 c7 c0 ff ff ff ff 48 89 43 10 89 43 0c 5b 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 <81> 7f 04 ad 4e ad de 48 89 fb 0f 85 0b 01 00 00 65 48 8b 04 25 
[101712.674005] RIP  [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
[101712.674575]  RSP <ffff88007c3c7cb0>
[101712.675079] CR2: ffff88006dfc7954
(gdb) l *(osp_precreate_thread+0x2ba)
0x11f2a is in osp_precreate_thread (/home/green/git/lustre-release/lustre/osp/osp_precreate.c:1268).
1263			}
1264		}
1265	
1266		thread->t_flags = SVC_STOPPED;
1267		lu_env_fini(&env);
1268		wake_up(&thread->t_ctl_waitq);
1269	
1270		RETURN(0);
1271	}
1272

It looks like this is use after free and the osp device got freed under us because it is not apparently refcounted by the osp thread, or is it?
Would lu env pin it and so the lu_env_fini should be moved after the wake_up call?
or should the refcounting be added I wonder?



 Comments   
Comment by Oleg Drokin [ 29/Nov/17 ]

Still a regular problem.

Comment by Gerrit Updater [ 11/Nov/19 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36730
Subject: LU-8968 osp: protect t_flags at stopping
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 920400e965e2c969778402556bbc184c7851047c

Comment by Alex Zhuravlev [ 12/Nov/19 ]

while the patch above does seem to cure the problem I doubt it's a correct solution.
at the moment I'm not sure how the correct solution should look like.
so basically the problem is that osp_statfs_fini(), running in umount thread, finds t_flags=SVC_STOPPED before schedule() in wait_event() and just keep going toward freeing structure containing opd_pre_thread.

th1, umount                           th2,osp_pcrecreate_thread()
t_flags=SVC_STOPPING
wake_up()
<--- e.g. interrupt --->        t_flags=SVC_STOPPED
                                              <---- e.g. interuupt --->
wait_event()
kfree(opd_pre_thread)
                                               wake_up(opd_pre_thread.t_ctl_waitq) -> use-after-free

something like refcounter in opd_pre_thread or kfree() via RCU should help, I guess.
any other option?

also notice, OSP isn't the only place using this technique. in many places this problem seem to be masked by cycles to kfree()

Generated at Sat Feb 10 02:22:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.