[LU-8968] Use after free in osp_precreate_thread() Created: 23/Dec/16 Updated: 12/Nov/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I am hitting this relatively frequently now: [101711.727214] Lustre: DEBUG MARKER: == replay-dual test 19: resend of open request ======================================================= 15:24:11 (1482438251) [101712.093258] Turning device loop0 (0x700000) read-only [101712.111290] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 [101712.116120] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 [101712.627330] LustreError: 4778:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880074df8c40 x1554448289689696/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [101712.638417] BUG: unable to handle kernel paging request at ffff88006dfc7954 [101712.638943] IP: [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.639385] PGD 2e75067 PUD bcc1a067 PMD bcaaa067 PTE 800000006dfc7060 [101712.639786] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [101712.640160] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper ata_generic pata_acpi drm i2c_piix4 ata_piix serio_raw pcspkr i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables [last unloaded: libcfs] [101712.645165] CPU: 3 PID: 15844 Comm: osp-pre-1-0 Tainted: G OE ------------ 3.10.0-debug #1 [101712.646327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [101712.646947] task: ffff88008f3f41c0 ti: ffff88007c3c4000 task.ti: ffff88007c3c4000 [101712.648175] RIP: 0010:[<ffffffff8138ea39>] [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.649364] RSP: 0018:ffff88007c3c7cb0 EFLAGS: 00010096 [101712.649975] RAX: ffff88008f3f41c0 RBX: ffff88006dfc7950 RCX: 0000000000000000 [101712.651091] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006dfc7950 [101712.652193] RBP: ffff88007c3c7cc8 R08: 0000000000000001 R09: 0000000000000000 [101712.653453] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000296 [101712.654575] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000 [101712.655989] FS: 0000000000000000(0000) GS:ffff8800bc6c0000(0000) knlGS:0000000000000000 [101712.657101] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [101712.657657] CR2: ffff88006dfc7954 CR3: 0000000001c0e000 CR4: 00000000000006e0 [101712.658676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [101712.659685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [101712.660691] Stack: [101712.661165] ffff88006dfc7950 0000000000000296 0000000000000003 ffff88007c3c7cf0 [101712.662216] ffffffff81706b5c ffffffff810af503 ffff88006dfc7950 ffffffff00000000 [101712.663304] ffff88007c3c7d28 ffffffff810af503 ffff88008f3f41c0 ffffffff00000000 [101712.664346] Call Trace: [101712.664841] [<ffffffff81706b5c>] _raw_spin_lock_irqsave+0x5c/0x70 [101712.665405] [<ffffffff810af503>] ? __wake_up+0x23/0x50 [101712.665943] [<ffffffff810af503>] __wake_up+0x23/0x50 [101712.666495] [<ffffffffa0d00efe>] osp_precreate_thread+0x2be/0x1230 [osp] [101712.667061] [<ffffffff810af941>] ? finish_task_switch+0x81/0x180 [101712.667631] [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20 [101712.668177] [<ffffffffa0d00c40>] ? osp_init_pre_fid+0x5f0/0x5f0 [osp] [101712.668787] [<ffffffff810a2eda>] kthread+0xea/0xf0 [101712.669342] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140 [101712.669915] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90 [101712.670530] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140 [101712.671095] Code: 48 89 03 48 c7 c0 ff ff ff ff 48 89 43 10 89 43 0c 5b 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 <81> 7f 04 ad 4e ad de 48 89 fb 0f 85 0b 01 00 00 65 48 8b 04 25 [101712.674005] RIP [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.674575] RSP <ffff88007c3c7cb0> [101712.675079] CR2: ffff88006dfc7954 (gdb) l *(osp_precreate_thread+0x2ba) 0x11f2a is in osp_precreate_thread (/home/green/git/lustre-release/lustre/osp/osp_precreate.c:1268). 1263 } 1264 } 1265 1266 thread->t_flags = SVC_STOPPED; 1267 lu_env_fini(&env); 1268 wake_up(&thread->t_ctl_waitq); 1269 1270 RETURN(0); 1271 } 1272 It looks like this is use after free and the osp device got freed under us because it is not apparently refcounted by the osp thread, or is it? |
| Comments |
| Comment by Oleg Drokin [ 29/Nov/17 ] |
|
Still a regular problem. |
| Comment by Gerrit Updater [ 11/Nov/19 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36730 |
| Comment by Alex Zhuravlev [ 12/Nov/19 ] |
|
while the patch above does seem to cure the problem I doubt it's a correct solution.
th1, umount th2,osp_pcrecreate_thread()
t_flags=SVC_STOPPING
wake_up()
<--- e.g. interrupt ---> t_flags=SVC_STOPPED
<---- e.g. interuupt --->
wait_event()
kfree(opd_pre_thread)
wake_up(opd_pre_thread.t_ctl_waitq) -> use-after-free
something like refcounter in opd_pre_thread or kfree() via RCU should help, I guess. also notice, OSP isn't the only place using this technique. in many places this problem seem to be masked by cycles to kfree() |