Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.10.0, Lustre 2.11.0
-
None
-
3
-
9223372036854775807
Description
I am hitting this relatively frequently now:
[101711.727214] Lustre: DEBUG MARKER: == replay-dual test 19: resend of open request ======================================================= 15:24:11 (1482438251) [101712.093258] Turning device loop0 (0x700000) read-only [101712.111290] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000 [101712.116120] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 [101712.627330] LustreError: 4778:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880074df8c40 x1554448289689696/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [101712.638417] BUG: unable to handle kernel paging request at ffff88006dfc7954 [101712.638943] IP: [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.639385] PGD 2e75067 PUD bcc1a067 PMD bcaaa067 PTE 800000006dfc7060 [101712.639786] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [101712.640160] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper ata_generic pata_acpi drm i2c_piix4 ata_piix serio_raw pcspkr i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables [last unloaded: libcfs] [101712.645165] CPU: 3 PID: 15844 Comm: osp-pre-1-0 Tainted: G OE ------------ 3.10.0-debug #1 [101712.646327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [101712.646947] task: ffff88008f3f41c0 ti: ffff88007c3c4000 task.ti: ffff88007c3c4000 [101712.648175] RIP: 0010:[<ffffffff8138ea39>] [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.649364] RSP: 0018:ffff88007c3c7cb0 EFLAGS: 00010096 [101712.649975] RAX: ffff88008f3f41c0 RBX: ffff88006dfc7950 RCX: 0000000000000000 [101712.651091] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006dfc7950 [101712.652193] RBP: ffff88007c3c7cc8 R08: 0000000000000001 R09: 0000000000000000 [101712.653453] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000296 [101712.654575] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000 [101712.655989] FS: 0000000000000000(0000) GS:ffff8800bc6c0000(0000) knlGS:0000000000000000 [101712.657101] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [101712.657657] CR2: ffff88006dfc7954 CR3: 0000000001c0e000 CR4: 00000000000006e0 [101712.658676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [101712.659685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [101712.660691] Stack: [101712.661165] ffff88006dfc7950 0000000000000296 0000000000000003 ffff88007c3c7cf0 [101712.662216] ffffffff81706b5c ffffffff810af503 ffff88006dfc7950 ffffffff00000000 [101712.663304] ffff88007c3c7d28 ffffffff810af503 ffff88008f3f41c0 ffffffff00000000 [101712.664346] Call Trace: [101712.664841] [<ffffffff81706b5c>] _raw_spin_lock_irqsave+0x5c/0x70 [101712.665405] [<ffffffff810af503>] ? __wake_up+0x23/0x50 [101712.665943] [<ffffffff810af503>] __wake_up+0x23/0x50 [101712.666495] [<ffffffffa0d00efe>] osp_precreate_thread+0x2be/0x1230 [osp] [101712.667061] [<ffffffff810af941>] ? finish_task_switch+0x81/0x180 [101712.667631] [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20 [101712.668177] [<ffffffffa0d00c40>] ? osp_init_pre_fid+0x5f0/0x5f0 [osp] [101712.668787] [<ffffffff810a2eda>] kthread+0xea/0xf0 [101712.669342] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140 [101712.669915] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90 [101712.670530] [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140 [101712.671095] Code: 48 89 03 48 c7 c0 ff ff ff ff 48 89 43 10 89 43 0c 5b 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 <81> 7f 04 ad 4e ad de 48 89 fb 0f 85 0b 01 00 00 65 48 8b 04 25 [101712.674005] RIP [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150 [101712.674575] RSP <ffff88007c3c7cb0> [101712.675079] CR2: ffff88006dfc7954
(gdb) l *(osp_precreate_thread+0x2ba) 0x11f2a is in osp_precreate_thread (/home/green/git/lustre-release/lustre/osp/osp_precreate.c:1268). 1263 } 1264 } 1265 1266 thread->t_flags = SVC_STOPPED; 1267 lu_env_fini(&env); 1268 wake_up(&thread->t_ctl_waitq); 1269 1270 RETURN(0); 1271 } 1272
It looks like this is use after free and the osp device got freed under us because it is not apparently refcounted by the osp thread, or is it?
Would lu env pin it and so the lu_env_fini should be moved after the wake_up call?
or should the refcounting be added I wonder?