Lustre / LU-8968

Use after free in osp_precreate_thread()


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.10.0, Lustre 2.11.0
    • None
    • 3

    Description

      I am hitting this relatively frequently now:

      [101711.727214] Lustre: DEBUG MARKER: == replay-dual test 19: resend of open request ======================================================= 15:24:11 (1482438251)
      [101712.093258] Turning device loop0 (0x700000) read-only
      [101712.111290] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [101712.116120] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      [101712.627330] LustreError: 4778:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880074df8c40 x1554448289689696/t0(0) o13->lustre-OST0001-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [101712.638417] BUG: unable to handle kernel paging request at ffff88006dfc7954
      [101712.638943] IP: [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
      [101712.639385] PGD 2e75067 PUD bcc1a067 PMD bcaaa067 PTE 800000006dfc7060
      [101712.639786] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [101712.640160] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm drm_kms_helper ata_generic pata_acpi drm i2c_piix4 ata_piix serio_raw pcspkr i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables [last unloaded: libcfs]
      [101712.645165] CPU: 3 PID: 15844 Comm: osp-pre-1-0 Tainted: G           OE  ------------   3.10.0-debug #1
      [101712.646327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [101712.646947] task: ffff88008f3f41c0 ti: ffff88007c3c4000 task.ti: ffff88007c3c4000
      [101712.648175] RIP: 0010:[<ffffffff8138ea39>]  [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
      [101712.649364] RSP: 0018:ffff88007c3c7cb0  EFLAGS: 00010096
      [101712.649975] RAX: ffff88008f3f41c0 RBX: ffff88006dfc7950 RCX: 0000000000000000
      [101712.651091] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006dfc7950
      [101712.652193] RBP: ffff88007c3c7cc8 R08: 0000000000000001 R09: 0000000000000000
      [101712.653453] R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000296
      [101712.654575] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
      [101712.655989] FS:  0000000000000000(0000) GS:ffff8800bc6c0000(0000) knlGS:0000000000000000
      [101712.657101] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [101712.657657] CR2: ffff88006dfc7954 CR3: 0000000001c0e000 CR4: 00000000000006e0
      [101712.658676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [101712.659685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [101712.660691] Stack:
      [101712.661165]  ffff88006dfc7950 0000000000000296 0000000000000003 ffff88007c3c7cf0
      [101712.662216]  ffffffff81706b5c ffffffff810af503 ffff88006dfc7950 ffffffff00000000
      [101712.663304]  ffff88007c3c7d28 ffffffff810af503 ffff88008f3f41c0 ffffffff00000000
      [101712.664346] Call Trace:
      [101712.664841]  [<ffffffff81706b5c>] _raw_spin_lock_irqsave+0x5c/0x70
      [101712.665405]  [<ffffffff810af503>] ? __wake_up+0x23/0x50
      [101712.665943]  [<ffffffff810af503>] __wake_up+0x23/0x50
      [101712.666495]  [<ffffffffa0d00efe>] osp_precreate_thread+0x2be/0x1230 [osp]
      [101712.667061]  [<ffffffff810af941>] ? finish_task_switch+0x81/0x180
      [101712.667631]  [<ffffffff810b7ce0>] ? wake_up_state+0x20/0x20
      [101712.668177]  [<ffffffffa0d00c40>] ? osp_init_pre_fid+0x5f0/0x5f0 [osp]
      [101712.668787]  [<ffffffff810a2eda>] kthread+0xea/0xf0
      [101712.669342]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [101712.669915]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
      [101712.670530]  [<ffffffff810a2df0>] ? kthread_create_on_node+0x140/0x140
      [101712.671095] Code: 48 89 03 48 c7 c0 ff ff ff ff 48 89 43 10 89 43 0c 5b 41 5c 41 5d 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 <81> 7f 04 ad 4e ad de 48 89 fb 0f 85 0b 01 00 00 65 48 8b 04 25 
      [101712.674005] RIP  [<ffffffff8138ea39>] do_raw_spin_lock+0x9/0x150
      [101712.674575]  RSP <ffff88007c3c7cb0>
      [101712.675079] CR2: ffff88006dfc7954
      
      (gdb) l *(osp_precreate_thread+0x2ba)
      0x11f2a is in osp_precreate_thread (/home/green/git/lustre-release/lustre/osp/osp_precreate.c:1268).
      1263			}
      1264		}
      1265	
      1266		thread->t_flags = SVC_STOPPED;
      1267		lu_env_fini(&env);
      1268		wake_up(&thread->t_ctl_waitq);
      1269	
      1270		RETURN(0);
      1271	}
      1272
      

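      This looks like a use-after-free: the osp device apparently got freed under us while osp_precreate_thread() was still running, because the osp thread does not seem to hold a reference on it (or does it?). The faulting instruction in do_raw_spin_lock() is the spinlock magic check at offset 4 of the lock (CR2 = RDI + 4), and the kernel runs with DEBUG_PAGEALLOC, so the wait queue passed to wake_up() at line 1268 appears to sit in a page that is no longer mapped.
      Does the lu_env pin the device, in which case lu_env_fini() should be moved after the wake_up() call?
      Or should refcounting be added?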
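
      To make the suspected ordering problem concrete, here is a minimal userspace sketch using pthreads, not Lustre code. It assumes the wait queue lives in memory that the stopping side frees as soon as it sees the "stopped" flag. All names (fake_dev, fake_precreate_thread, fake_dev_run_and_free) are hypothetical stand-ins. The worker sets the flag and wakes the waiter under a lock, and the stopper joins the worker before freeing, so the worker can never touch freed memory. Whether an equivalent handshake or a device reference is the right fix in osp is exactly the open question above.

```c
/* Userspace analogue of the shutdown handshake; every name is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct fake_dev {
	pthread_mutex_t lock;      /* protects 'stopped' */
	pthread_cond_t  ctl_waitq; /* stands in for thread->t_ctl_waitq */
	int             stopped;   /* stands in for the SVC_STOPPED flag */
	pthread_t       worker;
};

/*
 * Worker: publish "stopped" and wake the waiter while holding the lock,
 * then return without touching the device again.
 */
static void *fake_precreate_thread(void *arg)
{
	struct fake_dev *d = arg;

	pthread_mutex_lock(&d->lock);
	d->stopped = 1;
	pthread_cond_broadcast(&d->ctl_waitq);
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/*
 * Stopper: start the worker, wait for the flag, then join the worker
 * before freeing the device. Returns 0 on success, -1 on failure.
 */
static int fake_dev_run_and_free(void)
{
	struct fake_dev *d = calloc(1, sizeof(*d));

	if (d == NULL)
		return -1;
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->ctl_waitq, NULL);

	if (pthread_create(&d->worker, NULL, fake_precreate_thread, d) != 0) {
		pthread_cond_destroy(&d->ctl_waitq);
		pthread_mutex_destroy(&d->lock);
		free(d);
		return -1;
	}

	/* Seeing the flag alone does not prove the worker is done with 'd'. */
	pthread_mutex_lock(&d->lock);
	while (!d->stopped)
		pthread_cond_wait(&d->ctl_waitq, &d->lock);
	pthread_mutex_unlock(&d->lock);

	/* The join guarantees the worker has returned; only now free. */
	if (pthread_join(d->worker, NULL) != 0)
		return -1;
	assert(d->stopped == 1);

	pthread_cond_destroy(&d->ctl_waitq);
	pthread_mutex_destroy(&d->lock);
	free(d);
	return 0;
}
```

      Calling fake_dev_run_and_free() from a main() and building with -pthread exercises the whole sequence. Removing the join and freeing right after the flag check reproduces the window in which the worker may still be using the wait queue.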

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)