Lustre / LU-14024

ofd_inconsistency_verification_main use after free on shutdown.


    Description

      It seems the LU-12564 patch is exposing a weakness in ofd_inconsistency_verification_main:

              thread_set_flags(thread, SVC_STOPPED);
              wake_up_all(&thread->t_ctl_waitq);
              spin_unlock(&ofd->ofd_inconsistency_lock);
              lu_env_fini(&env);
      

      The spin_unlock then crashes on unmapped memory:

      [405815.935072] BUG: unable to handle kernel paging request at ffff8802d78127f4
      [405815.937427] IP: [<ffffffff8140a0e5>] do_raw_spin_unlock+0x5/0x90
      [405815.953412] PGD 241c067 PUD 33e9f9067 PMD 33e93c067 PTE 80000002d7812063
      [405815.955679] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [405815.957829] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod pcc_cpufreq loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 virtio_console virtio_balloon pcspkr ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix crct10dif_pclmul drm_panel_orientation_quirks crct10dif_common virtio_blk crc32c_intel libata serio_raw i2c_core floppy [last unloaded: libcfs]
      [405816.028386] 
      [405816.030183] CPU: 4 PID: 4908 Comm: inconsistency_v Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.7-debug #1
      [405816.048472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [405816.050687] task: ffff8802dc486d00 ti: ffff8802ca4f8000 task.ti: ffff8802ca4f8000
      [405816.139729] RIP: 0010:[<ffffffff8140a0e5>]  [<ffffffff8140a0e5>] do_raw_spin_unlock+0x5/0x90
      [405816.154191] RSP: 0018:ffff8802ca4fbd60  EFLAGS: 00010292
      [405816.156260] RAX: 0000000000000000 RBX: ffff8802d78127e0 RCX: dead000000000200
      [405816.166949] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffff8802d78127f0
      [405816.171118] RBP: ffff8802ca4fbd68 R08: ffff8800ab47bb48 R09: ffffffff8221eb80
      [405816.175495] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8802d7812000
      [405816.195126] R13: ffff8802d78127f0 R14: ffff8802dc486d00 R15: ffff88032514b680
      [405816.204188] FS:  0000000000000000(0000) GS:ffff88033db00000(0000) knlGS:0000000000000000
      [405816.209047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [405816.218345] CR2: ffff8802d78127f4 CR3: 0000000001c10000 CR4: 00000000001607e0
      [405816.234013] Call Trace:
      [405816.236025]  [<ffffffff817d662e>] _raw_spin_unlock+0xe/0x20
      [405816.252333]  [<ffffffffa0fd5472>] ofd_inconsistency_verification_main+0xd52/0xde0 [ofd]
      [405816.259324]  [<ffffffff8140a129>] ? do_raw_spin_unlock+0x49/0x90
      [405816.261588]  [<ffffffff810b93f0>] ? wake_up_atomic_t+0x30/0x30
      [405816.263625]  [<ffffffffa0fd4720>] ? ofd_cb_soft_sync+0x240/0x240 [ofd]
      [405816.265897]  [<ffffffff810b8254>] kthread+0xe4/0xf0
      [405816.268022]  [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
      [405816.270246]  [<ffffffff817e0ddd>] ret_from_fork_nospec_begin+0x7/0x21
      [405816.272514]  [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
      

      I am not 100% sure how it unfolds, but at the time of the crash two other CPUs are running vfree from delayed work.

      It almost looks as if the parallel ofd_fini thread issues the vfree, which gets deferred to delayed work, and that delayed work for some reason has a better chance to run than both the ofd_fini and the inconsistency threads.

      It seems we really should do that unlock before the wake-up call, though.
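
      The reordering suggested here can be modelled in userspace. This is only a sketch under stated assumptions, not the Lustre code: struct fake_dev, worker_main and run_shutdown are hypothetical names, a pthread mutex stands in for ofd_inconsistency_lock, and a single atomic flag stands in for both SVC_STOPPED and the wake-up. What it tries to show is that the exiting thread drops the lock first, so that signalling the stop is its last access to memory the shutdown path is about to free.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device structure that shutdown frees. */
struct fake_dev {
	pthread_mutex_t lock;    /* plays the role of ofd_inconsistency_lock */
	atomic_int      stopped; /* models SVC_STOPPED plus the wake-up */
	int             pending; /* state protected by lock */
};

/* Models the tail of the verification thread. */
static void *worker_main(void *arg)
{
	struct fake_dev *dev = arg;

	pthread_mutex_lock(&dev->lock);
	dev->pending = 0;                 /* final bookkeeping under the lock */
	pthread_mutex_unlock(&dev->lock); /* drop the lock first ... */

	/* ... and only then announce the stop. Last access to dev. */
	atomic_store_explicit(&dev->stopped, 1, memory_order_release);
	return NULL;
}

/*
 * Models the shutdown path: wait for the stop flag, then free the
 * structure without joining the worker first. Returns the value of
 * pending seen after the stop (expected 0), or -1 on setup failure.
 */
static int run_shutdown(void)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));
	pthread_t tid;
	int seen;

	if (dev == NULL)
		return -1;
	pthread_mutex_init(&dev->lock, NULL);
	atomic_init(&dev->stopped, 0);
	dev->pending = 1;

	if (pthread_create(&tid, NULL, worker_main, dev) != 0) {
		pthread_mutex_destroy(&dev->lock);
		free(dev);
		return -1;
	}

	/* Spin until the worker has announced that it stopped. */
	while (!atomic_load_explicit(&dev->stopped, memory_order_acquire))
		sched_yield();

	/* The acquire load above makes the worker's earlier writes visible. */
	seen = dev->pending;
	assert(seen == 0);

	/* Safe only because the worker no longer touches dev after the flag. */
	pthread_mutex_destroy(&dev->lock);
	free(dev);

	pthread_join(tid, NULL); /* reaps the thread; dev is already gone */
	return seen;
}
```

      Calling run_shutdown() from a main() built with -pthread exercises the path. With the original ordering (flag set and wake-up while still holding the lock, unlock afterwards) the equivalent of free(dev) could run before the worker's unlock, which is the pattern the oops above points at. Whether the real wake_up_all() call needs the same treatment depends on where the thread structure itself lives, which this model does not cover.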


            People

              green Oleg Drokin
              green Oleg Drokin