Details
- Bug
- Resolution: Fixed
- Minor
Description
It seems the LU-12564 patch is exposing a weakness in the exit path of ofd_inconsistency_verification_main():
    thread_set_flags(thread, SVC_STOPPED);
    wake_up_all(&thread->t_ctl_waitq);
    spin_unlock(&ofd->ofd_inconsistency_lock);
    lu_env_fini(&env);
The spin_unlock() then crashes on unmapped memory:
    [405815.935072] BUG: unable to handle kernel paging request at ffff8802d78127f4
    [405815.937427] IP: [<ffffffff8140a0e5>] do_raw_spin_unlock+0x5/0x90
    [405815.953412] PGD 241c067 PUD 33e9f9067 PMD 33e93c067 PTE 80000002d7812063
    [405815.955679] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    [405815.957829] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod pcc_cpufreq loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 virtio_console virtio_balloon pcspkr ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm ata_piix crct10dif_pclmul drm_panel_orientation_quirks crct10dif_common virtio_blk crc32c_intel libata serio_raw i2c_core floppy [last unloaded: libcfs]
    [405816.028386]
    [405816.030183] CPU: 4 PID: 4908 Comm: inconsistency_v Kdump: loaded Tainted: P OE ------------ 3.10.0-7.7-debug #1
    [405816.048472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [405816.050687] task: ffff8802dc486d00 ti: ffff8802ca4f8000 task.ti: ffff8802ca4f8000
    [405816.139729] RIP: 0010:[<ffffffff8140a0e5>] [<ffffffff8140a0e5>] do_raw_spin_unlock+0x5/0x90
    [405816.154191] RSP: 0018:ffff8802ca4fbd60 EFLAGS: 00010292
    [405816.156260] RAX: 0000000000000000 RBX: ffff8802d78127e0 RCX: dead000000000200
    [405816.166949] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffff8802d78127f0
    [405816.171118] RBP: ffff8802ca4fbd68 R08: ffff8800ab47bb48 R09: ffffffff8221eb80
    [405816.175495] R10: 0000000000000000 R11: 0000000000000400 R12: ffff8802d7812000
    [405816.195126] R13: ffff8802d78127f0 R14: ffff8802dc486d00 R15: ffff88032514b680
    [405816.204188] FS: 0000000000000000(0000) GS:ffff88033db00000(0000) knlGS:0000000000000000
    [405816.209047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [405816.218345] CR2: ffff8802d78127f4 CR3: 0000000001c10000 CR4: 00000000001607e0
    [405816.234013] Call Trace:
    [405816.236025] [<ffffffff817d662e>] _raw_spin_unlock+0xe/0x20
    [405816.252333] [<ffffffffa0fd5472>] ofd_inconsistency_verification_main+0xd52/0xde0 [ofd]
    [405816.259324] [<ffffffff8140a129>] ? do_raw_spin_unlock+0x49/0x90
    [405816.261588] [<ffffffff810b93f0>] ? wake_up_atomic_t+0x30/0x30
    [405816.263625] [<ffffffffa0fd4720>] ? ofd_cb_soft_sync+0x240/0x240 [ofd]
    [405816.265897] [<ffffffff810b8254>] kthread+0xe4/0xf0
    [405816.268022] [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
    [405816.270246] [<ffffffff817e0ddd>] ret_from_fork_nospec_begin+0x7/0x21
    [405816.272514] [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
I am not 100% sure how this unfolds, but at the time of the crash two other CPUs were running vfree() from delayed work.
It almost sounds like the parallel ofd_fini thread issues a vfree() that gets handed off to delayed work, and that the delayed work, for some reason, gets a better chance to run than both the ofd_fini thread and the inconsistency thread.
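It seems we really should do that unlock before the wake-up call, though: once wake_up_all() has been called, the thread waiting for SVC_STOPPED may continue the teardown and free the ofd device, including ofd_inconsistency_lock, before spin_unlock() gets to it.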
Issue Links
- is related to: LU-12564 ptlrpcd daemon sleeps while holding imp_lock spinlock (Resolved)