[LU-13146] crash in lod_sub_recovery_thread Created: 16/Jan/20  Updated: 16/Jan/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

hit this oops in my boilpot:

[195498.474047] BUG: unable to handle kernel paging request at ffff8802fd732fc4
[195498.474922] IP: [<ffffffffa1283d70>] lod_sub_recovery_thread+0x4b0/0xd00 [lod]
[195498.474922] PGD 241c067 PUD 33e9f9067 PMD 33e80d067 PTE 80000002fd732060
[195498.474922] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[195498.474922] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) pcc_cpufreq zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon pcspkr virtio_console i2c_piix4 ip_tables rpcsec_gss_krb5 ata_generic drm_kms_helper pata_acpi ttm drm crct10dif_pclmul crct10dif_common ata_piix drm_panel_orientation_quirks crc32c_intel serio_raw virtio_blk i2c_core libata floppy [last unloaded: libcfs]
[195498.474922] CPU: 6 PID: 16186 Comm: lod0000_rec0002 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.7-debug #1
[195498.474922] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[195498.474922] task: ffff88009f09c4c0 ti: ffff880323750000 task.ti: ffff880323750000
[195498.474922] RIP: 0010:[<ffffffffa1283d70>]  [<ffffffffa1283d70>] lod_sub_recovery_thread+0x4b0/0xd00 [lod]
[195498.474922] RSP: 0018:ffff880323753e10  EFLAGS: 00010286
[195498.474922] RAX: ffff8802fd732f00 RBX: 0000000000000000 RCX: 0000000000000000
[195498.474922] RDX: ffff8802c085b0b8 RSI: 000000000000006b RDI: 0000000000000286
[195498.474922] RBP: ffff880323753ea0 R08: 0000000000000010 R09: ffff8802b63aa900
[195498.474922] R10: 0000000000000000 R11: 000000000000000f R12: ffff8802edc34000
[195498.474922] R13: ffff8802dd5b2000 R14: ffff8802b63aa900 R15: 00000000fffffffb
[195498.474922] FS:  0000000000000000(0000) GS:ffff88033db80000(0000) knlGS:0000000000000000
[195498.474922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[195498.474922] CR2: ffff8802fd732fc4 CR3: 00000002b6bb6000 CR4: 00000000001607e0
[195498.474922] Call Trace:
[195498.474922]  [<ffffffffa12838c0>] ? lod_trans_stop+0x340/0x340 [lod]
[195498.474922]  [<ffffffff810b8254>] kthread+0xe4/0xf0
[195498.474922]  [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
[195498.474922]  [<ffffffff817e0ddd>] ret_from_fork_nospec_begin+0x7/0x21
[195498.474922]  [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
[195498.474922] Code: f6 05 70 dd 5c ff 04 0f 85 f6 03 00 00 4c 89 f7 e8 86 3d f9 df 48 8b 45 88 48 8b 95 78 ff ff ff c7 40 18 01 00 00 00 48 8b 42 48 <f0> ff 88 c4 00 00 00 48 8b 7a 48 31 c9 ba 01 00 00 00 be 03 00 

it's in this code:

0x2d9e is in lod_sub_recovery_thread (/home/green/git/lustre-release/lustre/lod/lod_dev.c:478).
473		EXIT;
474	
475	out:
476		OBD_FREE_PTR(lrd);
477		thread->t_flags = SVC_STOPPED;
478		atomic_dec(&lut->lut_tdtd->tdtd_recovery_threads_count);
479		wake_up(&lut->lut_tdtd->tdtd_recovery_threads_waitq);
480		wake_up(&thread->t_ctl_waitq);
481		lu_env_fini(&env);
482		return rc;

examining logs it's a use after free:

:
00000004:00000010:6.0:1578997461.530712:0:16186:0:(lod_dev.c:476:lod_sub_recover
y_thread()) kfreed 'lrd': 32 at ffff8802b63aa900.
00000004:00000010:1.0:1578997461.538400:0:17416:0:(lod_dev.c:869:lod_fini_distribute_txn()) kfreed 'lut->lut_tdtd': 256 at ffff8802fd732f00.

the oops time converted from since bootup to wallclock is 1578997461.539770

the call to lod_fini_distribute_txn looks like this:

                lod_sub_stop_recovery_threads(env, lod);
                lod_fini_distribute_txn(env, lod);

and lod_sub_stop_recovery_threads:

        lod_getref(&lod->lod_mdt_descs);
        lod_foreach_mdt(lod, mdt) {
                thread = mdt->ltd_recovery_thread;
                if (thread && thread->t_flags & SVC_RUNNING) {
                        thread->t_flags = SVC_STOPPING;
                        wake_up(&thread->t_ctl_waitq);
                        wait_event(thread->t_ctl_waitq,
                                   thread->t_flags & SVC_STOPPED);
                        OBD_FREE_PTR(mdt->ltd_recovery_thread);
                        mdt->ltd_recovery_thread = NULL;
                }
        }
        lod_putref(lod, &lod->lod_mdt_descs);

there's no mdt->ltd_recovery_thread freeing in the logs from lod_sub_stop_recovery_threads so it appears by the time we got to the flags check the lod_sub_recovery_thread has already set it to stopped (it appears that the allocation would also be leaked in this case as I don't see anything else that would free it)

hm, the fix is pobably going to be annoying if this code could be entered more than once.


Generated at Sat Feb 10 02:58:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.