[LU-2167] Race in ptlrpcd_stop/fini vs ptlrpcd leaving to use after free Created: 13/Oct/12  Updated: 19/Apr/13  Resolved: 23/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5198

 Description   

I met this oops on my testing system several times and thought it did not make any sense initially, but after looking into the code I see that it does make sense after all.

[11898.846141] BUG: unable to handle kernel paging request at ffff88008fb39800
[11898.846670] IP: [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
[11898.847291] PGD 1a26063 PUD 501067 PMD 57f067 PTE 800000008fb39160
[11898.847789] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[11898.848237] last sysfs file: /sys/devices/system/cpu/possible
[11898.848678] CPU 2 
[11898.848743] Modules linked in: ptlrpc(-) obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: fld]
[11898.850099] 
[11898.850099] Pid: 30737, comm: ptlrpcd_rcv Not tainted 2.6.32-debug #6 Bochs Bochs
[11898.850099] RIP: 0010:[<ffffffffa1368c93>]  [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
[11898.850099] RSP: 0018:ffff880088529e70  EFLAGS: 00010286
[11898.850099] RAX: 0000000000000002 RBX: ffff88008fb39800 RCX: 0000000000000001
[11898.850099] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
[11898.850099] RBP: ffff880088529f40 R08: 0000000000000000 R09: 0000000000000001
[11898.850099] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880088529ea0
[11898.850099] R13: ffff880083785f30 R14: 0000000000000002 R15: ffff880088529ee0
[11898.850099] FS:  00007f449616c700(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
[11898.850099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11898.850099] CR2: ffff88008fb39800 CR3: 00000000b3178000 CR4: 00000000000006e0
[11898.850099] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11898.850099] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11898.850099] Process ptlrpcd_rcv (pid: 30737, threadinfo ffff880088528000, task ffff880070208500)
[11898.850099] Stack:
[11898.850099]  ffff880070208500 ffffffffffffffff ffff880083785f40 0000000000000000
[11898.850099] <d> ffff880070208500 0000000000000000 00000004b0000008 0000000000000000
[11898.850099] <d> 0000000000000000 ffff880088529eb8 ffff880088529eb8 0000000000000054
[11898.850099] Call Trace:
[11898.850099]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
[11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
[11898.850099]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
[11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
[11898.850099]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[11898.850099] Code: e9 ba fe ff ff 0f 1f 00 49 8d 45 40 49 39 45 40 74 08 4c 89 ef e8 8e f6 fc ff 4c 89 e7 e8 46 1f ea ff 48 8d 7b 50 e8 8d 91 ce df <f0> 80 23 fd f0 80 23 fb f0 80 23 ef f0 80 63 02 fe e9 17 fe ff 
[11898.850099] RIP  [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
[11898.850099]  RSP <ffff880088529e70>
[11898.850099] CR2: ffff88008fb39800

Looking at the code, in ptlrpcd() we see:

        if (!cfs_list_empty(&set->set_requests))
                ptlrpc_set_wait(set);
        lu_context_fini(&env.le_ctx);
        cfs_complete(&pc->pc_finishing);

        cfs_clear_bit(LIOD_START, &pc->pc_flags);  <=== Crash Here
        cfs_clear_bit(LIOD_STOP, &pc->pc_flags);
        cfs_clear_bit(LIOD_FORCE, &pc->pc_flags);

And in ptlrpcd_stop/fini():

{
...
#ifdef __KERNEL__
        cfs_wait_for_completion(&pc->pc_finishing);
#else
        liblustre_deregister_wait_callback(pc->pc_wait_callback);
        liblustre_deregister_idle_callback(pc->pc_idle_callback);
#endif
        lu_context_fini(&pc->pc_env.le_ctx);
...
        EXIT;
}
static void ptlrpcd_fini(void)
{
        int i;
        ENTRY;

        if (ptlrpcds != NULL) {
                for (i = 0; i < ptlrpcds->pd_nthreads; i++)
                        ptlrpcd_stop(&ptlrpcds->pd_threads[i], 0);
                ptlrpcd_stop(&ptlrpcds->pd_thread_rcv, 0);
lost race =>>   OBD_FREE(ptlrpcds, ptlrpcds->pd_size);
                ptlrpcds = NULL;
        }
}

I think the proper solution to avoid the race should be to move cfs_clear_bit()s to before signalling the completion.



 Comments   
Comment by Oleg Drokin [ 13/Oct/12 ]

Suggested patch in http://review.whamcloud.com/4264

Comment by Oleg Drokin [ 23/Oct/12 ]

Patch landed, resolving

Generated at Sat Feb 10 01:22:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.