[LU-1114] ptlrpcd thread spinning since Lustre start on Client Created: 17/Feb/12  Updated: 19/Nov/12  Resolved: 02/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alexandre Louvet Assignee: nasf (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6452

 Description   

Example of affected thread :
============================
PID: 7947 TASK: ffff881030721850 CPU: 1 COMMAND: "ptlrpcd_4"
#0 [ffff880044e27e90] crash_nmi_callback at ffffffff8101fd06
#1 [ffff880044e27ea0] notifier_call_chain at ffffffff814837f5
#2 [ffff880044e27ee0] atomic_notifier_call_chain at ffffffff8148385a
#3 [ffff880044e27ef0] notify_die at ffffffff8108026e
#4 [ffff880044e27f20] do_nmi at ffffffff81481443
#5 [ffff880044e27f50] nmi at ffffffff81480d50
[exception RIP: _spin_lock+30]
RIP: ffffffff8148062e RSP: ffff881030757da0 RFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff881030632540 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88103c45e3d0 RDI: ffff88103c45e498
RBP: ffff881030757da0 R8: ebc0de0100000000 R9: ffffffff00000100
R10: 0000000000000000 R11: 000000000000000f R12: ffff881030632540
R13: ffff881030632570 R14: ffff88103c45e3d0 R15: ffff88103c45e498
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff881030757da0] _spin_lock at ffffffff8148062e
#7 [ffff881030757da8] ptlrpcd_check at ffffffffa05cfecc [ptlrpc]
#8 [ffff881030757e38] ptlrpcd at ffffffffa05d03ff [ptlrpc]
#9 [ffff881030757f48] kernel_thread at ffffffff810041aa
============================

The spinlock of the affected thread's "partner" (ptlrpcds->pd_threads[]->pc_lock) appears to be uninitialized, which causes the current ptlrpcd thread to spin forever.

A possible fix would be to wait for all ptlrpcd/partner threads to be fully initialized before starting operations.



 Comments   
Comment by Peter Jones [ 17/Feb/12 ]

Bruno

Could you please confirm what version of the code you are running? You have marked this as a 2.1 issue, but it seems to be related to multi-threaded ptlrpc, which is a 2.2 feature.

Thanks

Peter

Comment by Peter Jones [ 21/Feb/12 ]

Bruno?

Comment by Aurelien Degremont (Inactive) [ 21/Feb/12 ]

In fact this is Lustre 2.1 + patch ORNL-22 applied.

Comment by Peter Jones [ 21/Feb/12 ]

Fanyong

Could you please comment on this issue?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 21/Feb/12 ]

Sorry for my silence, Peter. In between debugging other problems, I was waiting for Bull R&D to confirm the details that Aurelien added.

Comment by Bruno Faccini (Inactive) [ 23/Feb/12 ]

We have no real or useful information about the scenario or conditions that trigger this issue.

The only explanation we can offer is the following. With PDB_POLICY_NODE, each ptlrpcd thread, when starting, has a single partner chosen as the "next core in the same NUMA node". Unfortunately, the second thread had not completed its startup/initialization when the first thread tried to access the second thread's private data, protecting itself with the partner's ptlrpcd_ctl.pc_lock. That lock was found uninitialized, hence the deadlock.

As a possible fix, perhaps some kind of "barrier" (with an additional, dedicated "started" flag) could be implemented in ptlrpcd_init() to ensure that all ptlrpcd threads/partners have started?
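
The barrier idea can be illustrated in user space. The following is only a minimal sketch using POSIX threads, not Lustre code: `struct ctl`, `NTHREADS`, `worker` and `run_demo` are hypothetical names, `lock` merely plays the role of pc_lock, and the "next index" partner choice is a simplification of the policy described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4

/* Hypothetical stand-in for a per-thread control structure. */
struct ctl {
	pthread_mutex_t lock;	/* plays the role of pc_lock */
	int hits;		/* number of visits by a partner */
};

static struct ctl ctls[NTHREADS];
static pthread_barrier_t all_started;

static void *worker(void *arg)
{
	int index = (int)(intptr_t)arg;
	int partner = (index + 1) % NTHREADS;

	assert(index >= 0 && index < NTHREADS);

	/* Each thread initializes its own private data first. */
	pthread_mutex_init(&ctls[index].lock, NULL);
	ctls[index].hits = 0;

	/* No thread proceeds until every thread has finished initializing. */
	pthread_barrier_wait(&all_started);

	/* Only now is it safe to take the partner's lock. */
	pthread_mutex_lock(&ctls[partner].lock);
	ctls[partner].hits++;
	pthread_mutex_unlock(&ctls[partner].lock);
	return NULL;
}

/*
 * Start all workers and return the total number of partner visits.
 * Error handling for thread creation is omitted for brevity.
 */
int run_demo(void)
{
	pthread_t tids[NTHREADS];
	int i, total = 0;

	if (pthread_barrier_init(&all_started, NULL, NTHREADS) != 0)
		return -1;
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	for (i = 0; i < NTHREADS; i++)
		total += ctls[i].hits;
	pthread_barrier_destroy(&all_started);
	return total;
}
```

Without the `pthread_barrier_wait()` call, a fast thread could lock its partner's mutex before the partner has initialized it, which is the user-space analogue of the spin seen in the backtrace.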

Comment by Gregoire Pichon [ 27/Feb/12 ]

PDB_POLICY_NODE is a NUMA-aware ptlrpcd binding policy implemented by Bull.
I have uploaded the patch to Gerrit (http://review.whamcloud.com/2212) so that the source code is available and can be integrated into master.

Note that there was a bug in the version running on the customer cluster.

"lustre/ptlrpc/ptlrpcd.c line 600"
for (i = index+1;
     i != index;
     i = (i+1)%max) {

which has been fixed in the uploaded patch:

for (i = (index+1)%max;
     i != index;
     i = (i+1)%max) {
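
The difference between the two loop headers shows up only for the last index. The sketch below records the values of `i` that each loop visits; `visit_buggy`, `visit_fixed` and `has_out_of_range` are hypothetical helper names, and the assumption that `i` indexes an array of `max` entries in the loop body is mine, since only the loop header is quoted above.

```c
#include <assert.h>

/*
 * Record the values of i visited by the original loop. The start value
 * index+1 is not reduced modulo max, so for index == max-1 the first
 * value visited is max itself. Returns the number of values stored.
 */
int visit_buggy(int index, int max, int *out)
{
	int i, n = 0;

	assert(max > 0 && index >= 0 && index < max);
	for (i = index + 1; i != index; i = (i + 1) % max)
		out[n++] = i;
	return n;
}

/* Same walk with the corrected start value (index+1)%max. */
int visit_fixed(int index, int max, int *out)
{
	int i, n = 0;

	assert(max > 0 && index >= 0 && index < max);
	for (i = (index + 1) % max; i != index; i = (i + 1) % max)
		out[n++] = i;
	return n;
}

/* Return 1 if any recorded value falls outside [0, max), else 0. */
int has_out_of_range(const int *v, int n, int max)
{
	int k;

	for (k = 0; k < n; k++)
		if (v[k] < 0 || v[k] >= max)
			return 1;
	return 0;
}
```

With `index = 3` and `max = 4`, the original loop visits 4, 1, 2 (one value past the valid range, and 0 is skipped), while the corrected loop visits 0, 1, 2. For every other index the two loops visit the same values.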
Comment by Peter Jones [ 27/Feb/12 ]

Gregoire

Could you please resubmit your patch with LU-1114 as the identified ticket number? We are trying to ensure that all landings to master have public LU tickets so that interested parties can read the full details relating to the change

Thanks

Peter

Comment by Gregoire Pichon [ 27/Feb/12 ]

I have created a separate ticket LU-1144 to track the implementation of the ptlrpcd binding policy PDB_POLICY_NODE.

Comment by Gregoire Pichon [ 29/Feb/12 ]

I have uploaded a patch that ensures a partner thread's control structure is accessed only once it is completely initialized: http://review.whamcloud.com/2227.

Comment by nasf (Inactive) [ 01/Mar/12 ]

The failure should not happen on the original Lustre 2.2 branch as long as no other bind mode is introduced. The original implementation controls the ordering: by the time a partnership is established, the partner's private data should already have been initialized. For example, with the default bind mode "PDB_POLICY_PAIR", "<0,1>" are partners of each other, and the partnership between "0" and "1" is established after "1"'s private data has been initialized. According to the ptlrpcd thread start order, "0" should have started before "1". This guarantees that when the <0,1> pair is established, the private data of both "0" and "1" is ready.

        pc->pc_index = index;
        cfs_init_completion(&pc->pc_starting);
        cfs_init_completion(&pc->pc_finishing);
        cfs_spin_lock_init(&pc->pc_lock);
        strncpy(pc->pc_name, name, sizeof(pc->pc_name) - 1);
        pc->pc_set = ptlrpc_prep_set();
        if (pc->pc_set == NULL)
                GOTO(out, rc = -ENOMEM);
        /*
         * So far only "client" ptlrpcd uses an environment. In the future,
         * ptlrpcd thread (or a thread-set) has to be given an argument,
         * describing its "scope".
         */
        rc = lu_context_init(&pc->pc_env.le_ctx, LCT_CL_THREAD|LCT_REMEMBER);
        if (rc != 0)
                GOTO(out, rc);

        env = 1;
#ifdef __KERNEL__
        if (index >= 0) {
                /* XXX: When "1" comes here, "1"'s private data has been
                 * initialized, and "0" was ready before "1" started. So
                 * the partnership between "0" and "1" can be established
                 * here. */
                rc = ptlrpcd_bind(index, max);
                if (rc < 0)
                        GOTO(out, rc);
        }
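
The ordering argument can be checked with a small simulation. This is a simplified model, not the Lustre implementation: it assumes threads are set up strictly one after another in index order, that in pair mode only the odd member links the pair (to index-1), and it uses a hypothetical `NEXT` policy (every thread links at once to the next index, ignoring NUMA topology) as a stand-in for PDB_POLICY_NODE. `count_early_binds` and `MAX_THREADS` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8

/* PAIR models PDB_POLICY_PAIR; NEXT is a simplified "next index" policy. */
enum policy { PAIR, NEXT };

/*
 * Simulate threads being set up one by one in index order. A thread's
 * private data counts as initialized just before it binds. Returns the
 * number of bind attempts that touch a partner not yet initialized.
 */
int count_early_binds(enum policy p, int max)
{
	bool ready[MAX_THREADS] = { false };
	int index, early = 0;

	assert(max > 0 && max <= MAX_THREADS && max % 2 == 0);

	for (index = 0; index < max; index++) {
		int partner = -1;

		ready[index] = true;	/* own private data initialized */

		if (p == PAIR) {
			/* Only the odd member links the pair, to index-1. */
			if (index & 1)
				partner = index - 1;
		} else {
			/* Every thread links immediately to the next index. */
			partner = (index + 1) % max;
		}

		if (partner >= 0 && !ready[partner])
			early++;
	}
	return early;
}
```

Under these assumptions, the pair policy with four threads never touches an uninitialized partner, whereas the "next index" policy does so for threads 0, 1 and 2, since each of them binds to a thread that has not been set up yet.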

Comment by nasf (Inactive) [ 01/Mar/12 ]

Given my comment above, what are your thoughts? If the issue is introduced by the new bind mode "PDB_POLICY_NODE", do you think it would be better to fix it inside the "PDB_POLICY_NODE" implementation patch?

Comment by Gregoire Pichon [ 02/Mar/12 ]

Thanks for looking.
You are right: the problem is specific to PDB_POLICY_NODE, which currently does not ensure that partnerships are established only with ptlrpcd threads that are already initialized.

Therefore, I think this ticket can be closed (or marked as a duplicate of LU-1144), and I will take this into account in the new version of the PDB_POLICY_NODE implementation that I will post under LU-1144.

Comment by Peter Jones [ 02/Mar/12 ]

ok thanks Gregoire!
