[LU-2624] Stop of ptlrpcd threads is long Created: 16/Jan/13  Updated: 23/Feb/13  Resolved: 23/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.5
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Gregoire Pichon Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

lustre 2.1.3
bullxlinux 6.1 (based on redhat 6.1)
machine with 32 CPUs


Attachments: File lctl.dk.lu2624    
Severity: 3
Epic: ptlrpc
Rank (Obsolete): 6141

 Description   

The stop of ptlrpcd threads lasts more than one second per thread. On hardware with a large number of cores this leads to several minutes to unload the ptlrpc module.

# lscpu | grep "^CPU(s)"
CPU(s):                32
# ps -ef | grep ptlrpcd
root      7301     2  0 10:58 ?        00:00:00 [ptlrpcd_rcv]
root      7302     2  0 10:58 ?        00:00:00 [ptlrpcd_0]
root      7303     2  0 10:58 ?        00:00:00 [ptlrpcd_1]
root      7304     2  0 10:58 ?        00:00:00 [ptlrpcd_2]
root      7305     2  0 10:58 ?        00:00:00 [ptlrpcd_3]
root      7306     2  0 10:58 ?        00:00:00 [ptlrpcd_4]
root      7307     2  0 10:58 ?        00:00:00 [ptlrpcd_5]
root      7308     2  0 10:58 ?        00:00:00 [ptlrpcd_6]
root      7309     2  0 10:58 ?        00:00:00 [ptlrpcd_7]
root      7310     2  0 10:58 ?        00:00:00 [ptlrpcd_8]
root      7311     2  0 10:58 ?        00:00:00 [ptlrpcd_9]
root      7312     2  0 10:58 ?        00:00:00 [ptlrpcd_10]
root      7313     2  0 10:58 ?        00:00:00 [ptlrpcd_11]
root      7314     2  0 10:58 ?        00:00:00 [ptlrpcd_12]
root      7315     2  0 10:58 ?        00:00:00 [ptlrpcd_13]
root      7316     2  0 10:58 ?        00:00:00 [ptlrpcd_14]
root      7317     2  0 10:58 ?        00:00:00 [ptlrpcd_15]
root      7318     2  0 10:58 ?        00:00:00 [ptlrpcd_16]
root      7319     2  0 10:58 ?        00:00:00 [ptlrpcd_17]
root      7320     2  0 10:58 ?        00:00:00 [ptlrpcd_18]
root      7321     2  0 10:58 ?        00:00:00 [ptlrpcd_19]
root      7322     2  0 10:58 ?        00:00:00 [ptlrpcd_20]
root      7323     2  0 10:58 ?        00:00:00 [ptlrpcd_21]
root      7324     2  0 10:58 ?        00:00:00 [ptlrpcd_22]
root      7325     2  0 10:58 ?        00:00:00 [ptlrpcd_23]
root      7326     2  0 10:58 ?        00:00:00 [ptlrpcd_24]
root      7327     2  0 10:58 ?        00:00:00 [ptlrpcd_25]
root      7328     2  0 10:58 ?        00:00:00 [ptlrpcd_26]
root      7329     2  0 10:58 ?        00:00:00 [ptlrpcd_27]
root      7330     2  0 10:58 ?        00:00:00 [ptlrpcd_28]
root      7331     2  0 10:58 ?        00:00:00 [ptlrpcd_29]
root      7332     2  0 10:58 ?        00:00:00 [ptlrpcd_30]
root      7333     2  0 10:58 ?        00:00:00 [ptlrpcd_31]
# time modprobe -r ptlrpc
real	1m7.204s
user	0m0.000s
sys	0m0.030s


 Comments   
Comment by Gregoire Pichon [ 16/Jan/13 ]

I am going to post a fix.

With the fix, the stop of the ptlrpcd threads is dramatically faster:

# time modprobe -r ptlrpc

real	0m6.675s
user	0m0.000s
sys	0m0.030s
Comment by Gregoire Pichon [ 16/Jan/13 ]

http://review.whamcloud.com/5039

Comment by Bruno Faccini (Inactive) [ 16/Jan/13 ]

Hello Gregoire !!
Thanks for the report, and for the fix proposal already.
I will review it and give you feedback soon.
Bye.

Comment by Andreas Dilger [ 16/Jan/13 ]

Do you have any idea why it takes so long to stop each thread? This might also be a sign of something else being wrong (e.g. structures being kept around too long and needing to be freed at the end).

Comment by Gregoire Pichon [ 17/Jan/13 ]

I don't think there is something wrong.

In the ptlrpcd() routine, the ptlrpcd thread loops waiting for work to do. The wait condition has a one-second timeout when the set is idle. When the thread is notified to stop, it performs one more loop before exiting. That explains why each thread takes at least one second to stop.

        do {
                struct l_wait_info lwi;
                int timeout;

                rc = lu_env_refill(&env);
                if (rc != 0) {
                        /*
                         * XXX This is very awkward situation, because
                         * execution can neither continue (request
                         * interpreters assume that env is set up), nor repeat
                         * the loop (as this potentially results in a tight
                         * loop of -ENOMEM's).
                         *
                         * Fortunately, refill only ever does something when
                         * new modules are loaded, i.e., early during boot up.
                         */
                        CERROR("Failure to refill session: %d\n", rc);
                        continue;
                }

                timeout = ptlrpc_set_next_timeout(set);
                lwi = LWI_TIMEOUT(cfs_time_seconds(timeout ? timeout : 1),
                                  ptlrpc_expired_set, set);

                lu_context_enter(&env.le_ctx);
                l_wait_event(set->set_waitq,
                             ptlrpcd_check(&env, pc), &lwi);
                lu_context_exit(&env.le_ctx);

                /*
                 * Abort inflight rpcs for forced stop case.
                 */
                if (cfs_test_bit(LIOD_STOP, &pc->pc_flags)) {
                        if (cfs_test_bit(LIOD_FORCE, &pc->pc_flags))
                                ptlrpc_abort_set(set);
                        exit++;
                }

                /*
                 * Let's make one more loop to make sure that ptlrpcd_check()
                 * copied all raced new rpcs into the set so we can kill them.
                 */
        } while (exit < 2);
Comment by Bruno Faccini (Inactive) [ 17/Jan/13 ]

Hello Gregoire, I agree with Andreas, there must be something else to explain the >1min for the rmmod.

BTW, each of the ptlrpcd threads (or at least groups of them, depending on their scheduling policy) should execute in parallel on multiple cores, so the timing you get would be the max across all of them, which looks too long to me ...

Can you reproduce the problem, making sure that full debug traces are enabled beforehand and that the rmmod timing period is delimited with BEGIN/END markers?

A ps output showing the ptlrpcd pids would be helpful too.

Comment by Gregoire Pichon [ 17/Jan/13 ]

The stop notification is sent to the ptlrpcd threads one after the other. There is no concept of groups of ptlrpcd threads, nor of execution sets.

Here are the debug traces you requested.

root     26857     2  0 16:44 ?        00:00:00 [ptlrpcd_rcv]
root     26858     2  0 16:44 ?        00:00:00 [ptlrpcd_0]
root     26859     2  0 16:44 ?        00:00:00 [ptlrpcd_1]
root     26860     2  0 16:44 ?        00:00:00 [ptlrpcd_2]
root     26861     2  0 16:44 ?        00:00:00 [ptlrpcd_3]
root     26862     2  0 16:44 ?        00:00:00 [ptlrpcd_4]
root     26863     2  0 16:44 ?        00:00:00 [ptlrpcd_5]
root     26864     2  0 16:44 ?        00:00:00 [ptlrpcd_6]
root     26865     2  0 16:44 ?        00:00:00 [ptlrpcd_7]
root     26866     2  0 16:44 ?        00:00:00 [ptlrpcd_8]
root     26867     2  0 16:44 ?        00:00:00 [ptlrpcd_9]
root     26868     2  0 16:44 ?        00:00:00 [ptlrpcd_10]
root     26869     2  0 16:44 ?        00:00:00 [ptlrpcd_11]
root     26870     2  0 16:44 ?        00:00:00 [ptlrpcd_12]
root     26871     2  0 16:44 ?        00:00:00 [ptlrpcd_13]
root     26872     2  0 16:44 ?        00:00:00 [ptlrpcd_14]
root     26873     2  0 16:44 ?        00:00:00 [ptlrpcd_15]
root     26874     2  0 16:44 ?        00:00:00 [ptlrpcd_16]
root     26875     2  0 16:44 ?        00:00:00 [ptlrpcd_17]
root     26876     2  0 16:44 ?        00:00:00 [ptlrpcd_18]
root     26877     2  0 16:44 ?        00:00:00 [ptlrpcd_19]
root     26878     2  0 16:44 ?        00:00:00 [ptlrpcd_20]
root     26879     2  0 16:44 ?        00:00:00 [ptlrpcd_21]
root     26880     2  0 16:44 ?        00:00:00 [ptlrpcd_22]
root     26881     2  0 16:44 ?        00:00:00 [ptlrpcd_23]
root     26882     2  0 16:44 ?        00:00:00 [ptlrpcd_24]
root     26883     2  0 16:44 ?        00:00:00 [ptlrpcd_25]
root     26884     2  0 16:44 ?        00:00:00 [ptlrpcd_26]
root     26885     2  0 16:44 ?        00:00:00 [ptlrpcd_27]
root     26886     2  0 16:44 ?        00:00:00 [ptlrpcd_28]
root     26887     2  0 16:44 ?        00:00:00 [ptlrpcd_29]
root     26888     2  0 16:44 ?        00:00:00 [ptlrpcd_30]
root     26889     2  0 16:44 ?        00:00:00 [ptlrpcd_31]

file in attachment: lctl.dk.lu2624

Comment by Bruno Faccini (Inactive) [ 17/Jan/13 ]

Hum, sure, I was wrong to assume the stop of the ptlrpcds could occur in parallel! I don't know why I thought we were further ahead in making them operate more independently ...

With "execution-sets" I was only referring to the different ptlrpcd scheduling policies, which can allow either a single thread or a group of threads to end up executing on a single core ... But that has no importance when they execute this part of the code one at a time, as you highlighted!

Thanks for the debug log.

Comment by Bruno Faccini (Inactive) [ 21/Jan/13 ]

Ok, the full debug trace confirms that with the current algorithm, and depending on the receipt of the wake-up signal by each thread to stop or on outstanding requests, a full stop of all ptlrpcd threads takes about the sum of 2s per thread in an idle situation.

So, I agree with you: separating the asynchronous stop signal sent to each thread from the wait on completion will parallelize and speed up the ptlrpcd stopping process.

Comment by Peter Jones [ 23/Feb/13 ]

Landed for 2.4

Generated at Sat Feb 10 01:26:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.