[LU-2624] Stop of ptlrpcd threads is long Created: 16/Jan/13 Updated: 23/Feb/13 Resolved: 23/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.5 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Gregoire Pichon | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre 2.1.3 |
||
| Attachments: |
|
| Severity: | 3 |
| Epic: | ptlrpc |
| Rank (Obsolete): | 6141 |
| Description |
|
The stop of ptlrpcd threads lasts more than one second per thread. On hardware with a large number of cores, this leads to a few minutes to unload the ptlrpc module.

# lscpu | grep "^CPU(s)"
CPU(s): 32

# ps -ef | grep ptlrpcd
root 7301 2 0 10:58 ? 00:00:00 [ptlrpcd_rcv]
root 7302 2 0 10:58 ? 00:00:00 [ptlrpcd_0]
root 7303 2 0 10:58 ? 00:00:00 [ptlrpcd_1]
root 7304 2 0 10:58 ? 00:00:00 [ptlrpcd_2]
root 7305 2 0 10:58 ? 00:00:00 [ptlrpcd_3]
root 7306 2 0 10:58 ? 00:00:00 [ptlrpcd_4]
root 7307 2 0 10:58 ? 00:00:00 [ptlrpcd_5]
root 7308 2 0 10:58 ? 00:00:00 [ptlrpcd_6]
root 7309 2 0 10:58 ? 00:00:00 [ptlrpcd_7]
root 7310 2 0 10:58 ? 00:00:00 [ptlrpcd_8]
root 7311 2 0 10:58 ? 00:00:00 [ptlrpcd_9]
root 7312 2 0 10:58 ? 00:00:00 [ptlrpcd_10]
root 7313 2 0 10:58 ? 00:00:00 [ptlrpcd_11]
root 7314 2 0 10:58 ? 00:00:00 [ptlrpcd_12]
root 7315 2 0 10:58 ? 00:00:00 [ptlrpcd_13]
root 7316 2 0 10:58 ? 00:00:00 [ptlrpcd_14]
root 7317 2 0 10:58 ? 00:00:00 [ptlrpcd_15]
root 7318 2 0 10:58 ? 00:00:00 [ptlrpcd_16]
root 7319 2 0 10:58 ? 00:00:00 [ptlrpcd_17]
root 7320 2 0 10:58 ? 00:00:00 [ptlrpcd_18]
root 7321 2 0 10:58 ? 00:00:00 [ptlrpcd_19]
root 7322 2 0 10:58 ? 00:00:00 [ptlrpcd_20]
root 7323 2 0 10:58 ? 00:00:00 [ptlrpcd_21]
root 7324 2 0 10:58 ? 00:00:00 [ptlrpcd_22]
root 7325 2 0 10:58 ? 00:00:00 [ptlrpcd_23]
root 7326 2 0 10:58 ? 00:00:00 [ptlrpcd_24]
root 7327 2 0 10:58 ? 00:00:00 [ptlrpcd_25]
root 7328 2 0 10:58 ? 00:00:00 [ptlrpcd_26]
root 7329 2 0 10:58 ? 00:00:00 [ptlrpcd_27]
root 7330 2 0 10:58 ? 00:00:00 [ptlrpcd_28]
root 7331 2 0 10:58 ? 00:00:00 [ptlrpcd_29]
root 7332 2 0 10:58 ? 00:00:00 [ptlrpcd_30]
root 7333 2 0 10:58 ? 00:00:00 [ptlrpcd_31]

# time modprobe -r ptlrpc
real 1m7.204s
user 0m0.000s
sys 0m0.030s |
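For scale, a quick check of the arithmetic: the 33 kernel threads above (ptlrpcd_rcv plus ptlrpcd_0 through ptlrpcd_31) at roughly 2 seconds each (see the analysis in the comments below) account for about 66 seconds, which matches the measured 1m7.204s.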
| Comments |
| Comment by Gregoire Pichon [ 16/Jan/13 ] |
|
I am going to post a fix. With the fix, the stop time of the ptlrpcd threads is dramatically reduced:

# time modprobe -r ptlrpc
real 0m6.675s
user 0m0.000s
sys 0m0.030s |
| Comment by Gregoire Pichon [ 16/Jan/13 ] |
| Comment by Bruno Faccini (Inactive) [ 16/Jan/13 ] |
|
Hello Gregoire!! |
| Comment by Andreas Dilger [ 16/Jan/13 ] |
|
Do you have any idea why it takes so long to stop each thread? This might also be a sign of something else wrong (e.g. structures being kept around too long and needing to be freed at the end). |
| Comment by Gregoire Pichon [ 17/Jan/13 ] |
|
I don't think there is anything wrong. In the ptlrpcd() routine, the ptlrpcd thread loops waiting for work to do. The wait condition has a timeout of 1 second when the set is idle. When the thread is notified to stop, it performs one more loop iteration before exiting (exit must reach 2). That explains why it takes at least one second for each thread to stop.

do {
        struct l_wait_info lwi;
        int timeout;

        rc = lu_env_refill(&env);
        if (rc != 0) {
                /*
                 * XXX This is very awkward situation, because
                 * execution can neither continue (request
                 * interpreters assume that env is set up), nor repeat
                 * the loop (as this potentially results in a tight
                 * loop of -ENOMEM's).
                 *
                 * Fortunately, refill only ever does something when
                 * new modules are loaded, i.e., early during boot up.
                 */
                CERROR("Failure to refill session: %d\n", rc);
                continue;
        }

        timeout = ptlrpc_set_next_timeout(set);
        lwi = LWI_TIMEOUT(cfs_time_seconds(timeout ? timeout : 1),
                          ptlrpc_expired_set, set);

        lu_context_enter(&env.le_ctx);
        l_wait_event(set->set_waitq, ptlrpcd_check(&env, pc), &lwi);
        lu_context_exit(&env.le_ctx);

        /*
         * Abort inflight rpcs for forced stop case.
         */
        if (cfs_test_bit(LIOD_STOP, &pc->pc_flags)) {
                if (cfs_test_bit(LIOD_FORCE, &pc->pc_flags))
                        ptlrpc_abort_set(set);
                exit++;
        }

        /*
         * Let's make one more loop to make sure that ptlrpcd_check()
         * copied all raced new rpcs into the set so we can kill them.
         */
} while (exit < 2); |
| Comment by Bruno Faccini (Inactive) [ 17/Jan/13 ] |
|
Hello Gregoire, I agree with Andreas, there must be something else to explain the >1min for the rmmod. BTW, each ptlrpcd thread, or at least groups of them, should execute in parallel on multiple cores (depending on their scheduling policy), so the timing you get should be the max over all execution-sets, which looks too long to me... Can you reproduce the problem, making sure that you enable full debug-traces beforehand and that you delimit the rmmod timing period with BEGIN/END markers? A ps output showing the ptlrpcd pids would be helpful too. |
| Comment by Gregoire Pichon [ 17/Jan/13 ] |
|
The stop notification is sent to the ptlrpcd threads one after the other. There is no concept of groups of ptlrpcd threads or execution-sets. Here are the debug traces you requested.

root 26857 2 0 16:44 ? 00:00:00 [ptlrpcd_rcv]
root 26858 2 0 16:44 ? 00:00:00 [ptlrpcd_0]
root 26859 2 0 16:44 ? 00:00:00 [ptlrpcd_1]
root 26860 2 0 16:44 ? 00:00:00 [ptlrpcd_2]
root 26861 2 0 16:44 ? 00:00:00 [ptlrpcd_3]
root 26862 2 0 16:44 ? 00:00:00 [ptlrpcd_4]
root 26863 2 0 16:44 ? 00:00:00 [ptlrpcd_5]
root 26864 2 0 16:44 ? 00:00:00 [ptlrpcd_6]
root 26865 2 0 16:44 ? 00:00:00 [ptlrpcd_7]
root 26866 2 0 16:44 ? 00:00:00 [ptlrpcd_8]
root 26867 2 0 16:44 ? 00:00:00 [ptlrpcd_9]
root 26868 2 0 16:44 ? 00:00:00 [ptlrpcd_10]
root 26869 2 0 16:44 ? 00:00:00 [ptlrpcd_11]
root 26870 2 0 16:44 ? 00:00:00 [ptlrpcd_12]
root 26871 2 0 16:44 ? 00:00:00 [ptlrpcd_13]
root 26872 2 0 16:44 ? 00:00:00 [ptlrpcd_14]
root 26873 2 0 16:44 ? 00:00:00 [ptlrpcd_15]
root 26874 2 0 16:44 ? 00:00:00 [ptlrpcd_16]
root 26875 2 0 16:44 ? 00:00:00 [ptlrpcd_17]
root 26876 2 0 16:44 ? 00:00:00 [ptlrpcd_18]
root 26877 2 0 16:44 ? 00:00:00 [ptlrpcd_19]
root 26878 2 0 16:44 ? 00:00:00 [ptlrpcd_20]
root 26879 2 0 16:44 ? 00:00:00 [ptlrpcd_21]
root 26880 2 0 16:44 ? 00:00:00 [ptlrpcd_22]
root 26881 2 0 16:44 ? 00:00:00 [ptlrpcd_23]
root 26882 2 0 16:44 ? 00:00:00 [ptlrpcd_24]
root 26883 2 0 16:44 ? 00:00:00 [ptlrpcd_25]
root 26884 2 0 16:44 ? 00:00:00 [ptlrpcd_26]
root 26885 2 0 16:44 ? 00:00:00 [ptlrpcd_27]
root 26886 2 0 16:44 ? 00:00:00 [ptlrpcd_28]
root 26887 2 0 16:44 ? 00:00:00 [ptlrpcd_29]
root 26888 2 0 16:44 ? 00:00:00 [ptlrpcd_30]
root 26889 2 0 16:44 ? 00:00:00 [ptlrpcd_31]

file in attachment: lctl.dk.lu2624 |
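To make the cost model concrete, here is a minimal sketch of the sequential stop-and-wait pattern described above, written in the style of the ptlrpcd loop quoted earlier. The struct layout and the helper names beyond those in the quoted code (cfs_set_bit, cfs_waitq_signal, wait_for_completion, pc_set, pc_finishing) are assumptions for illustration, not the actual Lustre teardown function:

/*
 * Sketch only: each thread is signalled and then waited on before the
 * next one is touched, so the total unload time grows linearly with
 * the thread count.
 */
static void ptlrpcd_sequential_stop_sketch(struct ptlrpcd_ctl *pcs, int n)
{
        int i;

        for (i = 0; i < n; i++) {
                /* Signal thread i to stop ... */
                cfs_set_bit(LIOD_STOP, &pcs[i].pc_flags);    /* assumed helper */
                cfs_waitq_signal(&pcs[i].pc_set->set_waitq); /* assumed helper */
                /* ... then block until it exits.  The thread runs up to
                 * two more 1-second loop iterations first, so this wait
                 * costs ~2s and the loop as a whole costs ~2s * n. */
                wait_for_completion(&pcs[i].pc_finishing);   /* assumed field */
        }
}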
| Comment by Bruno Faccini (Inactive) [ 17/Jan/13 ] |
|
Hum, sure, I was wrong assuming the stop of the ptlrpcds could occur in parallel! I don't know why I thought we were further ahead in how independently they can operate... By "execution-sets" I was only referring to the different ptlrpcd scheduling policies, which can either allow a single thread or end up with a group of threads executing on a single core... But this has no importance when they execute this part of the code one at a time, as you highlighted! Thanks for the debug log. |
| Comment by Bruno Faccini (Inactive) [ 21/Jan/13 ] |
|
Ok, the full debug-trace confirms that, with the current algorithm and depending on when each thread receives its wake-up signal to stop (or on outstanding requests), the full stop of all ptlrpcd threads takes about the sum of 2s per thread in the idle case. So I agree with you: separating the asynchronous stop signal sent to each thread from the wait on its completion will parallelize and speed up the ptlrpcd stopping process. |
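A minimal sketch of the two-phase shutdown described here, under the same assumed names as the sequential sketch earlier in this ticket (this illustrates the pattern, not the patch that actually landed):

static void ptlrpcd_parallel_stop_sketch(struct ptlrpcd_ctl *pcs, int n)
{
        int i;

        /* Phase 1: asynchronously flag every thread to stop and wake it,
         * without waiting.  All threads now run their final ~2s of loop
         * iterations concurrently. */
        for (i = 0; i < n; i++) {
                cfs_set_bit(LIOD_STOP, &pcs[i].pc_flags);
                cfs_waitq_signal(&pcs[i].pc_set->set_waitq);
        }

        /* Phase 2: wait for each thread to finish.  The total shutdown
         * time becomes ~2s overall instead of ~2s * n. */
        for (i = 0; i < n; i++)
                wait_for_completion(&pcs[i].pc_finishing);
}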
| Comment by Peter Jones [ 23/Feb/13 ] |
|
Landed for 2.4 |