Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.5
    • Labels: None
    • Environment: lustre 2.1.3
      bullxlinux 6.1 (based on redhat 6.1)
      machine with 32 CPUs
    • Severity: 3
    • Rank: 6141

    Description

      Stopping the ptlrpcd threads takes more than one second per thread. On hardware with a large number of cores, this adds up to a few minutes to unload the ptlrpc module.

      # lscpu | grep "^CPU(s)"
      CPU(s):                32
      # ps -ef | grep ptlrpcd
      root      7301     2  0 10:58 ?        00:00:00 [ptlrpcd_rcv]
      root      7302     2  0 10:58 ?        00:00:00 [ptlrpcd_0]
      root      7303     2  0 10:58 ?        00:00:00 [ptlrpcd_1]
      root      7304     2  0 10:58 ?        00:00:00 [ptlrpcd_2]
      root      7305     2  0 10:58 ?        00:00:00 [ptlrpcd_3]
      root      7306     2  0 10:58 ?        00:00:00 [ptlrpcd_4]
      root      7307     2  0 10:58 ?        00:00:00 [ptlrpcd_5]
      root      7308     2  0 10:58 ?        00:00:00 [ptlrpcd_6]
      root      7309     2  0 10:58 ?        00:00:00 [ptlrpcd_7]
      root      7310     2  0 10:58 ?        00:00:00 [ptlrpcd_8]
      root      7311     2  0 10:58 ?        00:00:00 [ptlrpcd_9]
      root      7312     2  0 10:58 ?        00:00:00 [ptlrpcd_10]
      root      7313     2  0 10:58 ?        00:00:00 [ptlrpcd_11]
      root      7314     2  0 10:58 ?        00:00:00 [ptlrpcd_12]
      root      7315     2  0 10:58 ?        00:00:00 [ptlrpcd_13]
      root      7316     2  0 10:58 ?        00:00:00 [ptlrpcd_14]
      root      7317     2  0 10:58 ?        00:00:00 [ptlrpcd_15]
      root      7318     2  0 10:58 ?        00:00:00 [ptlrpcd_16]
      root      7319     2  0 10:58 ?        00:00:00 [ptlrpcd_17]
      root      7320     2  0 10:58 ?        00:00:00 [ptlrpcd_18]
      root      7321     2  0 10:58 ?        00:00:00 [ptlrpcd_19]
      root      7322     2  0 10:58 ?        00:00:00 [ptlrpcd_20]
      root      7323     2  0 10:58 ?        00:00:00 [ptlrpcd_21]
      root      7324     2  0 10:58 ?        00:00:00 [ptlrpcd_22]
      root      7325     2  0 10:58 ?        00:00:00 [ptlrpcd_23]
      root      7326     2  0 10:58 ?        00:00:00 [ptlrpcd_24]
      root      7327     2  0 10:58 ?        00:00:00 [ptlrpcd_25]
      root      7328     2  0 10:58 ?        00:00:00 [ptlrpcd_26]
      root      7329     2  0 10:58 ?        00:00:00 [ptlrpcd_27]
      root      7330     2  0 10:58 ?        00:00:00 [ptlrpcd_28]
      root      7331     2  0 10:58 ?        00:00:00 [ptlrpcd_29]
      root      7332     2  0 10:58 ?        00:00:00 [ptlrpcd_30]
      root      7333     2  0 10:58 ?        00:00:00 [ptlrpcd_31]
      # time modprobe -r ptlrpc
      real	1m7.204s
      user	0m0.000s
      sys	0m0.030s
      

      Attachments

        Activity

          [LU-2624] Stop of ptlrpcd threads is long
          pjones Peter Jones added a comment -

          Landed for 2.4


          bfaccini Bruno Faccini (Inactive) added a comment -

          Ok, the full debug trace confirms that with the current algorithm, and depending on the receipt of the wake-up signal for each thread to stop (or on outstanding requests), a full stop of all ptlrpcd threads takes about the sum of 2 s per thread in the idle case.

          So I agree with you: separating the asynchronous stop signal sent to each thread from the wait on its completion will parallelize and speed up the ptlrpcd stopping process.
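
          As a minimal userspace sketch of that idea (a plain pthread model with hypothetical names, not the actual Lustre patch): the stop flag is set for every thread first, and only then does the caller wait, so the per-thread ~1 s waits overlap instead of summing.

          #include <pthread.h>
          #include <stdatomic.h>
          #include <stdio.h>
          #include <unistd.h>

          #define NTHREADS 32

          static atomic_int stop_flags[NTHREADS];

          /* Model of a ptlrpcd thread: poll for work with a 1 s idle timeout. */
          static void *worker(void *arg)
          {
                  atomic_int *stop = arg;

                  while (!atomic_load(stop))
                          sleep(1);       /* stands in for the timed l_wait_event() */
                  return NULL;
          }

          int main(void)
          {
                  pthread_t tids[NTHREADS];
                  int i;

                  for (i = 0; i < NTHREADS; i++)
                          pthread_create(&tids[i], NULL, worker, &stop_flags[i]);

                  /* Phase 1: asynchronously signal every thread to stop. */
                  for (i = 0; i < NTHREADS; i++)
                          atomic_store(&stop_flags[i], 1);

                  /* Phase 2: wait for completion.  All threads notice the flag
                   * within one poll interval in parallel, so the total stop
                   * time is bounded by ~1 s, not NTHREADS x 1 s. */
                  for (i = 0; i < NTHREADS; i++)
                          pthread_join(tids[i], NULL);

                  printf("all %d workers stopped\n", NTHREADS);
                  return 0;
          }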


          bfaccini Bruno Faccini (Inactive) added a comment -

          Hmm, sure. I was wrong to assume the ptlrpcd stops could occur in parallel! I don't know why I thought we were further along in making them operate more independently...

          By "execution-sets" I was only referring to the different ptlrpcd scheduling policies, which can either allow a single thread, or end up with a group of threads, to execute on a single core. But that does not matter when they execute this part of the code one at a time, as you highlighted!

          Thanks for the debug log.


          pichong Gregoire Pichon added a comment -

          The stop notification is sent to the ptlrpcd threads one after the other. There is no concept of groups of ptlrpcd threads, nor of execution-sets.
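
          As a minimal userspace model of this serial shutdown (a plain pthread sketch with hypothetical names, not the Lustre code): the caller signals one worker and waits for it to exit before touching the next, and since each worker polls with a 1 s timeout, the total stop time grows as roughly NTHREADS x 1 s.

          #include <pthread.h>
          #include <stdatomic.h>
          #include <stdio.h>
          #include <unistd.h>

          #define NTHREADS 32

          static atomic_int stop_flags[NTHREADS];

          /* Model of a ptlrpcd thread: poll for work with a 1 s idle timeout. */
          static void *worker(void *arg)
          {
                  atomic_int *stop = arg;

                  while (!atomic_load(stop))
                          sleep(1);       /* stands in for the timed l_wait_event() */
                  return NULL;
          }

          int main(void)
          {
                  pthread_t tids[NTHREADS];
                  int i;

                  for (i = 0; i < NTHREADS; i++)
                          pthread_create(&tids[i], NULL, worker, &stop_flags[i]);

                  /* Serial shutdown: signal one thread, wait for it to exit,
                   * then move to the next.  Each join can last up to a full
                   * poll interval, so the total is roughly NTHREADS x 1 s. */
                  for (i = 0; i < NTHREADS; i++) {
                          atomic_store(&stop_flags[i], 1);
                          pthread_join(tids[i], NULL);
                  }
                  printf("all %d workers stopped\n", NTHREADS);
                  return 0;
          }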

          Here are the debug traces you requested.

          root     26857     2  0 16:44 ?        00:00:00 [ptlrpcd_rcv]
          root     26858     2  0 16:44 ?        00:00:00 [ptlrpcd_0]
          root     26859     2  0 16:44 ?        00:00:00 [ptlrpcd_1]
          root     26860     2  0 16:44 ?        00:00:00 [ptlrpcd_2]
          root     26861     2  0 16:44 ?        00:00:00 [ptlrpcd_3]
          root     26862     2  0 16:44 ?        00:00:00 [ptlrpcd_4]
          root     26863     2  0 16:44 ?        00:00:00 [ptlrpcd_5]
          root     26864     2  0 16:44 ?        00:00:00 [ptlrpcd_6]
          root     26865     2  0 16:44 ?        00:00:00 [ptlrpcd_7]
          root     26866     2  0 16:44 ?        00:00:00 [ptlrpcd_8]
          root     26867     2  0 16:44 ?        00:00:00 [ptlrpcd_9]
          root     26868     2  0 16:44 ?        00:00:00 [ptlrpcd_10]
          root     26869     2  0 16:44 ?        00:00:00 [ptlrpcd_11]
          root     26870     2  0 16:44 ?        00:00:00 [ptlrpcd_12]
          root     26871     2  0 16:44 ?        00:00:00 [ptlrpcd_13]
          root     26872     2  0 16:44 ?        00:00:00 [ptlrpcd_14]
          root     26873     2  0 16:44 ?        00:00:00 [ptlrpcd_15]
          root     26874     2  0 16:44 ?        00:00:00 [ptlrpcd_16]
          root     26875     2  0 16:44 ?        00:00:00 [ptlrpcd_17]
          root     26876     2  0 16:44 ?        00:00:00 [ptlrpcd_18]
          root     26877     2  0 16:44 ?        00:00:00 [ptlrpcd_19]
          root     26878     2  0 16:44 ?        00:00:00 [ptlrpcd_20]
          root     26879     2  0 16:44 ?        00:00:00 [ptlrpcd_21]
          root     26880     2  0 16:44 ?        00:00:00 [ptlrpcd_22]
          root     26881     2  0 16:44 ?        00:00:00 [ptlrpcd_23]
          root     26882     2  0 16:44 ?        00:00:00 [ptlrpcd_24]
          root     26883     2  0 16:44 ?        00:00:00 [ptlrpcd_25]
          root     26884     2  0 16:44 ?        00:00:00 [ptlrpcd_26]
          root     26885     2  0 16:44 ?        00:00:00 [ptlrpcd_27]
          root     26886     2  0 16:44 ?        00:00:00 [ptlrpcd_28]
          root     26887     2  0 16:44 ?        00:00:00 [ptlrpcd_29]
          root     26888     2  0 16:44 ?        00:00:00 [ptlrpcd_30]
          root     26889     2  0 16:44 ?        00:00:00 [ptlrpcd_31]
          

          File attached: lctl.dk.lu2624

          bfaccini Bruno Faccini (Inactive) added a comment - edited

          Hello Gregoire, I agree with Andreas: there must be something else to explain the >1 min rmmod.

          BTW, each ptlrpcd thread, or at least groups of them, should execute in parallel on multiple cores (depending on their scheduling policy), so the timing you get would be the max over all execution-sets, which looks too long to me...

          Can you reproduce the problem, making sure you enable full debug traces beforehand and delimit the rmmod timing period with BEGIN/END markers?

          A ps output showing the ptlrpcd pids would be helpful too.


          pichong Gregoire Pichon added a comment -

          I don't think there is anything wrong.

          In the ptlrpcd() routine, the ptlrpcd thread loops waiting for work to do. The wait condition has a timeout of 1 second when the set is idle, and when the thread is notified to stop, it performs one more loop before exiting. That explains why each thread takes at least one second to stop. Stopping 32 threads one after the other, at roughly two seconds each, adds up to about a minute, which matches the measured 1m7.204s.

                  do {
                          struct l_wait_info lwi;
                          int timeout;
          
                          rc = lu_env_refill(&env);
                          if (rc != 0) {
                                  /*
                                   * XXX This is very awkward situation, because
                                   * execution can neither continue (request
                                   * interpreters assume that env is set up), nor repeat
                                   * the loop (as this potentially results in a tight
                                   * loop of -ENOMEM's).
                                   *
                                   * Fortunately, refill only ever does something when
                                   * new modules are loaded, i.e., early during boot up.
                                   */
                                  CERROR("Failure to refill session: %d\n", rc);
                                  continue;
                          }
          
                          timeout = ptlrpc_set_next_timeout(set);
                          lwi = LWI_TIMEOUT(cfs_time_seconds(timeout ? timeout : 1),
                                            ptlrpc_expired_set, set);
          
                          lu_context_enter(&env.le_ctx);
                          l_wait_event(set->set_waitq,
                                       ptlrpcd_check(&env, pc), &lwi);
                          lu_context_exit(&env.le_ctx);
          
                          /*
                           * Abort inflight rpcs for forced stop case.
                           */
                          if (cfs_test_bit(LIOD_STOP, &pc->pc_flags)) {
                                  if (cfs_test_bit(LIOD_FORCE, &pc->pc_flags))
                                          ptlrpc_abort_set(set);
                                  exit++;
                          }
          
                          /*
                           * Let's make one more loop to make sure that ptlrpcd_check()
                           * copied all raced new rpcs into the set so we can kill them.
                           */
                  } while (exit < 2);
          

          adilger Andreas Dilger added a comment -

          Do you have any idea why it takes so long to stop each thread? This might also be a sign of something else being wrong (e.g. structures being kept around too long and needing to be freed at the end).

          bfaccini Bruno Faccini (Inactive) added a comment -

          Hello Gregoire!
          Thanks for the report, and for the fix proposal already.
          I will review it and give you feedback soon.
          Bye.
          pichong Gregoire Pichon added a comment -

          http://review.whamcloud.com/5039

          pichong Gregoire Pichon added a comment -

          I am going to post a fix.

          With the fix, the time to stop the ptlrpcd threads is dramatically reduced (from 1m7s to under 7 s on the same 32-CPU machine):

          # time modprobe -r ptlrpc
          
          real	0m6.675s
          user	0m0.000s
          sys	0m0.030s
          

          People

            Assignee:
            bfaccini Bruno Faccini (Inactive)
            Reporter:
            pichong Gregoire Pichon
            Votes:
            0
            Watchers:
            6

            Dates

              Created:
              Updated:
              Resolved: