Lustre / LU-15068

Race between commit callback and reply_out_callback::LNET_EVENT_SEND

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.15.0

    Description

      When LNet is under load, messages can be queued while they wait for a peer TX credit or a network TX credit. While running benchmarks on a large-scale system we observed clients hitting "slow reply" timeouts for MDS_REINT RPCs. Tracing revealed that the server received the MDS_REINT RPC and sent a reply to the client, but the reply was queued in LNet because no peer credits were available.
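The credit behaviour described above can be pictured with a toy model. This is only a sketch under stated assumptions: `toy_peer`, `toy_send()` and `toy_credit_return()` are hypothetical names invented for illustration and are not the LNet data structures or API. It shows only the idea that a send with no credit available is parked until a credit comes back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical type, NOT the real LNet peer structure. */
struct toy_peer {
	int tx_credits;	/* sends this peer may still start */
	int queued;	/* messages parked while waiting for a credit */
};

/* Returns true if the message went out at once, false if it was queued. */
static bool toy_send(struct toy_peer *p)
{
	assert(p != NULL);
	if (p->tx_credits > 0) {
		p->tx_credits--;
		return true;
	}
	p->queued++;
	return false;
}

/*
 * A finished send hands its credit back. If a message is waiting, the
 * credit passes straight to it (returns true); otherwise the credit is
 * banked on the peer (returns false).
 */
static bool toy_credit_return(struct toy_peer *p)
{
	assert(p != NULL);
	if (p->queued > 0) {
		p->queued--;
		return true;
	}
	p->tx_credits++;
	return false;
}
```

With one credit, a first `toy_send()` goes out and a second one is queued until `toy_credit_return()` releases it. Under sustained load the queued state can last long enough for other server-side events to overtake the reply.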

      Shortly afterwards the commit callback fired, which queued the reply state for handling via ptlrpc_commit_replies() -> rs_batch_add():

      void ptlrpc_commit_replies(struct obd_export *exp)
      {
      ...
                      if (rs->rs_transno <= exp->exp_last_committed) {
                              list_del_init(&rs->rs_obd_list);
                              rs_batch_add(&batch, rs);
                      } 
      

      The reply state's MD handle was then unlinked by ptlrpc_handle_rs():

      static int
      ptlrpc_handle_rs(struct ptlrpc_reply_state *rs)
      {
      ...
              if ((!been_handled && rs->rs_on_net) || nlocks > 0) {
                      spin_unlock(&rs->rs_lock);
      
                      if (!been_handled && rs->rs_on_net) {
                              LNetMDUnlink(rs->rs_md_h);
      

      But the reply never left the server: it was still queued in LNet. Because the MD had been unlinked, LNet aborted the send once a credit became available. The client eventually hit "timeout for slow reply", which caused it to reconnect.
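The ordering of events above can be sketched as a small state model. All names here (`toy_md`, `toy_reply`, `toy_reply_send()`, `toy_commit_unlink()`, `toy_credit_available()`) are hypothetical stand-ins, not the real LNet or ptlrpc types and functions, and the model assumes a single reply and ignores locking. It only illustrates why unlinking the MD while the reply is still queued leads to an aborted send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real LNet or ptlrpc structures. */
struct toy_md {
	bool linked;	/* false once the MD has been unlinked */
};

struct toy_reply {
	struct toy_md *md;
	bool queued;	/* parked in the network layer, waiting for a credit */
	bool sent;	/* actually left the server */
	bool aborted;	/* dropped because its MD was gone */
};

/* Step 1: the server sends the reply; with no credit it is only queued. */
static void toy_reply_send(struct toy_reply *r, int *credits)
{
	assert(r != NULL && credits != NULL);
	if (*credits > 0) {
		(*credits)--;
		r->sent = true;
	} else {
		r->queued = true;
	}
}

/* Step 2: the commit path unlinks the MD without checking r->queued. */
static void toy_commit_unlink(struct toy_reply *r)
{
	assert(r != NULL && r->md != NULL);
	r->md->linked = false;
}

/* Step 3: a credit arrives; the queued reply is sent only if its MD survives. */
static void toy_credit_available(struct toy_reply *r)
{
	assert(r != NULL && r->md != NULL);
	if (!r->queued)
		return;
	r->queued = false;
	if (r->md->linked)
		r->sent = true;
	else
		r->aborted = true;
}
```

Calling the three steps in that order with zero credits leaves the reply aborted and never sent, which matches the client-side symptom of waiting until the "slow reply" timeout.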

      I can readily reproduce the issue on a four-node cluster with 1 MDS, 1 OSS and 2 clients:
      1. Run mdtest create
      2. Start LST in the background. I run a simultaneous read and write session with concurrency 64, where the MDS is in the "to" group and the OSS and both clients are in the "from" group
      3. Run mdtest delete

      LST causes credit starvation during the mdtest delete phase, so replies are more readily queued in LNet, as described above.

          People

            hornc Chris Horn