Lustre / LU-8420

unexpected? client eviction after bulk transfer timeout


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      The following scenario, which ends in a client eviction, has been observed in acceptance testing:

      1) client 1 holds a PW lock on file A and sends a write RPC to the OST
      2) the OST initiates a bulk transfer, which is lost somewhere in the network
      3) client 2 enqueues a PR lock on file A
      4) the server sees the incompatible lock, sends a blocking AST to client 1, and waits for client 1 to cancel the lock
      5) the bulk transfer times out, but client 1 does not get a reply in that case:

      int tgt_brw_write(struct tgt_session_info *tsi)
      ...
              rc = target_bulk_io(exp, desc, &lwi);
              /* a failed bulk transfer (rc != 0), e.g. a timeout, suppresses the reply */
              no_reply = rc != 0;
      ...

      6) the blocking AST callback timer expires and the server evicts client 1
      7) the write RPC on client 1 times out, and client 1 finds itself evicted

      The adaptive timeout (AT) settings happened to make the client's RPC timeout longer than the blocking AST callback timeout.
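
      The race in steps 1-7 can be sketched as a comparison of two deadlines. This is only a toy model, not Lustre code: the struct, its field names, and the function are hypothetical, the times are arbitrary seconds, and it assumes client 1 cannot cancel its lock until its write RPC has completed or timed out. The server-side bulk timeout does not appear because, with no reply sent, it produces no event that client 1 can observe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeline of the scenario; none of these names exist in Lustre. */
struct timeline {
	int write_sent;   /* step 1: client 1 sends the write RPC */
	int enqueue_time; /* steps 3-4: client 2 enqueues, blocking AST is sent */
	int ast_timeout;  /* blocking AST callback timeout on the server */
	int rpc_timeout;  /* client 1 write RPC timeout, as chosen by AT */
};

enum outcome { CLIENT_RECOVERS, CLIENT_EVICTED };

/*
 * Assumption: with no reply to the failed bulk write, client 1 only learns
 * about the failure when its own RPC timer fires, and only then can it
 * cancel the lock. If the server's callback timer fires first (or at the
 * same moment), client 1 is evicted.
 */
static enum outcome simulate(const struct timeline *t)
{
	assert(t->ast_timeout > 0 && t->rpc_timeout > 0);

	int evict_at = t->enqueue_time + t->ast_timeout; /* step 6 */
	int rpc_fail_at = t->write_sent + t->rpc_timeout; /* step 7 */

	return rpc_fail_at < evict_at ? CLIENT_RECOVERS : CLIENT_EVICTED;
}

/* Convenience wrapper for the reported case. */
static bool is_evicted(const struct timeline *t)
{
	return simulate(t) == CLIENT_EVICTED;
}
```

      With illustrative values such as write_sent = 0, enqueue_time = 5, ast_timeout = 20 and rpc_timeout = 40, the callback deadline (25) precedes the RPC deadline (40), so the model reports an eviction, which matches the ordering described above.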


          People

            wc-triage WC Triage
            vsaveliev Vladimir Saveliev
