LU-5073

Lustre is unable to unload modules in conf-sanity test 31.


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • None
    • Environment: any Lustre code, TCP network (o2ib should also be affected).
    • 3
    • 14005

    Description

      LNet commits an MD immediately after LNetGet() or LNetPut() is called, but LNet may not yet have an active connection to the destination node. In that case ptlrpc may expire the request before LNet releases the TX descriptor, yet the request cannot be freed because LNet still holds a reference to the request buffer.
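      The reference problem can be modelled with a plain reference count. This is a userspace sketch, not Lustre code: `struct req_buf`, `buf_get()`, `buf_put()`, `send_request()` and the `freed` flag are hypothetical stand-ins for the request buffer, the reference LNet takes when the MD is committed, and the final release. It only illustrates that expiring the request on the ptlrpc side drops ptlrpc's own reference, while the buffer stays allocated until LNet releases its TX.

```c
#include <assert.h>

/* Hypothetical model of a ptlrpc request buffer that LNet also references. */
struct req_buf {
	int refcount; /* ptlrpc's reference plus LNet's pending-TX reference */
	int freed;    /* set once the last reference is dropped */
};

static void buf_get(struct req_buf *b)
{
	b->refcount++;
}

/* Drop one reference; the buffer is released only on the last put. */
static void buf_put(struct req_buf *b)
{
	assert(b->refcount > 0);
	if (--b->refcount == 0)
		b->freed = 1; /* real code would free the memory here */
}

/* Hypothetical send path: ptlrpc owns the buffer, then LNet pins it. */
static void send_request(struct req_buf *b)
{
	b->refcount = 1; /* ptlrpc's own reference */
	b->freed = 0;
	buf_get(b);      /* MD committed immediately: LNet takes a reference */
}
```

      For example, after `send_request(&b)` followed by a ptlrpc-side expiry (`buf_put(&b)`), `b.freed` is still 0; only a second `buf_put(&b)`, standing in for LNet finally releasing the TX (for instance after a slow TCP connect times out), sets it to 1.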
      I found this (or a similar) issue in 2008, and Maxim tried to fix it in this commit:

      commit f6cd596982ed4380e5547181022ad81e4c6d3512
      Author: maxim <maxim>
      Date:   Fri Sep 5 14:58:15 2008 +0000
      
          b=16308
          i=isaac
          i=liang
          Conf-sanity test_32a couldn't stop ost and mds because it
          tried to access non-existent peer and tcp connect took
          quite long before timing out. The patch flushes txs pinned
          to a peer even if it's still in "connecting" state.
      

      But that fix is incomplete, and LNet is still unable to release the TX.
      So we need some support in the ptlrpc layer to solve this issue by unlinking asynchronously.
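      One way to picture an asynchronous unlink is sketched below. It is a simplified model under stated assumptions, not the actual ptlrpc change: `struct rpc_req`, `req_unlink_async()`, `lnet_tx_done()`, `req_free()` and the `deferred_reqs` counter are all hypothetical. When ptlrpc expires a request whose buffer LNet still holds, it marks the request as unlinking and returns at once instead of waiting; the (modelled) LNet completion event then performs the deferred free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request state for an asynchronous-unlink model. */
struct rpc_req {
	bool lnet_busy; /* LNet still holds the request buffer (TX pending) */
	bool unlinking; /* unlink requested; free deferred to the LNet event */
	bool freed;     /* request memory released */
};

/* Hypothetical count of requests whose free is still deferred. */
static int deferred_reqs;

static void req_free(struct rpc_req *req)
{
	req->freed = true;
}

/*
 * Called when ptlrpc expires the request. Rather than sleeping until
 * LNet drops its reference, mark the request and return immediately.
 */
static void req_unlink_async(struct rpc_req *req)
{
	if (!req->lnet_busy) {
		req_free(req);
		return;
	}
	req->unlinking = true;
	deferred_reqs++;
}

/* Stand-in for the LNet event raised when the TX is finally released. */
static void lnet_tx_done(struct rpc_req *req)
{
	req->lnet_busy = false;
	if (req->unlinking) {
		req->unlinking = false;
		deferred_reqs--;
		req_free(req);
	}
}
```

      The design choice illustrated here is that the expiry path never blocks on the network: each deferred request is completed by the LNet callback, so cleanup code (hypothetically, via `deferred_reqs`) can tell how many releases are still outstanding rather than hanging inside the expiry path.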


            People

              Assignee: Cliff White (Inactive)
              Reporter: Alexey Lyashkov
              Votes: 0
              Watchers: 7
