Lustre / LU-13807

LNet: Net delete deadlock

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      While using an LUTF script which deletes a net dynamically via

      lnetctl import --del config.yaml

      we ran into a deadlock.

      It appears that a local NI is in recovery while a message is about to be sent to it. The send path calls lnet_nid2peerni_locked(), which tries to lock the mutex. However, this mutex is already held by the process trying to delete the net, leading to a deadlock.

      The code should first clean up the recovery queue, i.e. remove all instances of this NI from the recovery queue before deleting the NI.

      Similar processing is done in LNetNIFini().
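
      The proposed ordering (purge the NI from the recovery queue first, then delete it) can be illustrated with a minimal userspace sketch. This is not LNet code: demo_ni, demo_recovery_queue and every demo_* function are hypothetical stand-ins invented for illustration, the queue is a plain array, and locking is omitted entirely. It only shows that once every queued instance of the NI is gone, nothing in the queue can reference the NI being deleted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_QUEUE 16

/* Hypothetical stand-in for a local NI; not the real struct lnet_ni. */
struct demo_ni {
    int  id;
    bool deleted;
};

/* Hypothetical recovery queue; the same NI may be queued more than once. */
struct demo_recovery_queue {
    struct demo_ni *entries[DEMO_MAX_QUEUE];
    size_t          count;
};

/* Append an NI to the recovery queue; returns false if the queue is full. */
bool demo_queue_add(struct demo_recovery_queue *q, struct demo_ni *ni)
{
    if (q->count >= DEMO_MAX_QUEUE)
        return false;
    q->entries[q->count++] = ni;
    return true;
}

/* Report whether any instance of ni is still queued. */
bool demo_queue_contains(const struct demo_recovery_queue *q,
                         const struct demo_ni *ni)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->entries[i] == ni)
            return true;
    }
    return false;
}

/* Remove every instance of ni, compacting in place; returns the number removed. */
size_t demo_queue_purge_ni(struct demo_recovery_queue *q,
                           const struct demo_ni *ni)
{
    size_t kept = 0;
    size_t removed = 0;

    for (size_t i = 0; i < q->count; i++) {
        if (q->entries[i] == ni)
            removed++;
        else
            q->entries[kept++] = q->entries[i];
    }
    q->count = kept;
    return removed;
}

/*
 * Delete an NI: clean the recovery queue first, and only then mark the NI
 * deleted. Returns how many queued instances were dropped.
 */
size_t demo_ni_delete(struct demo_recovery_queue *q, struct demo_ni *ni)
{
    size_t removed = demo_queue_purge_ni(q, ni);

    /* After the purge, no queue entry may still point at this NI. */
    assert(!demo_queue_contains(q, ni));
    ni->deleted = true;
    return removed;
}
```

      In the real code the purge would have to run under whatever locks protect the recovery queue, and without holding the mutex that the send path waits on; the sketch deliberately leaves those details out.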

          Activity

            [LU-13807] LNet: Net delete deadlock
            ashehata Amir Shehata (Inactive) made changes -
            Link: New: This issue is related to LU-12233
            ashehata Amir Shehata (Inactive) created issue -

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: ashehata Amir Shehata (Inactive)
              Votes: 0
              Watchers: 2
