deadlock (pagefault vs. blocking ast) in clio

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.1.0
    • Labels: None
    • Environment: 2.6.18-194.17.1.el5
    • Severity: 3
    • Rank (Obsolete): 10397

    Description

      The following deadlock was hit when I was running mmap tests:

      mmap test thread: pagefault -> lock page -> release dlm lock -> cancel dlm lock -> cl_lock_mutex_get;
      bl_ast handler thread: cancel dlm lock -> cl_lock_mutex_get -> flush pages -> lock page;

      Because of this deadlock, the ll_imp_inval thread is also blocked in cl_lock_mutex_get, so the client eviction can never finish. I think this is the root cause of LU-180, but I'm not 100 percent sure because no stack trace has been provided there yet to prove it.
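
The two call chains above take the same pair of locks in opposite order: the page fault path holds the page lock and then wants the cl_lock mutex, while the blocking AST path holds the cl_lock mutex and then wants the page lock. A minimal user-space sketch of that inversion follows. It is not Lustre code: `simulate_inversion`, `page_lock` and `cl_lock_mutex` are hypothetical stand-ins, and a timeout is used so the demonstration reports the failure instead of hanging.

```python
import threading


def simulate_inversion(timeout=0.2):
    """Model the two threads from the report with plain Python locks.

    Illustrative assumption only: these locks stand in for the page lock
    and the cl_lock mutex; nothing here calls into Lustre.
    """
    page_lock = threading.Lock()      # stand-in for the locked page
    cl_lock_mutex = threading.Lock()  # stand-in for cl_lock_mutex_get()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            barrier.wait()
            # Try to take the lock the other thread is holding.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            # Keep holding the first lock until the peer's attempt is over,
            # so neither side can succeed by accident.
            barrier.wait()
            if acquired:
                second.release()

    threads = [
        # pagefault -> lock page -> ... -> cl_lock_mutex_get
        threading.Thread(target=worker,
                         args=("pagefault", page_lock, cl_lock_mutex)),
        # cancel dlm lock -> cl_lock_mutex_get -> flush pages -> lock page
        threading.Thread(target=worker,
                         args=("bl_ast", cl_lock_mutex, page_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


result = simulate_inversion()
```

Both entries in `result` are `False`: neither thread can obtain its second lock while the other still holds it. With blocking acquires instead of timeouts, both threads would wait forever, which matches the hang described above.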

      The stack trace is attached.

      Attachments

        Activity

          [LU-218] deadlock (pagefault vs. blocking ast) in clio

          niu Niu Yawei (Inactive) added a comment - Duplicate of LU-122.

          niu Niu Yawei (Inactive) added a comment - Right, I think it's the same as LU-122, and LU-180 is probably also caused by this deadlock. I will mark it as a duplicate.

          green Oleg Drokin added a comment - I wonder if this is the same thing as LU-122?

          niu Niu Yawei (Inactive) added a comment - Jay, could you take a look to see if it's a known bug or not? Thanks.

          People

            Assignee: rread Robert Read
            Reporter: niu Niu Yawei (Inactive)
            Votes: 0
            Watchers: 2
