Lustre / LU-5225

Client is evicted by multiple OSTs on all OSSs

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      During intensive testing of the LU-2827 patch, J. Simmons encountered a new issue when running with the patch on top of the latest 2.5.59/master version.

      Looking at the information provided (ftp.whamcloud.com/uploads/LU-4584/20140609-run1.tbz and 20140609-run2.tbz), it appears that at some point the client's RPCs are no longer sent. As a result, the client's lock cancel replies to the blocking AST requests from the servers/OSTs are never sent, which leads to the subsequent evictions.

      The reason the client's RPCs are no longer sent cannot be determined using only the Lustre dlmtrace debug level on the client, but I can see that, during the client's eviction handling, these RPCs were sitting on the delayed queue.
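      The failure chain described above (cancel RPCs stuck on the client, blocking AST timers expiring on the servers) can be illustrated with a toy Python model. It is not Lustre code: the `Client`/`Ost` classes, the `LOCK_CALLBACK_TIMEOUT` value and the single-flush delivery are hypothetical assumptions, used only to show why stuck client RPCs lead to evictions from several OSTs at once.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical model, NOT Lustre code: names and timeout are illustrative only.
LOCK_CALLBACK_TIMEOUT = 100  # assumed value, in abstract time units


@dataclass
class Client:
    name: str
    stalled: bool = False  # True models the bug: queued RPCs are never sent
    delayed_queue: deque = field(default_factory=deque)

    def queue_cancel(self, lock_id):
        # The client answers a blocking AST by queueing a lock cancel RPC.
        self.delayed_queue.append(("cancel", lock_id))

    def flush(self):
        # Send everything queued, unless the sending path is stuck.
        if self.stalled:
            return []
        sent = list(self.delayed_queue)
        self.delayed_queue.clear()
        return sent


@dataclass
class Ost:
    name: str
    waiting: dict = field(default_factory=dict)  # lock_id -> deadline
    evicted: set = field(default_factory=set)

    def send_blocking_ast(self, client, lock_id, now):
        # Start a callback timer, then ask the client to cancel the lock.
        self.waiting[lock_id] = now + LOCK_CALLBACK_TIMEOUT
        client.queue_cancel(lock_id)

    def receive(self, rpcs):
        # A cancel that arrives in time clears the pending timer.
        for kind, lock_id in rpcs:
            if kind == "cancel":
                self.waiting.pop(lock_id, None)

    def check_timers(self, client, now):
        # Any expired timer means the client is evicted by this OST.
        expired = [lock for lock, deadline in self.waiting.items() if now >= deadline]
        for lock in expired:
            del self.waiting[lock]
        if expired:
            self.evicted.add(client.name)
        return bool(expired)


def run(stalled, n_osts=3):
    client = Client("client1", stalled=stalled)
    osts = [Ost(f"OST{i:04x}") for i in range(n_osts)]
    # Each OST holds one conflicting lock (lock_id == OST index).
    for i, ost in enumerate(osts):
        ost.send_blocking_ast(client, lock_id=i, now=0)
    sent = client.flush()
    for i, ost in enumerate(osts):
        ost.receive([rpc for rpc in sent if rpc[1] == i])
    # Return the OSTs that evict the client once the timeout has elapsed.
    return [ost.name for ost in osts if ost.check_timers(client, now=LOCK_CALLBACK_TIMEOUT)]
```

      With a healthy client, `run(stalled=False)` returns an empty list; with the stuck sending path, `run(stalled=True)` reports an eviction from every OST, matching the symptom in the title.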


          Activity

            bfaccini Bruno Faccini (Inactive) added a comment - Dup of LU-4861, as proven by on-site testing.

            simmonsja James A Simmons added a comment - Yes, this is still working for me. You can close it. If I have any problems in next week's test shot, I will open another ticket.

            bfaccini Bruno Faccini (Inactive) added a comment - James, is this still working for you? If yes, do you agree that we close it as a dup of LU-4861?

            bfaccini Bruno Faccini (Inactive) added a comment -
            > 2) The patch deals with a deadlock issue, which could explain why I saw evictions.
            Hmm, yes, that could be where your flair made the difference, because LU-4861 only reports an application hang due to this deadlock, not client evictions...

            simmonsja James A Simmons added a comment -
            The problem appeared when testing with 2.5.60 clients. When I updated the clients to a newer version and could not reproduce the problem, I figured that some patch that had landed since then fixed it. So I examined the list of patches merged since the broken client version. The only one that made sense was LU-4861, because:

            1) Since I was seeing evictions from the OST, it made sense that the osc layer could be a source of the problem.

            2) The patch deals with a deadlock issue, which could explain why I saw evictions.

            It seems I'm familiar enough with the code to make a good educated guess about what will fix my problems.

            I have been testing the LU-4861 patch with 2.5.2 clients with excellent success so far.

            bfaccini Bruno Faccini (Inactive) added a comment -
            James,
            Thanks for working and helping so hard on this, but I have an additional question: what made you point to the LU-4861 patch as a possible fix?

            simmonsja James A Simmons added a comment - So far the results on my small-scale system are very promising using the patch from LU-4861. If all goes well, I will move it to the next scale system. If that works, then we will use it in our test shot for Tuesday.

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: bfaccini Bruno Faccini (Inactive)
              Votes: 0
              Watchers: 4
