Details

    • Question/Request
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.10.4
    • None
    • Rhel 8.1 clients, 7.7 servers

    Description

      I took your advice from LU-13382 and went back to a 2.12.4-1 client with IB support, released on the 11th of February this year. What we are finding now is that the (rhel8.1) compute nodes reboot randomly and frequently, leaving us with a rather unstable cluster and some rather unhappy users.

      We have done some testing and found that the issue appears to be caused by the change from radix_tree_exceptional_entry to xa_is_value as described in LU-13136.  We were able to patch the 2.12.4-1 client source and create a patched client that appears to solve the problem.

      However, we did have some issues getting this to compile and had to make some semi-intelligent guesses to get it to work. That leaves us less than confident that we haven't introduced other bugs that will appear if we use our patched client version in production.

      Do you have a version of the 2.12.4 client with that patch applied and IB support that you can release to us? Or even a version 2.12.5 client with IB support? Something that would give us a little more confidence than our current patched version.

      Thanks

      jon
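
For readers following the description above, here is a minimal user-space sketch of why the choice between the two predicates can matter. It is not Lustre or kernel source. Every `demo_` name and the `DEMO_HAVE_XA_IS_VALUE` switch are hypothetical, and the bit layouts are assumptions based on my reading of the kernel headers: older radix-tree kernels mark "exceptional" entries with bit 1, while XArray kernels mark "value" entries with bit 0.

```c
/* User-space illustration only; NOT Lustre or kernel source.
 * All demo_* names and DEMO_HAVE_XA_IS_VALUE are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

/* Assumed older radix-tree convention: bit 1 marks an "exceptional" entry. */
#define DEMO_RADIX_TREE_EXCEPTIONAL_ENTRY 2UL

static inline bool demo_radix_tree_exceptional_entry(const void *entry)
{
	return ((uintptr_t)entry & DEMO_RADIX_TREE_EXCEPTIONAL_ENTRY) != 0;
}

/* Assumed XArray convention: a value entry is (v << 1) | 1, so bit 0 is set. */
static inline void *demo_xa_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1UL);
}

static inline bool demo_xa_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1UL) != 0;
}

/* Hypothetical build-time switch standing in for a configure check
 * that detects whether the target kernel provides xa_is_value(). */
#ifndef DEMO_HAVE_XA_IS_VALUE
#define DEMO_HAVE_XA_IS_VALUE 1
#endif

/* Compat wrapper: callers use one name and the build picks the test
 * that matches the kernel's entry encoding. */
static inline bool demo_page_entry_is_value(const void *entry)
{
#if DEMO_HAVE_XA_IS_VALUE
	return demo_xa_is_value(entry);
#else
	return demo_radix_tree_exceptional_entry(entry);
#endif
}
```

Under these assumptions, an entry built with `demo_xa_mk_value(2)` has bit 0 set but bit 1 clear, so the older predicate would treat it as an ordinary pointer. That is the kind of mismatch a compat wrapper is meant to prevent; whether it explains the reboots reported here is not established by this sketch.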

          Activity

            [LU-13438] Rhel8.1 / lustre-client 2.12.4-1

            JonSy Jon Symon (Inactive) added a comment -

            Yes, that's the one.

            jon

            pjones Peter Jones added a comment -

            Jon

            OK, so you mean LU-13163, which has already landed to b2_12 for 2.12.5? The patch is here: https://review.whamcloud.com/37481/

            Peter


            JonSy Jon Symon (Inactive) added a comment -

            Yes, it was getting a little late last night. Sorry about that; I should learn to wait for morning before I post things.

            Excuse my obvious ignorance at this point. I have had a look at the link you provided, and it appears to be discussing the req_capsule_has_field() issue. That is indeed something I raised a little while ago, and we are still seeing those messages. However, the immediate issue I am facing is lustre-client version 2.12.4-1 (with IB) and support for rhel8.1.

            The patch I was referring to was a change in mdc_request.c, replacing radix_tree_exceptional_entry with xa_is_value. The missing change appears to be what is causing at least some of the issues we are experiencing. The source RPMs on the Whamcloud download site, dated 11th Feb, do not have that patch applied.

            My lack of experience at patching the source and then rebuilding the RPMs is showing: I have spent some time last night and today trying to get that patch in place, without a great deal of success.

            Having said that, I could be barking up entirely the wrong tree, but one of my colleagues did manage to produce a Lustre client with that patch applied, and it did seem to help. Unfortunately, that client was lost and we don't seem to be able to produce it again!

            Hence my desire to find a Lustre client (with IB) that is supported under rhel8.1.

            Cheers

            jon

            pjones Peter Jones added a comment -

            Jon

            The port to b2_12 is at https://review.whamcloud.com/38188

            Peter

            PS/ It took a while to piece together what you were talking about with all the typos

            JonSy Jon Symon (Inactive) added a comment - edited

            Apologies, that should read 2.12.4-1. It's late and I've been looking at this for too long tonight!

            People

              pjones Peter Jones
              JonSy Jon Symon (Inactive)