Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • None
    • 8135

    Description

      Cray is preparing to submit gnilnd for upstream adoption. This ticket is for tracking that work.

      Attachments

        Activity

          [LU-1419] Tracking ticket for gnilnd push

          simmonsja James A Simmons added a comment -

          I have pushed an updated driver with Isaac's suggestions. The driver appears to work pretty well. The only thing I have observed is that on a server node, after LNET is brought up, it can't ping the MGS until I first ping the routers. Isaac, do you have any idea what could be causing that?

          simmonsja James A Simmons added a comment -

          Hi Cory.

          I have been regularly testing master on our Gemini test bed, so I have a motivation to keep this driver working. If Cray doesn't mind, I have no issue with keeping this driver in sync. Please look it over. The error codes you can inject have changed and I just guessed what the values are, so if you attempt to regression test this latest patch internally, your tests will most likely fail. We need to sync up on that. Also, some of the Whamcloud engineers had concerns about certain parts of the code in your earlier patch. I didn't want to change those parts without a serious inspection from the Cray engineer working on this LNET driver as well. Any changes will be welcomed for testing.
          spitzcor Cory Spitz added a comment -

          Doug, good question. I don't know exactly; it likely won't be an instantaneous sync. However, Cray won't make code drops to match shipping versions. We intend to keep to the 'master' model. (The version submitted here is already "ahead" of our released versions.) Cray will push up changes on as regular a basis as we can manage. Certainly, it would be nice to stay current with 'master'. For other contributions, I suggest that we handle those like any other: through Gerrit with community review.


          doug Doug Oucharek (Inactive) added a comment -

          Question: What will be Cray's policy for keeping the gnilnd code synchronized between the Lustre main repository and the Cray repository? Will you be doing code drops to match shipping versions? How will changes made by non-Cray community members be handled?
          hornc Chris Horn added a comment - http://review.whamcloud.com/#change,3381

          simmonsja James A Simmons added a comment -

          Attached to ticket since JIRA mangled my script.

          simmonsja James A Simmons added a comment -

          Here is a handy script I used for re-tabbing. After you commit your code, just run it and then push upstream.

          #!/bin/bash -e
          #
          # Rewrite the last commit to remove any trailing whitespace
          # in the new version of changed lines.
          # Then replace space-based indentation with TAB based indentation
          # based on TABS at every eight position
          #
          [[ -z $TRACE ]] || set -x
          # Temporary diff files; fall back to /tmp when TMP is unset
          tmpf1=${TMP:-/tmp}/$$.1.diff
          tmpf2=${TMP:-/tmp}/$$.2.diff
          # Remove both temporary files on exit
          trap 'rm -f "$tmpf1" "$tmpf2"' 0
          # Dump the last commit as a patch
          git show --binary >$tmpf1
          # On added lines: strip trailing whitespace, then turn each run of eight
          # spaces (or spaces followed by a TAB) after the leading TABs into one TAB
          perl -p -e 's/^(\+.*?)[ \t]+$/$1/; while (m/^(\+\t*)( {1,7}\t| {8})(.*)/) { $_=$1."\t".$3."\n"; }' <$tmpf1 >$tmpf2
          if ! cmp -s $tmpf1 $tmpf2
          then
              # Back out the original patch, apply the cleaned one, and amend the commit
              git apply --binary --index -R --whitespace=nowarn $tmpf1
              git apply --binary --index $tmpf2
              GIT_EDITOR=true git commit --amend
          else
              echo "No changes"
          fi
          spitzcor Cory Spitz added a comment -

          Our push of gnilnd is imminent. However, it was written before we adopted the retab policy. Can we please submit with the old whitespace style? It would save us from touching every single line or from keeping our version and the contributed version different. Thoughts or suggestions?


          simmonsja James A Simmons added a comment -

          The ticket for this is http://jira.whamcloud.com/browse/LU-1422. I wanted to let you know so it can be peer reviewed. Thanks.
          spitzcor Cory Spitz added a comment -

          Yes, we'll aim at master. Then, if there is reason to push back, we can do so later.

          Other changes that aren't related to the gnilnd should be tracked in a different ticket. James, can you open one for the INITIAL_CONNECT_TIMEOUT problem?


          simmonsja James A Simmons added a comment -

          Will this work only be aimed at the master branch? Also, during ORNL testing of IR recovery we discovered some old leftover code from the Catamount days that impacted the recovery time. Should we merge that fix under this ticket? It's just the INITIAL_CONNECT_TIMEOUT in obd_support.h.

          People

            wc-triage WC Triage
            hornc Chris Horn