
Clarify when setting lnet route hops is required for Lustre 2.12 and Lustre 2.14

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      In a discussion on https://review.whamcloud.com/#/c/43127/, in response to:

      Note that bit of code is requiring that hop count be set for some routes, when they did not need to be set before (in lustre 2.12)

      Chris said:

      "Yes, good point. I think there was always an implicit requirement that hop count be set for multi-hop routes if the avoid_asym_route_failure feature was enabled, but we should make that explicit."

      However, this is not yet reflected in the manual or in lnetctl(8).
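To illustrate setting the hop count explicitly instead of relying on a default, here is a minimal Python sketch that assembles an `lnetctl route add` command line. The helper name `route_add_command`, the network name, and the gateway NID are hypothetical. The `--net`, `--gateway`, and `--hop` options are assumed from the lnetctl documentation and should be checked against lnetctl(8) for the installed release. The sketch only builds the argument list; it does not run anything.

```python
def route_add_command(net, gateway, hops=None):
    """Build an argv list for `lnetctl route add`.

    hops=None omits the hop option entirely, leaving the value to
    whatever default the installed lnetctl applies.
    """
    if hops is not None and hops < 1:
        # Sanity check only; the valid range is defined by lnetctl itself.
        raise ValueError("hop count must be a positive integer")
    argv = ["lnetctl", "route", "add", "--net", net, "--gateway", gateway]
    if hops is not None:
        # Assumption: `--hop` is the option name for the hop count.
        argv += ["--hop", str(hops)]
    return argv


# Hypothetical multi-hop route: remote net o2ib1 reached via a gateway on tcp.
cmd = route_add_command("o2ib1", "10.0.0.1@tcp", hops=2)
print(" ".join(cmd))
```

Passing `hops` explicitly for every multi-hop route keeps the configuration independent of any default applied by the tool.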

       


        Activity

          [LUDOC-494] Clarify when setting lnet route hops is required for Lustre 2.12 and Lustre 2.14

          gerrit Gerrit Updater added a comment -

          "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/44916/
          Subject: LUDOC-494 lnet: clarify use of route hopcount
          Project: doc/manual
          Branch: master
          Current Patch Set:
          Commit: f7da09ba79b2522ca51d001c59ab1212d051309c

          gerrit Gerrit Updater added a comment -

          "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44916
          Subject: LUDOC-494 lnet: clarify use of route hopcount
          Project: doc/manual
          Branch: master
          Current Patch Set: 1
          Commit: 48588c2fcdc74e48caca530f7e38f3036143ea95

          ofaaland Olaf Faaland added a comment -

          Hi Serguei,

          We'll be happy to review the patches.

          Thanks


          ssmirnov Serguei Smirnov added a comment -

          Hi,

          I reviewed the current related documentation. The recommended changes are listed below.

          The lnetctl section of the Lustre manual and the lnetctl man page should be updated to mention that the hop count defaults to 1 if it is not specified when adding a route with lnetctl.

          The manual should also be updated to clarify that the "avoid_asym_router_failure" module parameter applies only to single-hop routers.

          In addition, the following passage from 34.3.7. LNet Peer Health should be modified:

          "A router is considered down if any of its NIDs are down. For example, router X has three NIDs: Xnid1, Xnid2, and Xnid3. A client is connected to the router via Xnid1. The client has router checker enabled. The router checker periodically sends a ping to the router via Xnid1. The router responds to the ping with the status of each of its NIDs. In this case, it responds with Xnid1=up, Xnid2=up, Xnid3=down. If avoid_asym_router_failure==1, the router is considered down if any of its NIDs are down, so router X is considered down and will not be used for routing messages. If avoid_asym_router_failure==0, router X will continue to be used for routing messages."

          The above now sounds incorrect to me, because the router should not be considered down unless it cannot reach the remote net.

          Thanks,

          Serguei.

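The difference between the rule in the quoted manual passage and the rule suggested in the comment above can be made concrete with a toy model. This is only a sketch of one reading of the comment, not LNet code: the function names, the mapping of NIDs to networks, and the network names are all hypothetical, and "cannot reach the remote net" is interpreted here as "has no NID that is up on that net".

```python
def down_if_any_nid_down(nid_status):
    """Rule as worded in the quoted passage for avoid_asym_router_failure==1:
    the router is down if any of its NIDs is down."""
    return any(not up for up in nid_status.values())


def down_if_remote_net_unreachable(nid_status, nid_net, remote_net):
    """One reading of the suggested rule: the router is down only if none of
    its NIDs on the remote net is up."""
    return not any(
        up for nid, up in nid_status.items() if nid_net[nid] == remote_net
    )


# Status reported by router X in the manual's example.
status = {"Xnid1": True, "Xnid2": True, "Xnid3": False}
# Hypothetical assignment of each NID to a network.
nets = {"Xnid1": "tcp0", "Xnid2": "o2ib0", "Xnid3": "o2ib1"}

print(down_if_any_nid_down(status))                             # True
print(down_if_remote_net_unreachable(status, nets, "o2ib0"))    # False
print(down_if_remote_net_unreachable(status, nets, "o2ib1"))    # True
```

Under the first rule, router X is avoided for all traffic. Under the second, it is avoided only for traffic to the network behind the NID that is down.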
          pjones Peter Jones added a comment -

          Serguei

          Could you please advise on what changes should be made to the manual here?

          Thanks

          Peter

          ofaaland Olaf Faaland added a comment - Related to https://jira.whamcloud.com/browse/LU-14555

          People

            ssmirnov Serguei Smirnov
            ofaaland Olaf Faaland