Enhance nodemap ranges to work better with IPv6

    Description

      The nodemap functionality in Lustre allows sets of hosts to be described using a range of network addresses.

      For network types where the address is a simple integer, this seems reasonable.

      For IPv4 where an address is usually represented as 4 small integers, this is manageable, but not really ideal.  The standard throughout IP is to use a netmask to identify a set of addresses, whether for routing, for access control, or any other purpose.

      For IPv6 it would be quite inconvenient to use ranges.  Addresses are rarely allocated sequentially, and specifying a netmask length is much easier.  Also, it is standard practice to store IPv6 addresses (and IPv4 addresses) in network byte order, so comparing two addresses to see how they are ordered (as needed for ranges) is clumsy to implement.
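As a rough illustration of why a prefix length is convenient, the Bash sketch below tests whether an IPv6 address falls inside a prefix by comparing only the leading bits. It is not taken from the Lustre sources: the function name `ipv6_in_prefix` is made up for this example, and it assumes both addresses are written as eight full hexadecimal groups (no "::" shorthand).

```shell
#!/bin/bash
# Hypothetical helper, not a Lustre API.
# Return 0 if address $1 lies inside prefix $2 with prefix length $3.
# Assumption: both addresses use eight full hex groups (no "::").
ipv6_in_prefix() {
    local -a a p
    local bits=$3 i mask
    IFS=: read -r -a a <<< "$1"
    IFS=: read -r -a p <<< "$2"
    for ((i = 0; i < 8 && bits > 0; i++)); do
        if ((bits >= 16)); then
            mask=0xffff                                   # whole group is covered
        else
            mask=$(( (0xffff << (16 - bits)) & 0xffff ))  # only the top bits count
        fi
        (( (0x${a[i]} & mask) == (0x${p[i]} & mask) )) || return 1
        bits=$((bits - 16))
    done
    return 0
}

if ipv6_in_prefix 2001:0db8:0000:0000:0000:0000:0000:0001 \
                  2001:0db8:0000:0000:0000:0000:0000:0000 32; then
    echo "inside prefix"
fi
```

No ordering comparison between addresses is needed here, only a masked equality test on each 16-bit group.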

      Clearly we need to keep address-range support for existing address types, but I believe we should not support it for new IP address types.  Further, we should add netmask support for existing address types.

      Internally, this would mean keeping nodemap information in a range tree for 4-byte addresses (as we already do), and using a netmask-based structure for longer addresses.

      From user space, we would add a syntax (a /nn suffix) to specify a netmask.  For 4-byte addresses this would be converted to a range.  For larger addresses, it would be used as-is.
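The conversion of a /nn suffix to a range for 4-byte addresses can be sketched in Bash as follows. This only shows the arithmetic and is not the code from the patches on this ticket. The function names (`ip4_to_int`, `int_to_ip4`, `cidr_to_range`) are invented for the example. It assumes plain dotted-quad input without leading zeros and a shell with 64-bit arithmetic.

```shell
#!/bin/bash
# Hypothetical helpers, not Lustre APIs.

# Convert dotted-quad IPv4 text to a 32-bit integer.
ip4_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad text.
int_to_ip4() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print "first last" for an address/prefix-length pair.
cidr_to_range() {
    local addr=${1%/*} len=${1#*/}
    local ip mask first last
    ip=$(ip4_to_int "$addr")
    # Keep the top $len bits; the final & trims the shift back to 32 bits.
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    first=$(( ip & mask ))
    last=$(( first | (~mask & 0xffffffff) ))
    echo "$(int_to_ip4 "$first") $(int_to_ip4 "$last")"
}

cidr_to_range 192.168.1.77/24   # prints "192.168.1.0 192.168.1.255"
cidr_to_range 10.0.0.5/30       # prints "10.0.0.4 10.0.0.7"
```

The resulting first and last addresses could then be fed into whatever range representation the existing code expects for 4-byte addresses.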

       

          Activity

            [LU-14288] Enhance nodemap ranges to work better with IPv6
            hornc Chris Horn added a comment -

            > I see that test_101 and test_103 already have '[[ -n $ARR_IF0_IP ]] || skip "Need IPv4 NIDs"', but it isn't clear from reading the test whether this means "doesn't make sense for IPv6" or "this isn't implemented for IPv6 yet". Should those subtests be moved over to be excepted by LU-17457 "LNet ip2nets and routes parameters do not work with large NIDs", or should that be a new ticket?

            I think when I added them to always_except I was expecting that we could re-enable them after adding IPv6 range support with this ticket. However, we decided against that strategy and instead invented the nidmask feature (because a range of IPv6 NIDs can be extremely large). But nidmasks are really only appropriate for answering the question "is this NID part of the network?", while test_101/103 need the ability to specify a range of NIDs that is then expanded to a list. I think the appropriate thing would be to remove the always_except and keep the '[[ -n $ARR_IF0_IP ]] || skip "Need IPv4 NIDs"' check.

            > Also, is there any nodemap test case that runs with IPv6 NIDs+netmask? If not, then one should be added before this ticket is closed again, otherwise we won't know if this is working or not.

            Yes, conf-sanity/43a and some sanity-sec test cases exercise the nidmask code with IPv6 when FORCE_LARGE_NID=true and LOAD_MODULES_REMOTE=true.

            adilger Andreas Dilger added a comment -

            This ticket is closed, but I see a couple of subtests in sanity-lnet.sh that are being skipped because of this ticket:

            if [[ $NETTYPE =~ (tcp|o2ib)[0-9]* ]]; then
                    if $FORCE_LARGE_NID; then
                            always_except LU-14288 101
                            always_except LU-14288 103
                            always_except LU-17457 199
                            always_except LU-17457 208
                            always_except LU-9680 213
                            always_except LU-17458 220
                            always_except LU-5960 230
                            always_except LU-9680 231
                            always_except LU-17457 255
                            always_except LU-9680 302

                            FAKE_NID="${FAKE_IPV6}@tcp"
                    else
                            FAKE_NID="${FAKE_IP}@tcp"
                    fi
            fi

            The "always_except" marker should be used when a test is broken because of a defect that needs to be fixed, and the Jira ticket shouldn't be closed until the bug is fixed.

            If those subtests are IPv4-specific and do not make sense to run with IPv6, then they should use "skip" with a reason why they should not be run with IPv6.

            I see that test_101 and test_103 already have '[[ -n $ARR_IF0_IP ]] || skip "Need IPv4 NIDs"', but it isn't clear from reading the test whether this means "doesn't make sense for IPv6" or "this isn't implemented for IPv6 yet". Should those subtests be moved over to be excepted by LU-17457 "LNet ip2nets and routes parameters do not work with large NIDs", or should that be a new ticket?

            I think this ticket (IPv6 + nodemaps) shouldn't be mixed up with LNet routers (even though both relate to NID ranges).

            Also, is there any nodemap test case that runs with IPv6 NIDs+netmask? If not, then one should be added before this ticket is closed again, otherwise we won't know if this is working or not.
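The comment above asks for an explicit skip with a stated reason rather than an always_except entry. A self-contained Bash sketch of that pattern follows. It does not reproduce the Lustre test framework: the `skip` function here is a simplified stand-in that only prints the reason, `ipv4_only_subtest` is a made-up name, and `ARR_IF0_IP` is treated as a plain string that is empty when no IPv4 address is available.

```shell
#!/bin/bash
# Simplified stand-in for a test framework's skip helper (assumption:
# the real helper does more than print; only the reporting is mimicked).
skip() {
    echo "SKIP: $*"
}

# Hypothetical subtest that only makes sense with IPv4 addresses.
# The precondition and its reason live inside the test itself, so a
# reader can see why it does not run on an IPv6-only configuration.
ipv4_only_subtest() {
    [[ -n ${ARR_IF0_IP:-} ]] || { skip "Need IPv4 NIDs"; return 0; }
    echo "RUN: would expand a NID range near $ARR_IF0_IP"
}

ARR_IF0_IP=""            # simulate an IPv6-only setup
ipv4_only_subtest        # prints "SKIP: Need IPv4 NIDs"

ARR_IF0_IP="192.0.2.10"  # documentation-range IPv4 address
ipv4_only_subtest        # prints "RUN: would expand a NID range near 192.0.2.10"
```

With the guard inside the subtest, no ticket number has to stay open just to keep the test excluded.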
            pjones Peter Jones added a comment -

            Merged for 2.17

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56184/
            Subject: LU-14288 nodemap: Use nidmasks for IPv6 NIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8f6988be4416fe0256583e5d6ea63e92ec0811ca

            simmonsja James A Simmons added a comment -

            We have one patch in Gerrit, but I still need to create a patch to move from a linked list to an interval tree. The work on using an interval tree with jobstats by Shuan will provide the needed bits to implement proper interval trees for nodemap.

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55922/
            Subject: LU-14288 lnet: Introduce nidmasks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4b12a9dcaf7ef08ec37c9209c31e44623f57548c

            gerrit Gerrit Updater added a comment -

            "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56184
            Subject: LU-14288 nodemap: Use nidmasks for IPv6 NIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a6e64cb3ee208ec12aad59c434ea2922d72f7232

            gerrit Gerrit Updater added a comment -

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56058
            Subject: LU-14288 nodemap: debug IPv6 failures
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 91b195645b2d233cb67b2b8fecd17783d2accdb7

            simmonsja James A Simmons added a comment -

            In our latest sanity-sec testing we are seeing a kernel crash for IPv6 with nodemap. I can't reproduce it locally, so it's something specific to the WC testbed.

            People

              simmonsja James A Simmons
              neilb Neil Brown