Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2950

Port LNET routes_config change done by Fujitsu in latest lustre

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.5.0
    • 7082

    Description

      Port a change made by Fujitsu where they added a new lnet module param: routes_config, which can specify a file that contains a set of routes to be used by lnet

      Attachments

        Issue Links

          Activity

            [LU-2950] Port LNET routes_config change done by Fujitsu in latest lustre
            haasken Ryan Haasken added a comment -

            Since this change added the ability to configure LNet routes from a file, and updated the lnet init script to load that config file at start time, I think the reload and probe actions should be updated so that they will reload the configuration from the file, if necessary.

            From the LSB, this is what the reload action should do:

            cause the configuration of the service to be reloaded without actually stopping and restarting the service

            Similarly probe should do the following (according to SUSE, at least):

            probe — Probes for the necessity of a reload. Depending on the service, prints "reload" or "restart" if a reload is required. Nothing is printed if a reload is not required.

            I am curious about this because I am working on porting at least the lnet init script to SLES. Amir, am I right to think that the lnet init script should call lustre_routes_config during the reload operation?

            haasken Ryan Haasken added a comment - Since this change added the ability to configure LNet routes from a file, and updated the lnet init script to load that config file at start time, I think the reload and probe actions should be updated so that they will reload the configuration from the file, if necessary. From the LSB, this is what the reload action should do: cause the configuration of the service to be reloaded without actually stopping and restarting the service Similarly probe should do the following (according to SUSE, at least): probe — Probes for the necessity of a reload. Depending on the service, prints "reload" or "restart" if a reload is required. Nothing is printed if a reload is not required. I am curious about this because I am working on porting at least the lnet init script to SLES. Amir, am I right to think that the lnet init script should call lustre_routes_config during the reload operation?

            I have updated review:
            http://review.whamcloud.com/#change,5757
            to add the man pages and
            http://review.whamcloud.com/#change,5965
            for the testing portion
            If you guys can review it, that would be great.

            ashehata Amir Shehata (Inactive) added a comment - I have updated review: http://review.whamcloud.com/#change,5757 to add the man pages and http://review.whamcloud.com/#change,5965 for the testing portion If you guys can review it, that would be great.

            Hi Hiroya,

            A note to highlight:

            The DLC (Dynamic Lnet Configuration) will be backwards compatible. (IE, existing configurations will not need to be changed, when DLC is introduced)

            thanks
            amir

            ashehata Amir Shehata (Inactive) added a comment - Hi Hiroya, A note to highlight: The DLC (Dynamic Lnet Configuration) will be backwards compatible. (IE, existing configurations will not need to be changed, when DLC is introduced) thanks amir

            Hi Hiroya,

            My apologies for the delay, for some reason I wasn't getting any notifications on Jira of comments being added. I will look into that.

            I have looked at the case that you have indicated; and you are absolutely right. The im_a_router flag will not be set when initially LNetNIInit() is called, and that will cause the "forwarding" parameter not to be checked, resulting in an lnet router node not behaving as expected. Thanks for pointing this out.

            There are a couple of way to approach this:

            1. Is to ensure that for a router node, in the luster.conf you enter a routes module param entry that corresponds to the NID of the router. This way when the routes param is being parsed a router node will set the "im_a_router" param properly and thus allocate the pools. The script: "lustre_routes_config" can be used later on to add extra routing entries as desired.
            This solution would be temporary until a feature we have in the pipe, DLC, lands. This will allow the modifications of lnet configurations dynamically.

            2. Is to modify the patch to ensure that when routes are added, if a route NID is added which corresponds to the local NID, then im_a_route is set, and immediately the pools are created.

            Please let me know your thoughts on these approaches.

            Thanks again for catching this case.

            amir

            ashehata Amir Shehata (Inactive) added a comment - Hi Hiroya, My apologies for the delay, for some reason I wasn't getting any notifications on Jira of comments being added. I will look into that. I have looked at the case that you have indicated; and you are absolutely right. The im_a_router flag will not be set when initially LNetNIInit() is called, and that will cause the "forwarding" parameter not to be checked, resulting in an lnet router node not behaving as expected. Thanks for pointing this out. There are a couple of way to approach this: 1. Is to ensure that for a router node, in the luster.conf you enter a routes module param entry that corresponds to the NID of the router. This way when the routes param is being parsed a router node will set the "im_a_router" param properly and thus allocate the pools. The script: "lustre_routes_config" can be used later on to add extra routing entries as desired. This solution would be temporary until a feature we have in the pipe, DLC, lands. This will allow the modifications of lnet configurations dynamically. 2. Is to modify the patch to ensure that when routes are added, if a route NID is added which corresponds to the local NID, then im_a_route is set, and immediately the pools are created. Please let me know your thoughts on these approaches. Thanks again for catching this case. amir
            nozaki Hiroya Nozaki (Inactive) added a comment - - edited

            Hi, Amir. Nice to meet you.

            I heartily agree your implementation because I also think reading a file into kernel space can be a kind of dangerous.
            But running the new script means that the timing to read a configuration file should be changed. That's why I wonder that lnet configuration will be the same as expected when "lustre_routes_config" or something like that runs on a router node after "lctl network up". Because LNetNIInit() checks "im_a_router" there and, I guess, it may have some effects on interpreting configuration.
            Actually, I'm not so sure that it can be problem but I'll appreciate if you answer my question, or consider the case if you've never considered it.

            Thank you

            nozaki Hiroya Nozaki (Inactive) added a comment - - edited Hi, Amir. Nice to meet you. I heartily agree your implementation because I also think reading a file into kernel space can be a kind of dangerous. But running the new script means that the timing to read a configuration file should be changed. That's why I wonder that lnet configuration will be the same as expected when "lustre_routes_config" or something like that runs on a router node after "lctl network up". Because LNetNIInit() checks "im_a_router" there and, I guess, it may have some effects on interpreting configuration. Actually, I'm not so sure that it can be problem but I'll appreciate if you answer my question, or consider the case if you've never considered it. Thank you
            ashehata Amir Shehata (Inactive) added a comment - The code inspection is @ http://review.whamcloud.com/#change,5757

            Can you submit your scripts into gerrit?

            keith Keith Mannthey (Inactive) added a comment - Can you submit your scripts into gerrit?

            We created two scripts. One to convert from the legacy syntax to the new syntax, and the second to take a file of the new syntax and configure lustre using lctl.

            The legacy syntax (which is currently being used in the lustre.conf file):
            <network> [<hop>] <gateway>@<exit network>[:<priority>]

            The new syntax (which is used in the new scripts):
            <network>:

            { gateway: <gateway>@<exit network> [hope: <hop>] [priority: <prioirty>] }

            I would think the intention is to move the syntax to the new form in the future, so there isn't three different syntax for configuring routes.

            Below is sample test runs. I have also attached two files to this Bug that were used for testing the scripts.

            [ashehata@localhost scripts]$ ~/lustre-master/lustre/scripts/lustre_routes_conversion legacy_routes_test new_routes_test
            tcp1 23 192.168.213.1@tcp:1 tcp5 34 193.30.4.3@tcp:4
            tcp2 54 10.1.3.2@tcp
            tcp3 10.3.4.3@tcp:3
            tcp4 10.3.3.4@tcp
            lustre_routes_conversion: converted routes written to test9
            [ashehata@localhost scripts]$ ~/lustre-master/lustre/scripts/lustre_routes_config --dry-run --verbose new_routes_test
            lctl --net tcp1 add_route 192.168.213.1@tcp 23 1
            lctl --net tcp5 add_route 193.30.4.3@tcp 34 4
            lctl --net tcp2 add_route 10.1.3.2@tcp 54 4
            lctl --net tcp3 add_route 10.3.4.3@tcp 1 3
            lctl --net tcp4 add_route 10.3.3.4@tcp 1 3

            ashehata Amir Shehata (Inactive) added a comment - We created two scripts. One to convert from the legacy syntax to the new syntax, and the second to take a file of the new syntax and configure lustre using lctl. The legacy syntax (which is currently being used in the lustre.conf file): <network> [<hop>] <gateway>@<exit network> [:<priority>] The new syntax (which is used in the new scripts): <network>: { gateway: <gateway>@<exit network> [hope: <hop>] [priority: <prioirty>] } I would think the intention is to move the syntax to the new form in the future, so there isn't three different syntax for configuring routes. Below is sample test runs. I have also attached two files to this Bug that were used for testing the scripts. [ashehata@localhost scripts] $ ~/lustre-master/lustre/scripts/lustre_routes_conversion legacy_routes_test new_routes_test tcp1 23 192.168.213.1@tcp:1 tcp5 34 193.30.4.3@tcp:4 tcp2 54 10.1.3.2@tcp tcp3 10.3.4.3@tcp:3 tcp4 10.3.3.4@tcp lustre_routes_conversion: converted routes written to test9 [ashehata@localhost scripts] $ ~/lustre-master/lustre/scripts/lustre_routes_config --dry-run --verbose new_routes_test lctl --net tcp1 add_route 192.168.213.1@tcp 23 1 lctl --net tcp5 add_route 193.30.4.3@tcp 34 4 lctl --net tcp2 add_route 10.1.3.2@tcp 54 4 lctl --net tcp3 add_route 10.3.4.3@tcp 1 3 lctl --net tcp4 add_route 10.3.3.4@tcp 1 3

            test files

            ashehata Amir Shehata (Inactive) added a comment - test files

            We took Issac's suggestion and we created a shell script that reads a routes config file then uses lctl to configure lnet with the routes. This avoids opening files in kernel space.

            ashehata Amir Shehata (Inactive) added a comment - We took Issac's suggestion and we created a shell script that reads a routes config file then uses lctl to configure lnet with the routes. This avoids opening files in kernel space.

            People

              ashehata Amir Shehata (Inactive)
              ashehata Amir Shehata (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: