
can't set max_mod_rpcs_in_flight > 8

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0
    • Lustre 2.15.1
    • None
    • 3

    Description

      I am trying to increase mdc.*.max_mod_rpcs_in_flight to greater than 8, but I get an error.

      # lctl set_param mdc.fs1-MDT0000-mdc-ffff902107a5e000.max_rpcs_in_flight=128
      mdc.fs1-MDT0000-mdc-ffff902107a5e000.max_rpcs_in_flight=128

      # lctl set_param mdc.fs1-MDT0000-mdc-ffff902107a5e000.max_mod_rpcs_in_flight=127
      error: set_param: setting /sys/fs/lustre/mdc/fs1-MDT0000-mdc-ffff902107a5e000/max_mod_rpcs_in_flight=127: Numerical result out of range

      # lctl get_param version
      version=2.15.1

          Activity

            [LU-16454] can't set max_mod_rpcs_in_flight > 8

            "Vitaliy Kuznetsov <vkuznetsov@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49749
            Subject: LU-16454 component: Add a per-MDT "max_mod_rpcs_in_flight"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 23463fee16abd0821b95129b333c31f354cf8a94

            gerrit Gerrit Updater added the comment above.

            adilger Ok, I'll start working on a solution to this ticket.
            Thanks

            vkuznetsov Vitaliy Kuznetsov added the comment above.

            Vitaliy, in my previous investigation of a similar issue in LU-14144 I couldn't find any good reason in the code or commit history why max_mod_rpcs_per_client was specifically a module parameter on the server and not a regular sysfs parameter. There doesn't appear to be any runtime dependency on this value (i.e. it doesn't define a static number of slots for the per-client replies or anything), and the only thing it is used for is to pass the limit to the client. For the same reason, there also doesn't appear to be a particularly hard limitation why the client cannot change and exceed the server-provided parameter, except to avoid overloading the server with too many RPCs at once, but that may also be true of the current limit with a larger number of clients, no different than "max_rpcs_in_flight".

            It seems reasonable to add a per-MDT "max_mod_rpcs_in_flight" tunable parameter to lustre/mdt/mdt_lproc.c so that it can be set with "lctl set_param" at runtime, for example like async_commit_count. The global max_mod_rpcs_per_client parameter should be used as the initial value, and "(deprecated)" should be added to its module description in mdt_handler.c.

            Mahmoud, the console error message printed when the client limit is reached is "myth-MDT0000-mdc-ffff979380fc1800: can't set max_mod_rpcs_in_flight=32 higher than ocd_maxmodrpcs=8 returned by the server at connection" but I agree this isn't totally clear. Instead of reporting "ocd_maxmodrpcs" (which is an internal field name) it should report the new "mdt.myth-MDT0000.max_mod_rpcs_in_flight" parameter, which would steer the admin to the right location to change this value. However, in the current implementation it would still be necessary to unmount/remount (or at least force a client reconnection) if this parameter is changed.

            The main question is whether there is any value in the MDS "limiting" the value that can be set by the client (which is not done for max_rpcs_in_flight or most other parameters), or whether the client should be able to set this larger than the default value the MDT returned (maybe with some upper limit like 4x or 8x the MDT limit). That would allow something like "lctl set_param -P ..max_mod_rpcs_in_flight" to affect both the clients and servers.

            adilger Andreas Dilger added the comment above.
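
            The two limits discussed in the comment above can be modelled with a small shell function. This is only a sketch of the behaviour described in this ticket, not the client code itself: the name check_mod_rpcs is hypothetical, and the rule that max_mod_rpcs_in_flight must stay strictly below max_rpcs_in_flight is an assumption, consistent with the 128/127 pair used in the description.

```shell
# Hypothetical helper: models the limits discussed in this ticket.
# It does not talk to Lustre; all three values are passed in by the caller.
# Usage: check_mod_rpcs REQUESTED MAX_RPCS_IN_FLIGHT SERVER_LIMIT
check_mod_rpcs() {
    requested=$1
    max_rpcs=$2
    server_limit=$3

    # Assumption: at least one modifying RPC must be allowed.
    if [ "$requested" -lt 1 ]; then
        echo "rejected: $requested is lower than 1"
        return 1
    fi
    # Assumption: the value must stay strictly below max_rpcs_in_flight.
    if [ "$requested" -ge "$max_rpcs" ]; then
        echo "rejected: $requested is not lower than max_rpcs_in_flight=$max_rpcs"
        return 1
    fi
    # The limit the server returned at connection time (ocd_maxmodrpcs).
    if [ "$requested" -gt "$server_limit" ]; then
        echo "rejected: $requested is higher than the server limit $server_limit"
        return 1
    fi
    echo "accepted: $requested"
}

# The case from the description: 127 requested, max_rpcs_in_flight=128, server limit 8.
check_mod_rpcs 127 128 8
```

            With the values from the description, the third check is the one that fails, which matches the "higher than ocd_maxmodrpcs=8" console message quoted above.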
            pjones Peter Jones added a comment -

            Vitaliy

            We discussed this during the triage call today. Andreas has some suggestions of how to address this issue that he will share and then could you please follow up and implement?

            Thanks

            Peter


            I figured out the issue. It was a module setting on the server. The documentation should be updated to state that the server-side module parameter must be updated first.

             

            mhanafi Mahmoud Hanafi added the comment above.
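
            The ordering described above (server first, then client) could look roughly like the following dry run, which only prints the steps instead of executing them. The helper name plan_mod_rpcs_change is hypothetical. The sysfs path for the mdt module parameter is an assumption based on max_mod_rpcs_per_client being a module parameter, and raising max_rpcs_in_flight to one more than the new limit is likewise an assumption; check both against your Lustre version before running anything.

```shell
# Hypothetical helper: prints the steps in order, it does not run them.
# Usage: plan_mod_rpcs_change MDC_DEVICE_PATTERN NEW_LIMIT
plan_mod_rpcs_change() {
    mdc=$1
    limit=$2

    # 1. Server side first (assumed sysfs path of the mdt module parameter).
    echo "mds# echo $limit > /sys/module/mdt/parameters/max_mod_rpcs_per_client"
    # 2. The client only learns the server limit at connection time.
    echo "client# (remount, or force a reconnection to the MDT)"
    # 3. Assumption: keep max_rpcs_in_flight above the new modifying-RPC limit.
    echo "client# lctl set_param mdc.$mdc.max_rpcs_in_flight=$((limit + 1))"
    # 4. Finally raise the client-side limit.
    echo "client# lctl set_param mdc.$mdc.max_mod_rpcs_in_flight=$limit"
}

plan_mod_rpcs_change 'fs1-MDT0000-mdc-*' 16
```

            Step 3 is only needed when the new limit is not already below the current max_rpcs_in_flight value.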

            Here is the error in the debug logs:

             

            00000020:00020000:0.0F:1673137999.151510:0:2742:0:(genops.c:2175:obd_set_max_mod_rpcs_in_flight()) fs1-MDT0000-mdc-ffff902107a5e000: can't set max_mod_rpcs_in_flight=9 higher than ocd_maxmodrpcs=8 returned by the server at connection

            mhanafi Mahmoud Hanafi added the comment above.
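
            When scanning logs for this condition, the requested value and the server-provided limit can be pulled out of the message with sed. This is a minimal sketch: parse_mod_rpcs_error is a hypothetical helper, and it assumes the message text matches the line quoted above.

```shell
# Hypothetical helper: reads log lines on stdin and prints the requested value
# and the server limit for every "can't set max_mod_rpcs_in_flight" message.
parse_mod_rpcs_error() {
    sed -n 's/.*max_mod_rpcs_in_flight=\([0-9][0-9]*\) higher than ocd_maxmodrpcs=\([0-9][0-9]*\).*/requested=\1 server_limit=\2/p'
}

# The log line from the comment above, used as sample input.
line="00000020:00020000:0.0F:1673137999.151510:0:2742:0:(genops.c:2175:obd_set_max_mod_rpcs_in_flight()) fs1-MDT0000-mdc-ffff902107a5e000: can't set max_mod_rpcs_in_flight=9 higher than ocd_maxmodrpcs=8 returned by the server at connection"

printf '%s\n' "$line" | parse_mod_rpcs_error
```

            For the sample line this prints "requested=9 server_limit=8"; lines that do not contain the message produce no output.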

            People

              vkuznetsov Vitaliy Kuznetsov
              mhanafi Mahmoud Hanafi