Details

    • Improvement
    • Resolution: Won't Fix
    • Minor
    • None
    • None
    • 8678

    Description

      We are interested in implementing UID/GID-based NRS policies to see what they can offer. To do this, the UID/GID of the process that triggers each RPC must be added to the request body. We implement this by storing the UID/GID in the padding of the request body, and then obtain a 'UID/GID Round Robin' policy by modifying the CRRN policy (see the attached patch). We know this is not a good implementation, though it works fine for testing. We also know that a good one is not easy to implement, because user IDs must be handled globally across the entire cluster. Any advice or ideas? Thanks!
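
      The round-robin idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the attached patch and not Lustre's NRS API: the class `UidRoundRobin` and its `enqueue`/`dequeue` methods are hypothetical names, and each RPC is assumed to already carry the UID of the process that issued it.

```python
from collections import OrderedDict, deque


class UidRoundRobin:
    """Toy round-robin scheduler keyed by the UID carried in each request.

    Hypothetical illustration only; names do not come from Lustre.
    """

    def __init__(self):
        # One FIFO per UID; the OrderedDict order is the rotation order.
        self._queues = OrderedDict()

    def enqueue(self, uid, rpc):
        """Queue an RPC under the UID of the process that triggered it."""
        self._queues.setdefault(uid, deque()).append(rpc)

    def dequeue(self):
        """Return the next (uid, rpc) pair, or None if nothing is queued."""
        if not self._queues:
            return None
        uid, queue = next(iter(self._queues.items()))
        rpc = queue.popleft()
        if queue:
            # Move this UID to the back so other users get a turn.
            self._queues.move_to_end(uid)
        else:
            del self._queues[uid]
        return uid, rpc


sched = UidRoundRobin()
for uid, rpc in [(1000, "write-a"), (1000, "write-b"),
                 (1000, "write-c"), (2000, "read-x")]:
    sched.enqueue(uid, rpc)

# Drain the scheduler and record the service order.
order = []
while (item := sched.dequeue()) is not None:
    order.append(item)
```

      In this toy run, UID 2000's single request is served right after UID 1000's first request instead of waiting behind all three, which is the fairness property a UID-based policy is after.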

          Activity

            [LU-3468] Add UID/GID into RPC request
            pjones Peter Jones added a comment -

            ok thanks Li Xi!


            lixi Li Xi (Inactive) added a comment -

            Hi Jeff,

            This is an earlier ticket than TBF. TBF has since implemented jobstat support, which I think covers most use cases of a UID/GID-based RPC scheduler. I would be fine with this ticket being closed.

            Thank you!

            laytonjb Jeff Layton (Inactive) added a comment -

            Has there been any further development on this patch? How does it compare to TBF (LU-3558)? Thanks!

            lixi Li Xi (Inactive) added a comment -

            Hi Andreas,

            Thank you so much for the advice! It is really helpful!

            adilger Andreas Dilger added a comment -

            I would suggest a couple of different things:

            • The JobStats information would be a very good way of handling this, and it would allow prioritizing RPC processing between different batch jobs as well as between batch and interactive work (e.g. with JobID == batch and without == interactive).
            • The OST and MDT RPCs already contain space for the UID/GID in each of the RPCs (struct obdo and struct mdt_body). That makes it a bit more complex to process the RPCs for NRS, but the ORR policy is already looking into the RPC request to determine the OST object ID and offsets. I'm not sure if the uid/gid fields are always filled in for all OST/MDT RPCs, but they could be.
            • Alternatively, it might be enough to do round-robin over the UID/GID of the objects being accessed. It wouldn't be 100% fair in every case, but it would work for the large majority of cases and would avoid the need to change the network protocol just for this.

            In the long term, I'd prefer to develop only a small number of policies that are more sophisticated. Having separate policies for each "parameter" means that it will be difficult to get the best overall performance. Separate UID/GID policies will allow load balancing between users, but will not optimize the IO ordering the way ORR does.

            It would be better to have a single NRS policy that can do many things at once: balance between nodes, users, and jobs, sort RPCs within objects, and operate both round-robin and constrained by upper and lower limits on bandwidth or IOPS.
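
            As a rough illustration of the "single policy that does several things at once" idea, the sketch below combines two of the listed behaviours: round-robin between users and offset-ordered dispatch within each user's requests. It is a toy model under stated assumptions, not an existing Lustre policy: `FairSortedPolicy`, `quantum`, `enqueue` and `next_batch` are hypothetical names, and bandwidth/IOPS limits are deliberately left out.

```python
from collections import OrderedDict
import heapq


class FairSortedPolicy:
    """Toy policy: round-robin across UIDs, ORR-like ordering within a UID.

    Hypothetical illustration only; names do not come from Lustre.
    """

    def __init__(self, quantum=2):
        self.quantum = quantum          # max RPCs served per UID per turn
        self._heaps = OrderedDict()     # uid -> heap of (object_id, offset, seq)
        self._seq = 0                   # tie-breaker that preserves arrival order

    def enqueue(self, uid, object_id, offset):
        heap = self._heaps.setdefault(uid, [])
        heapq.heappush(heap, (object_id, offset, self._seq))
        self._seq += 1

    def next_batch(self):
        """Serve up to `quantum` RPCs for the UID at the head of the rotation."""
        if not self._heaps:
            return []
        uid, heap = next(iter(self._heaps.items()))
        batch = []
        while heap and len(batch) < self.quantum:
            object_id, offset, _ = heapq.heappop(heap)
            batch.append((uid, object_id, offset))
        if heap:
            # Work remains for this UID: rotate it to the back.
            self._heaps.move_to_end(uid)
        else:
            del self._heaps[uid]
        return batch


policy = FairSortedPolicy(quantum=2)
policy.enqueue(1000, object_id=7, offset=4096)
policy.enqueue(1000, object_id=7, offset=0)
policy.enqueue(1000, object_id=3, offset=8192)
policy.enqueue(2000, object_id=5, offset=0)

# Collect batches until the policy has nothing left to serve.
batches = []
while (batch := policy.next_batch()):
    batches.append(batch)
```

            Each batch holds requests from one user, sorted by object and offset, so the rotation provides balance between users while the per-user heap keeps the IO ordering that a plain UID round-robin would lose.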

            lixi Li Xi (Inactive) added a comment -

            Is JobStats suitable for this purpose?
            https://jira.hpdd.intel.com/browse/LU-694

            People

              pjones Peter Jones
              lixi Li Xi (Inactive)
              Votes: 0
              Watchers: 6