Details
- Type: New Feature
- Resolution: Unresolved
- Priority: Minor
Description
Currently, the request history available via lctl get_param ost.OSS.ost_io.req_buffer_history and lctl get_param mds.MDS.mdt.req_history only provides client NIDs. It would be much more helpful if the client UIDs and GIDs were also included in the logs.
Issue Links
- is related to:
  - LU-16077 Cannot use tbf to filter brw request per effective uid/gid, inode attr ids is used instead (Resolved)
  - LU-18179 Implementation of Round-Robin/Fair Share response with Token Bucket Filters (Open)
  - LU-14501 NRS TBF UID: limit per "any" user? (Open)
  - LU-17503 IO500: improve NRS TBF to sort requests by object offset for ior-hard-write (Open)
I've flip-flopped and don't think LU-18179 is the right place for my comments either. I'll post them here, and you can tell me if this is related to what you want the UID/GID for...
Rather than tuning a large (and continually changing) number of e.g. UID rules, it would be best to set the default TBF rules to automatically throttle jobs (by UID or JobID or NID or PROJID) that are using too much of the server resources when there is contention on the server. The TBF rules are already processing every RPC that arrives at the server, so IMHO this is the right place to detect RPC overload and throttle the offenders rather than adding an extra layer to process the RPCs again in userspace.
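For reference, this is roughly how TBF rules are tuned per service today with lctl (syntax as in the Lustre manual; the uid rule type requires the TBF policy to be started with the matching type, and the ost.OSS.ost_io path varies by service):

    # enable TBF on the OST I/O service, classifying RPCs by UID
    lctl set_param ost.OSS.ost_io.nrs_policies="tbf uid"
    # limit RPCs from UID 500 to 1000 RPCs/sec
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start u500 uid={500} rate=1000"
    # adjust or remove the rule later
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change u500 rate=500"
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="stop u500"

It is exactly this per-UID rule tuning that becomes unmanageable at scale, hence the proposal to let the default rule do the throttling automatically.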
It should be possible to specify a default TBF rule like "change default rate=1000" as described in LU-14501 to cap individual UIDs at 1000 RPCs/sec, but if the server cannot process the RPCs at the required rate across all UIDs then it should try to evenly balance the available processing rate across the UIDs submitting RPCs.

For example, say the OST can handle 2000 IOPS in total. If there are only 2 UIDs running IOPS-intensive jobs, each one should be able to use up to their full 1000 IOPS limit. If there is one UID with an IOPS-intensive job (1000+ IOPS), but 9 other UIDs trying to run "normal" jobs (150 IOPS each) at the same time, then each UID would initially get 2000/10 = 200 IOPS. The 9 "normal" jobs would have all of their RPCs processed every second, for a total of 9 x 150 = 1350 IOPS, and the one IOPS-intensive job could use the remaining 650 IOPS without affecting the other jobs. There should also be some "memory/credit" for UIDs that did not use all of their IOPS in the last few slices.
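A minimal sketch of how that might be expressed, reusing the existing nrs_tbf_rule interface with the LU-14501 default-rule syntax; the fair-share rebalancing described above would be new behavior inside TBF, not something the current default rule provides:

    # proposed: cap each UID class at 1000 RPCs/sec by default (LU-14501)
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=1000"
    # under contention, TBF would shrink the effective per-UID rate toward
    # total_capacity / active_uids (e.g. 2000/10 = 200 IOPS in the example
    # above) and redistribute any unused share to the busier UIDs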