[LU-14501] NRS TBF UID: limit per "any" user? - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.6
Labels:
- TBF
Environment:
CentOS 7

Description

In my understanding of the TBF UID rules, it is not possible to set a limit per "any UID". In our testing, the default rule ( default {*} 10000 ) seems to include ALL UIDs, that is, 10,000 requests for all users. Please correct me if I'm wrong.

Say, we want to limit ost.OSS.ost_io.nrs_tbf_rule to 100 reqs/user. Do we have to add a rule for each UID? In our case, we have ~5,400 users.

Issue Links

is duplicated by

LU-14567 Wildcard support for NRS TBF UID rules

Resolved

is related to

LU-18179 Implementation of Round-Robin/Fair Share response with Token Bucket Filters

Open

LU-18180 UID in req_buffer_history

Open

LU-17920 Add permanent TBF rules

Resolved

LU-13037 print stats for NRS TBF rules

Open

LU-17296 NRS TBF default rules

Open

LU-17902 add NRS TBF policy for nodemap

Open

LU-18183 shared rate limit for a TBF rule

Open

Activity

Li Xi added a comment - 31/Mar/21 1:47 AM

is there a way to see which UID(s) have reached the rate limit?

I don't think there is any existing way, and I doubt it could be implemented efficiently. The TBF state changes all the time. The rate limit could be 1000 RPC/s or so, which means a UID could be limited by TBF now and no longer be limited 1 ms later. With changes that quick, there seems to be no efficient way to dump the real-time status.

That doesn't mean we cannot collect some statistics or summaries. For example, I think we could implement a mechanism to record the UIDs that reached the limit during a past period (e.g. an hour?). It would take significant effort to implement, though. Before that, we need to analyze whether it is useful for your use case, and whether it is useful for broader use cases.
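To make the "record the UIDs that reached the limit in a past period" idea concrete, here is a minimal sketch of the bookkeeping it would need. This is not Lustre code and no such mechanism exists in the thread; the class name `ThrottleLog`, its methods, and the window length are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class ThrottleLog:
    """Hypothetical sliding-window record of UIDs that hit their rate limit."""

    def __init__(self, window_seconds=3600.0):
        self.window = window_seconds
        self.events = deque()   # (timestamp, uid) pairs, oldest first
        self.counts = {}        # uid -> number of throttle events in the window

    def record(self, uid, now):
        """Note that `uid` was throttled at time `now` (seconds)."""
        self.events.append((now, uid))
        self.counts[uid] = self.counts.get(uid, 0) + 1
        self._expire(now)

    def _expire(self, now):
        # Drop events that are at least one full window old.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_uid = self.events.popleft()
            self.counts[old_uid] -= 1
            if self.counts[old_uid] == 0:
                del self.counts[old_uid]

    def summary(self, now):
        """Return {uid: throttle events seen during the last window}."""
        self._expire(now)
        return dict(self.counts)


log = ThrottleLog(window_seconds=60.0)
log.record(1000, now=0.0)
log.record(1000, now=10.0)
log.record(2000, now=30.0)
print(log.summary(now=40.0))
```

Note that this sketch stores one entry per throttle event, which at rates around 1000 RPC/s would grow quickly; a real implementation would likely need coarser per-UID counters, which is part of why the effort is not trivial.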

Stephane Thiell added a comment - 25/Mar/21 6:32 AM

Hi Li,

Thanks for your response and clarification. After further testing, it seems to work as you describe. Reducing default {*} does indeed reduce the rate per UID and not globally. Adding other per-UID rules then properly overrides the default (e.g. I added a rule to exempt UID 0 and it's working).
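The override behaviour described above can be modelled as a simple lookup: a UID that matches a specific rule gets that rule's rate, and every other UID falls through to the default. The sketch below is only an illustration of that observed behaviour, not Lustre's matching code; the function `pick_rate`, the rule name `root_exempt`, and the rate values are hypothetical.

```python
DEFAULT_RULE = "default"


def pick_rate(uid, rules, default_rate):
    """Return (rule_name, rate) for `uid`.

    `rules` is an ordered list of (name, set_of_uids, rate) tuples.
    The first matching rule wins; otherwise the default rule applies.
    """
    for name, uids, rate in rules:
        if uid in uids:
            return name, rate
    return DEFAULT_RULE, default_rate


# Hypothetical setup: a generous rate for UID 0, 100 req/s for everyone else.
rules = [("root_exempt", {0}, 10000)]
print(pick_rate(0, rules, default_rate=100))
print(pick_rate(1234, rules, default_rate=100))
```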

One thing I was wondering: is there a way to see which UID(s) have reached the rate limit? I don't think there are any stats about that in /sys, but perhaps there is a special Lustre debug mask for logging it? That would help us adapt our rates.

Li Xi added a comment - 09/Mar/21 2:45 PM - edited

the default rule ( default {*} 10000 ) seems to include ALL UIDs, that is, 10,000 requests for all users.

For TBF with the UID type, I don't think this is correct. Each user with a unique UID gets its own dedicated TBF bucket, so the rate limit applies to each user separately. If we want to make sure each user gets a rate of at most 100 requests/sec from each NRS svcpt, setting "...nrs_tbf_rule='change default rate=100'" would be enough. (Please note that there may be several svcpts on a server, so the actual per-server limit will be 100 * the number of svcpts.) I don't think 5400 rules are needed for this use case.
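As an illustration of the two points above (one bucket per UID, and the limit being multiplied by the number of service partitions), here is a small token-bucket model. It is a sketch under stated assumptions rather than Lustre's implementation: the classes `TokenBucket` and `PerUidLimiter`, the bucket depth of 1, and the svcpt count of 8 are all hypothetical.

```python
class TokenBucket:
    """One bucket: refills at `rate` tokens/sec and holds at most `depth` tokens."""

    def __init__(self, rate, depth, now=0.0):
        self.rate = rate
        self.depth = depth
        self.tokens = float(depth)
        self.last = now

    def try_consume(self, now):
        # Refill for the time elapsed since the last call, capped at depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUidLimiter:
    """Hypothetical model of one service partition: a separate bucket per UID."""

    def __init__(self, default_rate, depth=1):
        self.default_rate = default_rate
        self.depth = depth
        self.buckets = {}

    def admit(self, uid, now):
        # A bucket is created lazily the first time a UID is seen.
        if uid not in self.buckets:
            self.buckets[uid] = TokenBucket(self.default_rate, self.depth, now)
        return self.buckets[uid].try_consume(now)


def per_server_ceiling(rate_per_svcpt, num_svcpts):
    """Upper bound per UID on one server when each svcpt limits independently."""
    return rate_per_svcpt * num_svcpts


svcpt = PerUidLimiter(default_rate=100)
print(svcpt.admit(1000, now=0.0))   # first request from UID 1000
print(svcpt.admit(1000, now=0.0))   # same UID, same instant: bucket is empty
print(svcpt.admit(2000, now=0.0))   # different UID has its own bucket
print(per_server_ceiling(100, num_svcpts=8))
```

In this model, draining one UID's bucket has no effect on any other UID, which matches the per-user behaviour described in the comment.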
