
Adaptive timeout at_min adjustment & granularity

Details

    • Improvement
    • Resolution: Unresolved
    • Major

    Description

      The adaptive timeout code currently works on a granularity of full seconds, and ignores timeouts of "0".  This means the MDS adaptive timeout code doesn't really adjust the timeouts there.

      This means, for example, the bl_ast timeout stays at the default value of 100 seconds * 1.5 (ldlm_bl_timeout), i.e. 150 seconds.

      This is a very long time to wait, and the AT code is supposed to shorten this.

      There are two obvious approaches here.

      1. Stop ignoring "0" values in the adaptive timeout code, and set a default non-zero at_min (setting it to 1 second should mean no behavioral change, as that's the current minimum real value).  This solution should be simple and shouldn't affect existing installs too much.  (configuring at_min is pretty common anyway)
      2. Update the adaptive timeout code to use more precise time intervals than 1 second.

       

      I'm inclined to #1.  But in real configs, at_min is generally recommended to be something like 40 seconds.  So perhaps we should default to that instead.
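
      As a rough illustration of approach #1, clamping the adaptive estimate to a non-zero floor might look like the following. This is a hypothetical sketch, not the actual ptlrpc code; the names at_min/at_max follow the tunables discussed here, and the defaults (5 and 600) are illustrative only.

```c
#include <assert.h>

/* Hypothetical defaults for illustration; the ticket discusses
 * values anywhere from 1 to ~40 seconds for at_min. */
static unsigned int at_min = 5;   /* non-zero floor, so 0 s samples still count */
static unsigned int at_max = 600; /* upper clamp on the adaptive estimate */

/* Clamp a measured service time (seconds) into [at_min, at_max].
 * With at_min > 0, a "0" measurement is no longer ignored/collapsed. */
static unsigned int at_clamp(unsigned int measured)
{
	unsigned int val = measured;

	if (val < at_min)
		val = at_min;
	if (val > at_max)
		val = at_max;
	return val;
}
```

      With at_min = 1 this would match the current minimum real value, as noted above; the floor is what stops the estimator from collapsing on the MDS.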

       

      Note specifically that in ldlm_bl_timeout we use the max() of this value and ldlm_enqueue_min (default is OBD_TIMEOUT_DEFAULT, 100 seconds), so we'll only get down to that value there.
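
      The interplay above can be sketched as follows. This is a simplified model, not the real ldlm_bl_timeout(); the 1.5x factor and the ldlm_enqueue_min floor are taken from this ticket's description.

```c
#include <assert.h>

#define OBD_TIMEOUT_DEFAULT 100	/* seconds */

static unsigned int ldlm_enqueue_min = OBD_TIMEOUT_DEFAULT;

/* Simplified model of the bl_ast timeout: 1.5x the adaptive service
 * estimate, but never below ldlm_enqueue_min. With the estimate stuck
 * at the 100 s default this yields the 150 s mentioned above, and even
 * a well-adapted estimate cannot push it below 100 s. */
static unsigned int bl_timeout(unsigned int at_est)
{
	unsigned int timeout = at_est + at_est / 2;	/* 1.5 * at_est */

	return timeout > ldlm_enqueue_min ? timeout : ldlm_enqueue_min;
}
```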

       

      A few open questions here.


          Activity


            adilger Andreas Dilger added a comment -

            One proposal that might help here (and in other places) is for the servers to persistently track the maximum number of connected clients, so that the MDS/OSS knows after a restart how many clients might connect and can set at_min to an appropriate value right from the start.

            adilger Andreas Dilger added a comment -

            The patch that landed is only increasing at_min to a reasonable minimum value. The work to implement a dynamic at_min/at_max based on the number of connected clients has not been done.
            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50609/
            Subject: LU-12064 ptlrpc: set at_min=5 by default
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 46804c2230cc0f72d4472cddd5a37456e1f2fb00

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50609
            Subject: LU-12064 ptlrpc: set at_min=5 by default
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1805e4dfa8fcba712b5fa3868f26988e7635dbcb

            adilger Andreas Dilger added a comment -

            With ever-increasing core counts on the client, it makes sense to scale the number of "clients" by max_mod_rpcs_in_flight when computing at_min so that a multi-threaded workload on a smaller number of clients is handled similarly to a larger number of clients with max_mod_rpcs_in_flight=1. With the default max_mod_rpcs_in_flight=8 this would be a multiplier of 3 to the calculated at_min value.

            adilger Andreas Dilger added a comment -

            I think one of the current issues is that at_min=0 allows the ping flood to happen. If at_min > 0 it would significantly reduce the flood. Having a reasonable at_min for the cluster size will help significantly.
            delbaryg DELBARY Gael (Inactive) added a comment - edited

            The main concern about at_min is that if you specify a value lower than your LNet transaction timeout (modulo the number of LNet retries, if that is set up), the probability of flooding your LNet networks increases drastically, because an RPC not acknowledged by the server will be retransmitted by the client (not directly, but through a high-priority RPC) according to the at_min value. I don't remember the client-side code by heart, but this is what we have observed. Anyway, in Lustre other timeouts like ldlm_timeout and obd_timeout are generally initialized in the code with constant values higher than the default transaction timeout. From what we have seen at large scale, we could set a default at_min value of lnet_transaction_timeout+1 (for non-routed configurations) or max((lnet_transaction_timeout+1), ilog(num_clients) * 3). I think we have to rely on the underlying layers. Does it make sense?
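
            The proposed default above could be sketched as follows. This is illustrative only: lnet_transaction_timeout and num_clients would come from the LNet layer and the server's connection count, and ilog is assumed here to be an integer log2 in the style of the kernel's ilog2().

```c
#include <assert.h>

/* Integer log2: position of the highest set bit (ilog2-style). */
static unsigned int ilog(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* max(lnet_transaction_timeout + 1, ilog(num_clients) * 3) */
static unsigned int at_min_default(unsigned int lnet_transaction_timeout,
				   unsigned int num_clients)
{
	unsigned int floor = lnet_transaction_timeout + 1;
	unsigned int scaled = ilog(num_clients) * 3;

	return scaled > floor ? scaled : floor;
}
```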
            adilger Andreas Dilger added a comment - edited

            Presentation from CEA set at_min=55 for a cluster with 20,000 clients, so this would also approximately match the ilog(num_clients) * 3 formula (16 * 3 = 48).
            adilger Andreas Dilger added a comment - edited

            What else would be useful here is to tune at_min as a function of the number of clients connected to the servers. For systems with ~200 clients, having at_min=15 is typical, and with ~1500 clients at_min=30 is typical, so a function like the following seems reasonable:

                    at_min = ilog(num_clients) * 3;

            though any explicitly-specified at_min value should take precedence. That probably means having a separate flag that indicates whether at_min has been explicitly set or not, and otherwise recalculating it when clients connect and disconnect.
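
            The "recalculate unless explicitly set" idea could be sketched as follows. All names here are hypothetical (this is not existing Lustre code), and ilog is assumed to be an ilog2-style integer log; with it, ~1500 clients yields ilog(1500) = 10, i.e. at_min = 30, matching the typical value quoted above.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int at_min;
static bool at_min_set;		/* true once an admin sets at_min explicitly */

/* Integer log2: position of the highest set bit (ilog2-style). */
static unsigned int ilog(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Called on client connect/disconnect: derive at_min from the current
 * client count unless the admin pinned it explicitly. */
static void at_min_recalc(unsigned int num_clients)
{
	if (!at_min_set)
		at_min = ilog(num_clients) * 3;
}

/* Called from the tunable's store handler: explicit values win. */
static void at_min_store(unsigned int val)
{
	at_min = val;
	at_min_set = true;
}
```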

            People

              wc-triage WC Triage
              pfarrell Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 13
