[LU-16002] Ping evictor delayed client eviction for 3 ping interval more than defined Created: 10/Jul/22 Updated: 13/Sep/23 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Boyko | Assignee: | Alexander Boyko |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The ping evictor adds 3 ping intervals on top of the defined eviction time (PING_EVICT_TIMEOUT, 6 × ping interval). With obd_timeout = 300 the resulting eviction time becomes 670 instead of 450. This confuses and delays all conflicting requests on the server side. |
| Comments |
| Comment by Gerrit Updater [ 10/Jul/22 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47928 |
| Comment by Gerrit Updater [ 19/Jul/22 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47982 |
| Comment by Gerrit Updater [ 17/Sep/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47982/ |
| Comment by Gerrit Updater [ 15/Oct/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47928/ |
| Comment by Peter Jones [ 16/Oct/22 ] |
|
Landed for 2.16 |
| Comment by Andreas Dilger [ 11/Sep/23 ] |
|
aboyko, could you please provide some more background on why a tunable ping_interval is needed? I'm concerned that allowing ping_interval to be tuned separately from obd_timeout can lead to random client evictions when clients are not sending RPCs or OBD_PING in a timely manner. This might be hard to notice if it works out like e.g. ping_interval = obd_timeout - 10: this is OK while an import is active and sending RPCs or OBD_PING, but fails intermittently if the import becomes idle and a ping is also lost. |

I'd much prefer to have a per-device obd_timeout value as implemented in patch https://review.whamcloud.com/50519 "LU-9912 ptlrpc: make obd timeout a per-device param", with the ping_interval for each import controlled by obd->obd_timeout / 4. This would work properly for clients that mount multiple filesystems, unlike having a global obd_timeout (and now a global ping_interval). However, before we change anything with the global ping_interval that was added in 2.16, I'd like to understand why it was added and what problem it was solving. I'd prefer to avoid having a tunable ping_interval entirely, just because it can go badly. If this was needed to solve some specific problem, would a per-device obd_timeout also solve that same issue? Also, hornc landed patch https://review.whamcloud.com/49807 " |

I'm thinking we should remove the global ping_interval tunable completely (so that pings are always tied to obd_timeout), and use something like: |

#define PING_INTERVAL(obd) (obd_timeout(obd) / 4) |

It would still be possible to keep evict_multiplier if that is important, something like: |

#define PING_INTERVAL(obd) (obd_timeout(obd) * 3 / (evict_multiplier * 2)) |

but before we add complexity I'd like to understand what this was needed for. |
| Comment by Alexander Boyko [ 12/Sep/23 ] |
|
We had an issue where cascading failures brought timeouts to ~1700s, including the blocking callback timeout. Something like: one client node holding an LDLM lock crashed, the server waited for it and increased AT. The crash and eviction were not a problem for the system as a whole, but they greatly inflated AT and the shared lock on the root directory. We detected 3-6 problems during it: bl timeouts, eviction logic, etc. One way to prevent such a case is to detect the crashed client early and evict it via the ping evictor; we can reduce ping_interval and the evictor multiplier for this. By default the eviction time is 6 ping intervals, so the server does not evict a client until 5 pings are lost. For a reliable network that is overhead, and it could be reduced (eviction multiplier). Similar reasoning applies to ping_interval: if obd_timeout is 300s (as used in real deployments), the ping interval is 75s. To detect a client failure faster, the ping interval should be reduced. I've made a comment about obd_timeout at LU. From my point of view, obd_timeout is specifically a recovery timeout, and recovery and the pinger have no relation beyond a historical one. |
| Comment by Andreas Dilger [ 13/Sep/23 ] |
|
I agree that it is useful in such cases to be able to tune ping_interval and/or evict_multiplier, but it would make sense to enforce ping_interval < obd_timeout / 2 and evict_multiplier >= 2 so that they cannot be set to values where the client is evicted too easily. Even so, it still makes sense to use obd_timeout(exp) directly by default, unless ping_interval is explicitly set: |

#define PING_INTERVAL(obd) (ping_interval ?: (obd_timeout(obd) * 3 / (evict_multiplier * 2))) |

Also, for future reference, it is possible to evict specific clients from the MDS with "lctl set_param mdt.*.evict_client=UUID" or "lctl set_param mdt.*.evict_client=nid:NID". It will evict the client UUID/NID from all targets if "mdt.*.evict_tgt_nids" is set. |