[LU-14979] LNet: add tunable parameter to control max recovery interval duration Created: 01/Sep/21 Updated: 11/Jun/22 Resolved: 11/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Serguei Smirnov | Assignee: | Cyril Bordage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet, lnet-health | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The recovery ping mechanism as currently implemented increases the timeout before the next scheduled recovery ping attempt exponentially (base 2) and caps it at 900 seconds. This hard-coded cap appears to be too high in many cases. Introduce a tunable parameter that limits the recovery ping timeout, and choose a reasonable default. |
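To make the behaviour described above concrete, here is a minimal sketch of a base-2 backoff with a configurable cap. It is only an illustration and is not taken from the LNet source: the function name `next_recovery_interval`, the `ping_count` argument, and the `max_interval` parameter are all hypothetical, and the starting interval of 1 second is an assumption.

```python
def next_recovery_interval(ping_count: int, max_interval: int = 900) -> int:
    """Return the delay in seconds before the next recovery ping.

    Hypothetical helper: the delay doubles with each failed attempt
    (assumed to start at 1 second) and is capped at max_interval.
    The default of 900 mirrors the hard-coded limit in the description.
    """
    if ping_count < 0:
        raise ValueError("ping_count must be non-negative")
    if max_interval < 1:
        raise ValueError("max_interval must be at least 1 second")

    interval = 1
    for _ in range(ping_count):
        interval *= 2
        # Stop as soon as the cap is reached so the value never grows unbounded.
        if interval >= max_interval:
            return max_interval
    return interval


if __name__ == "__main__":
    # Compare the hard-coded cap with a lower, tunable one.
    for attempts in (0, 3, 6, 10):
        print(attempts,
              next_recovery_interval(attempts),
              next_recovery_interval(attempts, max_interval=60))
```

With a lower cap, an interface that has been down for a long time is retried at most `max_interval` seconds after the previous attempt, rather than up to 900 seconds later.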
| Comments |
| Comment by Chris Horn [ 01/Sep/21 ] |
|
Can you add some detail about the cases where the value is too high? My hope was that resetting the interval when we received a message from an NI would be sufficient. Is that not working for some reason? |
| Comment by Serguei Smirnov [ 01/Sep/21 ] |
|
Chris, I set up a test for |
| Comment by Chris Horn [ 01/Sep/21 ] |
|
Okay, that makes sense and is working as I would expect. It might be interesting to see whether this is an issue in an environment where Node A is a Lustre server and Node B is a Lustre client (and vice versa) and there is actual I/O going on (or maybe even just idle client traffic). I think that if there were I/O going on, things might recover more quickly. The idle client case might also take a while to recover the NI, but if the client is idle, maybe it doesn't really matter: once I/O starts, we may again recover quickly. |
| Comment by Gerrit Updater [ 15/Sep/21 ] |
|
"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44927 |
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44927/ |
| Comment by Peter Jones [ 11/Jun/22 ] |
|
Landed for 2.16 |