[LU-17216] enable_health_write, health_check improvements Created: 21/Oct/23 Updated: 23/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Tim Day | Assignee: | Tim Day |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
enable_health_write should be a runtime tunable rather than a compile-time option. This makes it easier to test and lets admins try it out without recompiling their Lustre servers. It will remain disabled by default.
The health write should also be enabled for MDT/MGT, especially since DNE means there are many more metadata-related disks.
More verbose output from health checks would be useful. Lustre should report health per OBD device and state what is wrong. To implement this, the health check functions could return an enum indicating the root cause of the failure (disk IO, ptlrpc, etc.). Each individual check then only needs to return the appropriate enum value. |
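A minimal sketch of the enum-based reporting idea, written as standalone userspace C rather than kernel code. Every name here (enum health_status, struct obd_health, health_status_str, health_report) is hypothetical and chosen for illustration; none of it is taken from the Lustre tree or from the patches on this ticket.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical root-cause codes; illustrative only, not Lustre's. */
enum health_status {
	HEALTH_OK = 0,
	HEALTH_DISK_IO,		/* health write/read to the backing disk failed */
	HEALTH_PTLRPC,		/* RPC layer reported a problem */
	HEALTH_UNKNOWN,
};

/* One entry per OBD device, so health is reported per device. */
struct obd_health {
	const char *device;
	enum health_status status;
};

/* Map a root-cause code to a human-readable reason. */
static const char *health_status_str(enum health_status s)
{
	switch (s) {
	case HEALTH_OK:
		return "healthy";
	case HEALTH_DISK_IO:
		return "disk IO failure";
	case HEALTH_PTLRPC:
		return "ptlrpc failure";
	default:
		return "unknown failure";
	}
}

/*
 * Print one "device: reason" line per device and return the
 * number of devices that are not healthy.
 */
static int health_report(const struct obd_health *devs, size_t n, FILE *out)
{
	int unhealthy = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		fprintf(out, "%s: %s\n", devs[i].device,
			health_status_str(devs[i].status));
		if (devs[i].status != HEALTH_OK)
			unhealthy++;
	}
	return unhealthy;
}
```

With this shape, each individual check only has to return one of the enum values, and the reporting code in one place turns it into a per-device message. A caller would fill an array of struct obd_health entries from the individual checks and pass it to health_report.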
| Comments |
| Comment by Gerrit Updater [ 21/Oct/23 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52782 |
| Comment by Gerrit Updater [ 22/Oct/23 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52785 |
| Comment by Gerrit Updater [ 29/Nov/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52782/ |
| Comment by Andreas Dilger [ 23/Jan/24 ] |
|
The test added in 52782 is failing interop testing between master (2.15.60.20) and 2.15.4. Please review the failure and push a patch. Either skip the test, because it is not expected to work with old servers, or fix it as needed: |
| Comment by Gerrit Updater [ 23/Jan/24 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53770 |