[LU-13594] register OOM callback in Lustre Created: 22/May/20 Updated: 22/Jun/22 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Etienne Aujames |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | easy |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
It would be useful to register an OOM callback in Lustre using register_oom_notifier() (and deregister it at shutdown with unregister_oom_notifier()), firstly in libcfs and obdclass to print the current libcfs_kmemory and memused_show()/memused_max_show(), as well as potentially trying to shrink caches (e.g. the number of LNet message buffers, debug logs, etc.) before a userspace process is killed. |
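The register/notify/unregister flow described above can be modelled roughly as follows. This is a userspace sketch only, not the Lustre patch and not kernel code: the model_* functions are hypothetical stand-ins for the kernel's register_oom_notifier()/unregister_oom_notifier() and its OOM notifier chain, and the demo_* counters are made-up placeholders for the libcfs/obdclass memory statistics and a shrinkable cache. It assumes the callback reports reclaimed pages through the unsigned long passed as the data argument.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define NOTIFY_OK 0x0001	/* mirrors the kernel's "callback succeeded" value */

/* Simplified copy of the kernel's notifier_block layout. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

/* Hypothetical stand-in for the kernel's OOM notifier chain. */
static struct notifier_block *oom_chain;

/* Models register_oom_notifier(): add the block to the chain. */
static int model_register_oom_notifier(struct notifier_block *nb)
{
	nb->next = oom_chain;
	oom_chain = nb;
	return 0;
}

/* Models unregister_oom_notifier(): unlink the block from the chain. */
static int model_unregister_oom_notifier(struct notifier_block *nb)
{
	struct notifier_block **p;

	for (p = &oom_chain; *p != NULL; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -ENOENT;
}

/* Models an OOM event: every callback may add to the freed-page count. */
static unsigned long model_oom_event(void)
{
	unsigned long freed = 0;
	struct notifier_block *nb;

	for (nb = oom_chain; nb != NULL; nb = nb->next)
		nb->notifier_call(nb, 0, &freed);
	return freed;
}

/* Placeholder statistics and cache size (sample values, not real data). */
static unsigned long long demo_mem_max = 200336259ULL;
static unsigned long long demo_mem_current = 200336259ULL;
static unsigned long demo_cache_pages = 128;

/* Callback: print memory usage, then drop the demo cache and report it. */
static int demo_oom_notify(struct notifier_block *nb,
			   unsigned long action, void *data)
{
	unsigned long *freed = data;

	(void)nb;
	(void)action;
	printf("obd_memory max: %llu, obd_memory current: %llu\n",
	       demo_mem_max, demo_mem_current);
	*freed += demo_cache_pages;	/* tell the caller what was reclaimed */
	demo_cache_pages = 0;
	return NOTIFY_OK;
}

static struct notifier_block demo_oom_nb = {
	.notifier_call = demo_oom_notify,
};

/* Register at "module init", take one OOM event, unregister at "shutdown". */
static unsigned long demo_run(void)
{
	unsigned long freed;

	model_register_oom_notifier(&demo_oom_nb);
	freed = model_oom_event();
	model_unregister_oom_notifier(&demo_oom_nb);
	return freed;
}
```

Calling demo_run() once prints the usage line and returns the number of demo pages dropped; a second call returns 0 because the placeholder cache is already empty. In a real module the registration would belong in the module init path and the deregistration in the exit path, so the callback never outlives the code that implements it.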
| Comments |
| Comment by Gerrit Updater [ 21/Mar/21 ] |
|
Arshad Hussain (arshad.hussain@aeoncomputing.com) uploaded a new patch: https://review.whamcloud.com/42121 |
| Comment by Gerrit Updater [ 26/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/42121/ |
| Comment by Peter Jones [ 26/Jan/22 ] |
|
Landed for 2.15 |
| Comment by Andreas Dilger [ 22/Jun/22 ] |
|
I've seen this callback a few times recently, running sanityn test_56 on ZFS but the current code just prints a brief message without much context and does nothing else:

[16506.409968] obd_memory max: 200336259, obd_memory current: 200336259
[16506.975974] obd_memory max: 200416739, obd_memory current: 200416739
[16507.013294] obd_memory max: 200416739, obd_memory current: 200416739
[16507.020553] obd_memory max: 200416739, obd_memory current: 200416739
[16507.035227] obd_memory max: 200416739, obd_memory current: 200416739
[16507.218562] obd_memory max: 200471595, obd_memory current: 200471595
[16507.224060] obd_memory max: 200471595, obd_memory current: 200471595
[16507.226494] obd_memory max: 200471595, obd_memory current: 200471595
[16507.229583] obd_memory max: 200471595, obd_memory current: 200471595
[16507.231476] obd_memory max: 200471595, obd_memory current: 200471595

It would be better if this message was prefixed with "Lustre: OOM handler: " to give some context to what it means. Secondly, having the handler itself at least provides some minimal information (Lustre memory usage is 200MB in this case, on a 3GB VM, not including LNet memory usage which should also be printed). It would be better if this callback actually tried to do something useful under memory pressure. Possible candidates would be:
|