Details
-
Task
-
Resolution: Fixed
-
Minor
Description
There have been a few gnilnd changes since the last time we sync'd up. I'll be pushing up the latest commits.
James, are you saying that gnilnd is now using more memory, or that allocations are failing when the node is under high memory pressure?
Also, I assume you are seeing this issue on compute nodes. Is that true?
I don't see that these changes would cause gnilnd to use more memory.
http://review.whamcloud.com/17663 changed the vmalloc allocation flags so an allocation will fail instead of waiting forever to allocate memory.
We have seen heartbeat failures when a node needs to allocate memory to establish a connection while Lustre is trying to write to disk in order to free memory.
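To illustrate the difference between an allocation that waits forever and one that fails, here is a rough user-space sketch. It is not the gnilnd or kernel code: every `sketch_*` name, the `SKETCH_FAIL_FAST` flag, and the simulated memory budget are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag for this sketch only; not a real kernel GFP flag. */
#define SKETCH_FAIL_FAST 0x1

/* Simulated free-memory budget, standing in for a node under pressure. */
static size_t sketch_free_bytes = 0;

/* Hypothetical reclaim hook. It returns 0 to model reclaim that cannot
 * make progress, e.g. because it is waiting on the very connection
 * that needs this memory. */
static size_t sketch_reclaim(void)
{
    return 0;
}

static void *sketch_alloc(size_t size, int flags)
{
    assert(size > 0);
    for (;;) {
        if (size <= sketch_free_bytes) {
            sketch_free_bytes -= size;
            return malloc(size);
        }
        if (flags & SKETCH_FAIL_FAST)
            return NULL;                       /* give up; caller decides */
        sketch_free_bytes += sketch_reclaim(); /* otherwise keep waiting */
    }
}

/* Connection setup that treats allocation failure as a retryable error
 * instead of blocking. */
static int sketch_conn_setup(size_t size)
{
    void *buf = sketch_alloc(size, SKETCH_FAIL_FAST);

    if (buf == NULL)
        return -ENOMEM;
    free(buf);
    sketch_free_bytes += size; /* return the simulated budget */
    return 0;
}
```

With the fail-fast flag set, the caller gets an error it can act on (for example, retrying the connection later). Without it, the loop above would spin for as long as reclaim makes no progress.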