Details
Bug
Resolution: Fixed
Major
Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.3, Lustre 1.8.8
None
3
4350
Description
While working with router buffers, I set the number of large buffers higher than the memory assigned to the VM running Lustre could back (number of large buffers: 1024; memory: 1G). The VM froze with all 3 virtual CPUs running at 100%.
Looking deeper into this, I found that the Linux memory allocation system will keep trying to free up memory to satisfy the request. However, even after waiting 15 minutes, the VM did not "unfreeze".
I changed the default flags we use for memory allocation to include __GFP_NORETRY to stop the memory allocator from looping. When re-running the above test, I found the system no longer froze but returned -ENOMEM to the caller as expected.
This bug is to track a discussion as to whether we should start using __GFP_NORETRY and, if so, how widely.