Details
-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
If the Lustre shrinkers can't free any pages but don't set nr_scanned, they may be called forever by the kernel - see do_shrink_slab:
while (total_scan >= batch_size || total_scan >= freeable) { unsigned long ret; unsigned long nr_to_scan = min(batch_size, total_scan); shrinkctl->nr_to_scan = nr_to_scan; shrinkctl->nr_scanned = nr_to_scan; ret = shrinker->scan_objects(shrinker, shrinkctl); if (ret == SHRINK_STOP) break; freed += ret; count_vm_events(SLABS_SCANNED, shrinkctl->nr_scanned); total_scan -= shrinkctl->nr_scanned; scanned += shrinkctl->nr_scanned; cond_resched(); }
This can cause near hangs under heavy memory pressure, where we call the shrinker over and over again but don't make progress. We tend to eventually free enough pages to continue, but it doesn't work well.