Details
-
Improvement
-
Resolution: Fixed
-
Minor
Description
We've tried to solve this in the past by integrating NFS unstable pages tracking into Lustre, but this is fraught: it treats our uncommitted pages as dirty, which means we get rate limited on them. The kernel's idea of an appropriate number of outstanding pages is based on local file systems, and isn't enough for us, so this causes performance issues. The SOFT_SYNC feature we created to work with unstable pages also just asks the OST nicely to do a commit, and includes no way for the client to be notified quickly.
This means it can't be responsive enough to avoid tasks getting OOM-killed.
The Linux kernel already has a mature solution for OOM with cgroups.
The most relevant code is in balance_dirty_pages():
If the dirtied and uncommitted pages are over "background_thresh" for the global memory limit and the memory cgroup limit, the writeback threads are woken to perform some writeout.
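The threshold check described above can be sketched as a small userspace model. This is not kernel code: `struct wb_domain_model`, `over_bg_thresh()` and `should_wake_flusher()` are hypothetical names, and the sketch assumes that exceeding the background threshold in either the global domain or the memory cgroup domain is enough to wake the flusher.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one writeback domain (global or memcg). */
struct wb_domain_model {
	unsigned long nr_dirty;          /* dirtied pages not yet written out */
	unsigned long nr_unstable;       /* written out but not yet committed */
	unsigned long background_thresh; /* background writeback threshold */
};

/* True when dirtied plus uncommitted pages exceed the background threshold. */
static bool over_bg_thresh(const struct wb_domain_model *d)
{
	return d->nr_dirty + d->nr_unstable > d->background_thresh;
}

/*
 * Assumption: the flusher is woken when either the global domain or the
 * memcg domain (which may be NULL when no cgroup limit applies) is over
 * its background threshold.
 */
static bool should_wake_flusher(const struct wb_domain_model *global,
				const struct wb_domain_model *memcg)
{
	if (over_bg_thresh(global))
		return true;
	return memcg != NULL && over_bg_thresh(memcg);
}
```

The point of the model is that uncommitted pages count toward the threshold, so a small memory cgroup can trip the check long before the global limit does.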
In this ticket, we give a solution similar to NFS:
On completion of writeback for the dirtied pages (@brw_interpret), call __mark_inode_dirty(), which attaches the @bdi_writeback (each memory cgroup can have its own bdi_writeback) to the inode.
Once the writeback thread is woken up and @for_background is set, it checks @wb_over_bg_thresh. For background writeout, it stops when we are below the background dirty threshold.
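The stop condition for background writeout can be illustrated with a simplified loop. This is a userspace sketch, not the kernel's writeback loop: `struct wb_model`, `background_writeout()` and the fixed `chunk` size are hypothetical, and one iteration stands in for one batch of pages written and committed.

```c
#include <stdbool.h>

/* Hypothetical model of a bdi_writeback: reclaimable pages vs. threshold. */
struct wb_model {
	unsigned long nr_reclaimable; /* dirtied + uncommitted pages */
	unsigned long bg_thresh;      /* background dirty threshold */
};

/*
 * Write out pages in batches of 'chunk'. For background work, stop as soon
 * as the page count is no longer over the background threshold.
 * Returns the number of pages written out.
 */
static unsigned long background_writeout(struct wb_model *wb,
					 unsigned long chunk,
					 bool for_background)
{
	unsigned long written = 0;

	if (chunk == 0)
		return 0; /* avoid looping forever on a zero batch size */

	while (wb->nr_reclaimable > 0) {
		unsigned long n;

		/* Background writeout stops below the threshold. */
		if (for_background && wb->nr_reclaimable <= wb->bg_thresh)
			break;

		n = wb->nr_reclaimable < chunk ? wb->nr_reclaimable : chunk;
		wb->nr_reclaimable -= n;
		written += n;
	}
	return written;
}
```

In this model, if uncommitted pages were never released, `nr_reclaimable` would never drop and the background pass could not make progress, which is why the commit step matters.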
So what we should do in Lustre client is:
When the background writeback thread calls ll_writepages() to write out data, if the inode has pending dirtied pages, flush the dirtied pages to the OST and sync them to commit the unstable pages. If all pages have had their dirty flags cleared but are still in the unstable (uncommitted) state, we should send a dedicated sync RPC to the OST so that the uncommitted pages are finally released.
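The two cases above amount to a small decision function. The following is a minimal userspace sketch of that decision only; `enum wb_action`, `struct inode_model` and `background_writepages_action()` are hypothetical names and do not correspond to actual Lustre client code.

```c
/* Hypothetical actions the client could take during background writeback. */
enum wb_action {
	WB_NOTHING,        /* no dirtied and no uncommitted pages */
	WB_FLUSH_AND_SYNC, /* flush dirtied pages to the OST and sync them */
	WB_SYNC_RPC_ONLY   /* send a dedicated sync RPC to the OST */
};

/* Hypothetical per-inode page counters. */
struct inode_model {
	unsigned long nr_dirty;    /* pages still flagged dirty */
	unsigned long nr_unstable; /* written out but uncommitted pages */
};

/* Pick the action for a background ll_writepages() call on this inode. */
static enum wb_action
background_writepages_action(const struct inode_model *ino)
{
	if (ino->nr_dirty > 0)
		return WB_FLUSH_AND_SYNC;
	if (ino->nr_unstable > 0)
		return WB_SYNC_RPC_ONLY;
	return WB_NOTHING;
}
```

Dirtied pages take priority because flushing and syncing them also commits pages that are already unstable; the dedicated sync RPC is needed only when nothing is left to flush.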
Because unstable page accounting in the kernel may have a bad impact on performance, we need to optimize the unstable page accounting code in the next phase of work.
Issue Links
- is related to
  - LU-17151 sanity: test_411b Error: '(3) failed to write successfully' (Resolved)
  - LU-17183 sanity.sh test_411b: cgroups OOM on ARM (Resolved)
  - LU-16696 Lustre memcg oom workaround for unpatched kernels (Resolved)
- is related to
  - LU-16697 Lustre should set appropriate BDI_CAP_* and s_iflags for writeback and cgroup wb (Resolved)