We've been seeing strange caching behavior on our PPC IO nodes, eventually resulting in OOM events. This is particularly harmful for us because there are critical system components running in user space on these nodes, forcing us to run with "panic_on_oom" enabled.
We see a large amount of "Active File" pages as reported by /proc/vmstat and /proc/meminfo which spikes during Lustre IOR jobs. For the test I am running that is unusual since I'm not running any executables out of Lustre, it should only be "inactive" IOR data accumulating in the page cache as a result of the Lustre IO. The really strange thing is, prior to testing Orion rebased code, "Active Files" would sometimes stay low (in the 100's Meg range) and sometimes it would grow very large (in the 5 Gig range). It's hard to tell if the variation still exists in the rebased code because the OOM events are hitting more frequently, basically every time I run an IOR.
We also see a large amount of "Inactive File" pages which we believe should be limited by the patch we carry from
LU-744, but doesn't seem to be the case:
From what I can tell, Lustre is trying to limit the cache to the value we are setting 4G. When I dump the lustre page cache I roughly see 4G worth of pages, but the number of pages listed does not reflect the values seen in vmstat and meminfo.
So I have a few questions which I'd like to get an answer to:
I added a systemtap script to disable the panic_on_oom flag and dump the Lustre page cache, /proc/vmstat, and /proc/meminfo file to try and gain some understanding into the problem. I'll upload those files as attachments in case they prove useful.