Niu,
We applied the patch today to our production file system (Lustre 2.4.3) and are running some heavy purges right now. I collected some info about the memory usage. Prior to the patch, the memory growth seemed to be dominated by the "Inactive(file)" entry in /proc/meminfo. I dropped the cache on the MDS server (echo 3 > /proc/sys/vm/drop_caches) and collected Inactive(file) usage every minute:
Inactive(file): 1146656 kB
Inactive(file): 3426128 kB
Inactive(file): 5510484 kB
Inactive(file): 6634728 kB
Inactive(file): 7514500 kB
Inactive(file): 8075948 kB
Inactive(file): 8662528 kB
Inactive(file): 9210796 kB
Inactive(file): 9576412 kB
Inactive(file): 9974336 kB
Inactive(file): 10400772 kB
Inactive(file): 10710464 kB
Inactive(file): 10964180 kB
Inactive(file): 11280900 kB
Inactive(file): 11591336 kB
Inactive(file): 11731164 kB
Inactive(file): 11817340 kB
Inactive(file): 11920016 kB
Inactive(file): 12040800 kB
Inactive(file): 12196232 kB
Inactive(file): 12148272 kB
Inactive(file): 12269224 kB
Inactive(file): 12251768 kB
Inactive(file): 12263596 kB
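For anyone who wants to repeat this kind of per-minute sampling, here is a minimal sketch. The function names meminfo_kb and sample_meminfo are made up for illustration (they are not part of Lustre or any standard tool), and the optional file argument is only there so the functions can be tried against a saved copy of /proc/meminfo:

```shell
#!/bin/sh
# Hypothetical helpers, not part of Lustre or any standard tool.

# Print the value (in kB) of one /proc/meminfo field, e.g. "Inactive(file)".
# $1 = field name without the trailing colon
# $2 = file to read (optional, defaults to /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print COUNT samples of FIELD, INTERVAL seconds apart.
# $1 = field, $2 = count, $3 = interval in seconds, $4 = file (optional)
sample_meminfo() {
    field=$1
    count=$2
    interval=$3
    file=${4:-/proc/meminfo}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s: %s kB\n' "$field" "$(meminfo_kb "$field" "$file")"
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Something like sample_meminfo 'Inactive(file)' 24 60 would print 24 readings one minute apart, in the same format as the listing above.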
The number initially ramped up fast, but then leveled off a bit. Just to double check, I dropped the cache again:
Inactive(file): 401152 kB
Inactive(file): 2724788 kB
Inactive(file): 4409916 kB
Inactive(file): 6003208 kB
Inactive(file): 6532220 kB
Inactive(file): 7319768 kB
Inactive(file): 8154560 kB
Inactive(file): 8769084 kB
Inactive(file): 9271760 kB
Inactive(file): 9650020 kB
Inactive(file): 9918932 kB
Inactive(file): 10170456 kB
Inactive(file): 10303404 kB
Inactive(file): 10602256 kB
Inactive(file): 10972760 kB
Inactive(file): 11509680 kB
Inactive(file): 11986980 kB
Inactive(file): 12436528 kB
Inactive(file): 12770672 kB
Inactive(file): 13195352 kB
Inactive(file): 13463276 kB
Inactive(file): 13807816 kB
Inactive(file): 14029160 kB
Inactive(file): 14749976 kB
Inactive(file): 14879704 kB
Inactive(file): 14908984 kB
Inactive(file): 14988196 kB
Inactive(file): 15123316 kB
Inactive(file): 15240824 kB
Inactive(file): 15341328 kB
Inactive(file): 15464332 kB
We got the same behavior, and more importantly, we seem to be reclaiming the memory from Inactive(file). I also checked MemFree and Buffers before/after dropping caches:
(Before)
MemTotal: 66053640 kB
MemFree: 51291028 kB
Buffers: 10685976 kB
(After)
MemTotal: 66053640 kB
MemFree: 63239432 kB
Buffers: 198148 kB
Buffer usage dropped below 200 MB. Given the rate at which we are purging, that never would have happened prior to applying the patch.
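As a quick sanity check on the before/after numbers, the amount reclaimed can be worked out with plain shell arithmetic. This is only a sketch using the four values pasted above; the variable names are made up for illustration:

```shell
#!/bin/sh
# Values in kB, copied from the (Before)/(After) readings above.
buffers_before=10685976
buffers_after=198148
memfree_before=51291028
memfree_after=63239432

# How much Buffers shrank and how much MemFree grew after dropping caches.
buffers_freed_kb=$((buffers_before - buffers_after))
memfree_gained_kb=$((memfree_after - memfree_before))

# Integer division by 1024 converts kB to whole MiB (rounded down).
echo "Buffers freed:  ${buffers_freed_kb} kB ($((buffers_freed_kb / 1024)) MiB)"
echo "MemFree gained: ${memfree_gained_kb} kB ($((memfree_gained_kb / 1024)) MiB)"
```

That works out to roughly 10 GiB of Buffers released by the drop.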
I feel 90% confident this patch solved the problem. If we can continue purging at this rate over the next couple of days without increased memory usage, then I think I will be 100% confident.
Great news. Landed for 2.5.4 and 2.7