Details
- Bug
- Resolution: Won't Fix
- Major
- Lustre 2.1.0
- 3
- 4554
Description
The memory usage at mount time for Lustre 2.1 appears to be significantly worse than under 1.8. In particular, slab-8192 usage appears to have grown significantly.
On 1.8 clients, Lustre uses roughly 1GB of memory to mount four of our filesystems.
On 2.1 clients, memory usage has jumped to 5GB to mount the same four filesystems.
There appear to be 3144 OSCs at this time.
The memory usage pretty clearly increases with each filesystem mounted, and then drops again at each umount. I would suspect that we have some bad new per-OSC memory usage or something along those lines, or otherwise there would be more fallout.
But this is a pretty significant loss of memory, and it means that our applications are now OOMing on the 2.1 clients. Many of the applications are very specifically tuned in their memory usage, and the loss of 4GB of memory per node is quite a problem.
Issue Links
- is related to LU-2979 sanity 133a: proc counter for mkdir on mds1 was not incremented (Closed)
Trackbacks
Changelog 2.1: Changes from version 2.1.1 to version 2.1.2. Server support for kernels: 2.6.18-308.4.1.el5 (RHEL5), 2.6.32-220.17.1.el6 (RHEL6). Client support for unpatched kernels: 2.6.18-308.4.1.el5 (RHEL5), 2.6.32-220.17.1....
http://review.whamcloud.com/3240
LU-1282 lprocfs: disable some client percpu stats data
It's unnecessary to use percpu stats on client side.
thread safe.
The major percpu data allocation is in ptlrpc_lprocfs_register(), which asks for (EXTRA_MAX_OPCODES+LUSTRE_MAX_OPCODES) items for each CPU block, which is 5 (PTLRPC op#) + 20 (OST op#) + 21 (MDS op#) + 6 (LDLM op#) + 6 (MGS op#) + 3 (OBD op#) + 9 (LLOG op#) + 3 (SEC op#) + 1 (SEQ op#) + 1 (FLD op#) = 75.
This patch stops these stats from being allocated for each CPU on the client side, although this could possibly affect client performance.
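The arithmetic above can be checked with a small sketch that also gives a rough feel for how such an allocation scales. The per-group opcode counts are taken from the commit message; the function names, the size of one counter, the CPU count, and the number of devices that register such stats are all hypothetical parameters, not values read from the Lustre source.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode counts per group, as listed in the commit message. */
static const unsigned int opcode_counts[] = {
    5,  /* PTLRPC */
    20, /* OST */
    21, /* MDS */
    6,  /* LDLM */
    6,  /* MGS */
    3,  /* OBD */
    9,  /* LLOG */
    3,  /* SEC */
    1,  /* SEQ */
    1,  /* FLD */
};

/* Sum of all per-group opcode counts (the "75" in the text). */
unsigned int total_opcodes(void)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < sizeof(opcode_counts) / sizeof(opcode_counts[0]); i++)
        sum += opcode_counts[i];
    return sum;
}

/*
 * Rough estimate only: bytes = counters * counter_size * ncpus * ndevices.
 * counter_size, ncpus and ndevices are assumptions supplied by the caller;
 * allocator overhead and slab rounding are ignored.
 */
size_t percpu_stats_bytes(size_t counter_size, size_t ncpus, size_t ndevices)
{
    assert(counter_size > 0 && ncpus > 0);
    return (size_t)total_opcodes() * counter_size * ncpus * ndevices;
}
```

For example, `percpu_stats_bytes(80, 16, 3144)` models 80-byte counters on a 16-CPU node with one stats block per OSC; the 80 and 16 are made-up inputs chosen only to show that the total grows with both the CPU count and the device count.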
I think we could shrink lprocfs_counter by moving its common data into a separate lprocfs_counter_header structure; for each type of counter, that data is essentially the same on every CPU.
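A minimal sketch of that idea, assuming a counter that mixes per-CPU values with descriptive fields. The struct and field names below are illustrative only and are not the actual Lustre definitions; the point is just that the descriptive part is stored once per counter type instead of once per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combined layout: every per-CPU copy repeats the descriptive fields. */
struct demo_counter_combined {
    int64_t count, sum, min, max, sumsquare;
    unsigned int config;
    const char *units;
    const char *name;
};

/* Split layout: descriptive fields, stored once per counter type. */
struct demo_counter_header {
    unsigned int config;
    const char *units;
    const char *name;
};

/* Split layout: values that really differ per CPU. */
struct demo_counter_percpu {
    int64_t count, sum, min, max, sumsquare;
};

/* Bytes needed when each CPU holds a full combined counter. */
size_t combined_bytes(size_t ncounters, size_t ncpus)
{
    assert(ncounters > 0 && ncpus > 0);
    return ncounters * ncpus * sizeof(struct demo_counter_combined);
}

/* Bytes needed with one shared header plus a slim per-CPU part. */
size_t split_bytes(size_t ncounters, size_t ncpus)
{
    assert(ncounters > 0 && ncpus > 0);
    return ncounters * sizeof(struct demo_counter_header) +
           ncounters * ncpus * sizeof(struct demo_counter_percpu);
}
```

Comparing `combined_bytes(75, ncpus)` with `split_bytes(75, ncpus)` shows the saving growing with the CPU count, since the header cost is paid once rather than per CPU.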