Details
- Improvement
- Resolution: Won't Fix
- Minor
- None
- Lustre 2.4.0
- 5796
Description
As noted in LU-1282, the per-cpu component of a ptlrpc stats array uses 8K of memory. For large-core-count, low-memory platforms this is indeed excessive. For example, a Xeon Phi that connects to 456 OSTs will eventually use about 10% of its memory on ptlrpc stats.
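(As a rough sanity check of the 10% figure, a minimal sketch assuming a Knights Corner Xeon Phi with 244 hardware threads and 8 GiB of on-card memory; both hardware numbers are assumptions, not taken from this ticket.)

#include <stdio.h>

int main(void)
{
        /* Assumed hardware: 61 cores x 4 threads and 8 GiB of RAM. */
        unsigned long long cpus = 244;
        unsigned long long targets = 456;          /* OST connections */
        unsigned long long percpu_bytes = 8192;    /* 8K stats array per target per cpu */
        unsigned long long mem_bytes = 8ULL << 30; /* 8 GiB */
        unsigned long long total = cpus * targets * percpu_bytes;

        /* Prints roughly 869 MiB, i.e. about 10.6% of 8 GiB. */
        printf("ptlrpc stats: %llu MiB (%.1f%% of memory)\n",
               total >> 20, 100.0 * total / mem_bytes);
        return 0;
}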
The high memory use is due to the number of RPC opcodes that must be supported, together with the implementation of the counter array. When I counted, there were 77 RPC opcodes supported, plus 19 extra opcodes for MDS_REINT_xxx and LDLM_ENQUEUE_.... This gets us to 8K after slab rounding: (77 + 19) * sizeof(lprocfs_counter) = 7680. (We are only 6 opcodes away from using 16K per cpu.) However, I cannot find an instance of any client (mdc, osc, osp) that uses more than 14 opcodes.
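(A small sketch of the sizing arithmetic above; the 80-byte counter size is inferred from 7680 / 96, and the power-of-two rounding stands in for the slab allocator's behaviour at this allocation size.)

#include <stdio.h>

/* Round up to the next power-of-two slab size, as kmalloc effectively
 * does for allocations in this range. */
static unsigned long slab_round(unsigned long bytes)
{
        unsigned long sz = 32;

        while (sz < bytes)
                sz <<= 1;
        return sz;
}

int main(void)
{
        unsigned long nr_counters = 77 + 19; /* opcodes + MDS_REINT/LDLM_ENQUEUE extras */
        unsigned long counter_size = 80;     /* assumed sizeof(struct lprocfs_counter) */

        /* Prints: raw 7680 bytes -> 8192 bytes per cpu */
        printf("raw %lu bytes -> %lu bytes per cpu\n",
               nr_counters * counter_size,
               slab_round(nr_counters * counter_size));
        return 0;
}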
# find /proc/fs/lustre/ -name stats -exec grep -q ^req_waittime {} \; -exec wc -l {} \; | column -t
7   /proc/fs/lustre/ldlm/services/ldlm_canceld/stats
7   /proc/fs/lustre/ldlm/services/ldlm_cbd/stats
10  /proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/stats
9   /proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-OST0001/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-OST0000/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0000/stats
9   /proc/fs/lustre/ost/OSS/ost_io/stats
7   /proc/fs/lustre/ost/OSS/ost_create/stats
14  /proc/fs/lustre/ost/OSS/ost/stats
7   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_fld/stats
7   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_mdss/stats
8   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_readpage/stats
14  /proc/fs/lustre/mdt/lustre-MDT0000/mdt/stats
4   /proc/fs/lustre/mdt/lustre-MDT0000/stats
14  /proc/fs/lustre/mgs/MGS/mgs/stats
13  /proc/fs/lustre/osc/lustre-OST0001-osc-ffff8801e48f0000/stats
11  /proc/fs/lustre/osc/lustre-OST0000-osc-ffff8801e48f0000/stats
15  /proc/fs/lustre/mdc/lustre-MDT0000-mdc-ffff8801e48f0000/stats
The largest is an mdc, which used 14 (including MDS_REINT and LDLM_ENQUEUE):
# cat /proc/fs/lustre/mdc/lustre-MDT0000-mdc-ffff8801e48f0000/stats
snapshot_time             1355161746.36229 secs.usecs
req_waittime              3298 samples [usec] 57 221559 5204570 294225983354
req_active                3298 samples [reqs] 1 10 3669 5069
mds_getattr               56 samples [usec] 89 5071 21884 31118008
mds_getattr_lock          27 samples [usec] 595 916 17880 12026662
mds_close                 603 samples [usec] 180 3466 218004 95843514
mds_readpage              52 samples [usec] 329 1635 29738 19445852
mds_connect               2 samples [usec] 179 880 1059 806441
mds_getstatus             1 samples [usec] 57 57 57 3249
mds_statfs                20 samples [usec] 166 438 6112 1956252
mds_getxattr              175 samples [usec] 197 801 43570 12160700
ldlm_cancel               74 samples [usec] 222 6626 39097 64121733
obd_ping                  11 samples [usec] 63 587 3797 1560175
seq_query                 1 samples [usec] 185628 185628 185628 34457754384
fld_query                 2 samples [usec] 314 394 708 253832
Based on this, I propose using a fixed-length, open-addressed hash to tally the opcodes used by each client. If we set the number of slots to 16, then we can provide the same information using 1K per cpu.
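(A minimal userspace sketch of what such a table could look like; the slot layout, linear probing, and bare hit count, rather than the full min/max/sum/sumsquare counter state, are illustrative assumptions, not a finished design.)

#include <stdint.h>
#include <stdio.h>

#define OPC_HASH_SLOTS 16             /* fixed table size per cpu */

/* One slot; a real implementation would keep the full lprocfs counter
 * state here rather than a bare count. */
struct opc_slot {
        uint32_t os_opc;
        uint64_t os_count;            /* 0 means the slot is unused */
};

struct opc_table {
        struct opc_slot ot_slots[OPC_HASH_SLOTS];
};

/* Tally one RPC by opcode, resolving collisions with linear probing.
 * Returns -1 once a client has used more than OPC_HASH_SLOTS distinct
 * opcodes and the table is full. */
static int opc_tally(struct opc_table *t, uint32_t opc)
{
        unsigned int i;

        for (i = 0; i < OPC_HASH_SLOTS; i++) {
                struct opc_slot *s = &t->ot_slots[(opc + i) % OPC_HASH_SLOTS];

                if (s->os_count == 0 || s->os_opc == opc) {
                        s->os_opc = opc;
                        s->os_count++;
                        return 0;
                }
        }
        return -1;
}

int main(void)
{
        struct opc_table t = { 0 };
        unsigned int i;

        /* Illustrative opcode values only. */
        opc_tally(&t, 33);
        opc_tally(&t, 33);
        opc_tally(&t, 41);

        for (i = 0; i < OPC_HASH_SLOTS; i++)
                if (t.ot_slots[i].os_count)
                        printf("opc %u: %llu samples\n", t.ot_slots[i].os_opc,
                               (unsigned long long)t.ot_slots[i].os_count);
        return 0;
}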