[LU-2454] Reduce memory usage of ptlrpc stats Created: 10/Dec/12  Updated: 31/Jul/14  Resolved: 31/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: client

Rank (Obsolete): 5796

 Description   

As noted in LU-1282, the per-cpu component of a ptlrpc stats array uses 8K of memory. For large-core-count, low-memory platforms this is excessive. For example, a Xeon Phi that connects to 456 OSTs will eventually spend 10% of its memory on ptlrpc stats.

The high memory use is due to the number of RPC opcodes that must be supported, together with the implementation of the counter array. When I counted, there were 77 RPC opcodes supported plus 19 extra opcodes (MDS_REINT_xxx and LDLM_ENQUEUE_...). This gets us to 8K after slab rounding: (77 + 19) * sizeof(lprocfs_counter) = 7680. (We are only 6 opcodes away from using 16K per cpu.) However, I cannot find an instance of any client (mdc, osc, osp) that uses more than 14 opcodes; see the per-target survey below.
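As a rough cross-check of that arithmetic, here is a small sketch. The 80-byte counter size is an assumption inferred from 7680 / 96, not a value taken from the Lustre headers:

/* Back-of-the-envelope check of the per-cpu allocation size quoted above.
 * The 80-byte counter size is inferred from 7680 / 96 and is an assumption,
 * not a value read from the Lustre headers. */
#include <stdio.h>

int main(void)
{
        const unsigned int n_counters = 77 + 19; /* opcodes + MDS_REINT/LDLM_ENQUEUE extras */
        const unsigned int counter_size = 80;    /* assumed per-counter size in bytes */
        unsigned int bytes = n_counters * counter_size;
        unsigned int slab = 1;

        while (slab < bytes)            /* kmalloc-style rounding to the next power of two */
                slab <<= 1;

        printf("%u counters * %u bytes = %u -> %u-byte slab per cpu\n",
               n_counters, counter_size, bytes, slab);
        return 0;
}

This prints 96 * 80 = 7680, rounded up to an 8192-byte (8K) slab per cpu.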

# find /proc/fs/lustre/ -name stats -exec grep -q ^req_waittime {} \; -exec wc -l {} \; | column -t
7   /proc/fs/lustre/ldlm/services/ldlm_canceld/stats
7   /proc/fs/lustre/ldlm/services/ldlm_cbd/stats
10  /proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/stats
9   /proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-OST0001/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-OST0000/stats
5   /proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0000/stats
9   /proc/fs/lustre/ost/OSS/ost_io/stats
7   /proc/fs/lustre/ost/OSS/ost_create/stats
14  /proc/fs/lustre/ost/OSS/ost/stats
7   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_fld/stats
7   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_mdss/stats
8   /proc/fs/lustre/mdt/lustre-MDT0000/mdt_readpage/stats
14  /proc/fs/lustre/mdt/lustre-MDT0000/mdt/stats
4   /proc/fs/lustre/mdt/lustre-MDT0000/stats
14  /proc/fs/lustre/mgs/MGS/mgs/stats
13  /proc/fs/lustre/osc/lustre-OST0001-osc-ffff8801e48f0000/stats
11  /proc/fs/lustre/osc/lustre-OST0000-osc-ffff8801e48f0000/stats
15  /proc/fs/lustre/mdc/lustre-MDT0000-mdc-ffff8801e48f0000/stats

The largest is an mdc, which used 14 (including MDS_REINT and LDLM_ENQUEUE):

# cat /proc/fs/lustre/mdc/lustre-MDT0000-mdc-ffff8801e48f0000/stats
snapshot_time             1355161746.36229 secs.usecs
req_waittime              3298 samples [usec] 57 221559 5204570 294225983354
req_active                3298 samples [reqs] 1 10 3669 5069
mds_getattr               56 samples [usec] 89 5071 21884 31118008
mds_getattr_lock          27 samples [usec] 595 916 17880 12026662
mds_close                 603 samples [usec] 180 3466 218004 95843514
mds_readpage              52 samples [usec] 329 1635 29738 19445852
mds_connect               2 samples [usec] 179 880 1059 806441
mds_getstatus             1 samples [usec] 57 57 57 3249
mds_statfs                20 samples [usec] 166 438 6112 1956252
mds_getxattr              175 samples [usec] 197 801 43570 12160700
ldlm_cancel               74 samples [usec] 222 6626 39097 64121733
obd_ping                  11 samples [usec] 63 587 3797 1560175
seq_query                 1 samples [usec] 185628 185628 185628 34457754384
fld_query                 2 samples [usec] 314 394 708 253832

Based on this, I propose using a fixed-length, open-addressed hash table to tally the opcodes used by each client. If we set the number of slots to 16, then we can provide the same information using 1K per cpu.
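For illustration, here is a minimal userspace sketch of the sort of 16-slot, linearly probed table this proposal has in mind. The pcs_search() name is taken from the comment further down; everything else (struct pcs_counter, pcs_tally(), the "opcode 0 means empty" convention) is hypothetical, and the per-cpu dimension of the real counters is omitted.

#include <stdint.h>
#include <stdio.h>

#define PCS_SLOTS 16                    /* fixed table size: 16 slots */

struct pcs_counter {
        uint32_t pc_opcode;             /* 0 marks an empty slot (assumption) */
        uint64_t pc_count;
        uint64_t pc_min;
        uint64_t pc_max;
        uint64_t pc_sum;
};

static struct pcs_counter pcs_table[PCS_SLOTS];

/* Linear probing: start at opcode % PCS_SLOTS and walk until the opcode or
 * an empty slot is found.  With at most ~14 distinct opcodes per client the
 * table never fills in practice. */
static struct pcs_counter *pcs_search(uint32_t opcode)
{
        unsigned int i;

        for (i = 0; i < PCS_SLOTS; i++) {
                struct pcs_counter *c = &pcs_table[(opcode + i) % PCS_SLOTS];

                if (c->pc_opcode == opcode || c->pc_opcode == 0)
                        return c;
        }
        return NULL;                    /* table full: drop the sample */
}

static void pcs_tally(uint32_t opcode, uint64_t usec)
{
        struct pcs_counter *c = pcs_search(opcode);

        if (c == NULL)
                return;
        if (c->pc_opcode == 0) {        /* claim the empty slot */
                c->pc_opcode = opcode;
                c->pc_min = usec;
        }
        c->pc_count++;
        c->pc_sum += usec;
        if (usec < c->pc_min)
                c->pc_min = usec;
        if (usec > c->pc_max)
                c->pc_max = usec;
}

int main(void)
{
        pcs_tally(400, 120);            /* opcode values are arbitrary here */
        pcs_tally(400, 80);
        pcs_tally(8, 250);
        printf("opcode 400: %llu samples, min %llu, max %llu\n",
               (unsigned long long)pcs_search(400)->pc_count,
               (unsigned long long)pcs_search(400)->pc_min,
               (unsigned long long)pcs_search(400)->pc_max);
        return 0;
}

Because the table is keyed directly by opcode, the same probing also serves the readout path: walking the 16 slots and printing any non-empty entry reproduces the per-opcode lines shown in the stats files above.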



 Comments   
Comment by John Hammond [ 10/Dec/12 ]

See http://review.whamcloud.com/4792 for a draft patch.

Comment by John Hammond [ 10/Dec/12 ]

In the description I should have said "not including MDS_REINT and LDLM_ENQUEUE".

Note that this patch leaves the old ptlrpc stats in place for the sake of comparison. The new stats (intended to replace the old) are available in /proc/ under the name ptlrpc_cli_stats.

# find /proc/fs/lustre/ -name ptlrpc_cli_stats
/proc/fs/lustre/osp/lustre-OST0001-osc-MDT0000/ptlrpc_cli_stats
/proc/fs/lustre/osp/lustre-OST0000-osc-MDT0000/ptlrpc_cli_stats
/proc/fs/lustre/osp/lustre-MDT0000-osp-OST0001/ptlrpc_cli_stats
/proc/fs/lustre/osp/lustre-MDT0000-osp-OST0000/ptlrpc_cli_stats
/proc/fs/lustre/osp/lustre-MDT0000-osp-MDT0000/ptlrpc_cli_stats
/proc/fs/lustre/mdt/lustre-MDT0000/ptlrpc_cli_stats
/proc/fs/lustre/osc/lustre-OST0001-osc-ffff8801e48f0000/ptlrpc_cli_stats
/proc/fs/lustre/osc/lustre-OST0000-osc-ffff8801e48f0000/ptlrpc_cli_stats
/proc/fs/lustre/mdc/lustre-MDT0000-mdc-ffff8801e48f0000/ptlrpc_cli_stats
# cat /proc/fs/lustre/osc/lustre-OST0001-osc-ffff8801e48f0000/stats
snapshot_time             1355161403.512704 secs.usecs
req_waittime              80 samples [usec] 60 11830 72274 315613446
req_active                80 samples [reqs] 1 3 85 97
write_bytes               7 samples [bytes] 3 786432 1287584 742367009314
ost_setattr               23 samples [usec] 349 3183 13524 15072470
ost_write                 7 samples [usec] 1207 11830 35510 285830346
ost_connect               1 samples [usec] 298 298 298 88804
ost_punch                 4 samples [usec] 442 1463 3087 3035753
ost_statfs                4 samples [usec] 85 295 767 181775
ldlm_cancel               6 samples [usec] 318 1127 3907 3172199
obd_ping                  4 samples [usec] 60 204 495 72161
# cat /proc/fs/lustre/osc/lustre-OST0001-osc-ffff8801e48f0000/ptlrpc_cli_stats
req_waittime              77 samples [usec] 60 11830 71065 315091891
req_active                77 samples [reqs] 1 3 82 94
write_bytes               7 samples [bytes] 3 786432 1287584 742367009314
obd_ping                  4 samples [usec] 60 204 495 72161
ost_setattr               22 samples [usec] 349 3183 12989 14786245
ost_write                 7 samples [usec] 1207 11830 35510 285830346
ldlm_enqueue              30 samples [usec] 273 1256 14285 7999137
ldlm_cancel               6 samples [usec] 318 1127 3907 3172199
ost_connect               1 samples [usec] 298 298 298 88804
ost_punch                 4 samples [usec] 442 1463 3087 3035753
ost_statfs                3 samples [usec] 85 295 494 107246

(The snapshot_time line is missing but is easily added if people are into that.)

The use of atomic_t entry/exit counters to protect accesses to the count, min, max, sum, and sum_sq fields has been copied over from the original implementation. This is not to say that I endorse it: there are several places where I believe an rmb() or wmb() should be inserted for correctness. I wonder whether seqlock_t was evaluated for this purpose, and if so, why it wasn't used.
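For reference, here is a minimal userspace sketch of the sequence-counter pattern in question. The kernel's seqlock_t packages the same idea behind write_seqlock()/write_sequnlock() and read_seqbegin()/read_seqretry(); this C11 version deliberately uses sequentially consistent atomics everywhere, which is stronger (and slower) than a tuned implementation would need.

#include <stdatomic.h>
#include <stdint.h>

struct seq_stat {
        atomic_uint          ss_seq;    /* odd while an update is in flight */
        atomic_uint_fast64_t ss_count;
        atomic_uint_fast64_t ss_sum;
};

/* Writer: per-cpu in the real code, so writers never race each other. */
void seq_stat_add(struct seq_stat *s, uint64_t val)
{
        atomic_fetch_add(&s->ss_seq, 1);        /* sequence becomes odd */
        atomic_fetch_add(&s->ss_count, 1);
        atomic_fetch_add(&s->ss_sum, val);
        atomic_fetch_add(&s->ss_seq, 1);        /* sequence back to even */
}

/* Reader: retry if a writer was active at any point during the read. */
void seq_stat_read(struct seq_stat *s, uint64_t *count, uint64_t *sum)
{
        unsigned int seq;

        do {
                seq = atomic_load(&s->ss_seq);
                *count = atomic_load(&s->ss_count);
                *sum = atomic_load(&s->ss_sum);
        } while ((seq & 1) || atomic_load(&s->ss_seq) != seq);
}

Since each cpu only ever writes its own counters, the lock-free seqcount_t variant would presumably be enough.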

I ran sanity with the attached patch 7c3dd04 and found that the probing performed well: with 134975 calls to pcs_search() there were only 2836 cases (about 2%) in which the first probe missed.
