Details
Type: Improvement
Resolution: Unresolved
Priority: Major
Description
From discussion today: osc_read is not documented in the manual.
JH: Consider the read() syscall, the OST_READ RPC, and an mmapped region.
JH: Read moves some number of bytes from the page cache to a userspace buffer.
JH: That number is used to tally the read_bytes counter.
JH: OST_READ moves some number of bytes from memory on an OST to memory on a client.
JH: That number of bytes is used to tally the osc_read stat.
JH: With mmap, a memory access that faults triggers an OST_READ.
JH: An access that doesn't fault tallies nothing.
JH: As Andreas pointed out, read_bytes is ignorant of caching.
JH: It doesn't give you an idea of the work required to satisfy the read.
JH: But it's interesting anyway. And the ratio of these two stats is also interesting.
JH: Perhaps confusingly, there are also counters named read_bytes and ost_read in the osc stats:
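To make the ratio concrete, here is a minimal sketch that divides the llite osc_read byte total by the llite read_bytes byte total. It assumes both counters come from the same llite.*.stats dump and that the seventh whitespace-separated field of a counter line is the byte sum; `counter_total` and `osc_to_read_ratio` are hypothetical helpers, not Lustre tools. How to read the result is also an assumption: a value well below 1 would suggest many read() calls were served from the page cache, while a value above 1 would suggest data fetched over OST_READ that read() never returned (for example mmap faults, which trigger OST_READ without tallying read_bytes).

```python
from typing import Optional


def counter_total(stats_text: str, name: str) -> Optional[int]:
    """Return the sum column (7th field) of counter `name`, or None if absent."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] == name and fields[2] == "samples":
            return int(fields[6])
    return None


def osc_to_read_ratio(llite_stats: str) -> Optional[float]:
    """Bytes fetched by OSC reads (osc_read) per byte returned by read() (read_bytes)."""
    read_bytes = counter_total(llite_stats, "read_bytes")
    osc_read = counter_total(llite_stats, "osc_read")
    if not read_bytes or osc_read is None:
        return None  # no read() traffic yet, or a counter is missing
    return osc_read / read_bytes


if __name__ == "__main__":
    # Sample values taken from the llite stats output quoted in this ticket
    llite = (
        "read_bytes 1025 samples [bytes] 0 1048576 1073741824\n"
        "osc_read 1029 samples [bytes] 4096 1048576 1073762304\n"
    )
    print(osc_to_read_ratio(llite))
```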
JH: t:~# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff88016dcde800/stats
snapshot_time 1391204667.36216 secs.usecs
req_waittime 178231 samples [usec] 32 29256 90673666 227929045180
req_active 178231 samples [reqs] 1 9 237854 447998
read_bytes 6145 samples [bytes] 0 55696 10481241 152202899719
write_bytes 23414 samples [bytes] 5 91736 74789821 1631652972043
ost_read 6145 samples [usec] 69 6604 1436882 1097380908
ost_write 23414 samples [usec] 149 29256 27478013 141084890911
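The counter lines above appear to share one layout: counter name, sample count, the literal word `samples`, a bracketed unit, then min, max, sum and (for some counters) sum of squares. A minimal Python sketch, assuming that layout, that parses such a dump and derives a mean and a population standard deviation per counter; `StatsCounter` and `parse_stats` are hypothetical names, not part of lctl or any Lustre library. Applied to ost_read, the mean is the average OST_READ RPC time in usec.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsCounter:
    """One counter line from a Lustre stats file (layout assumed, see above)."""
    name: str
    samples: int
    unit: str
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    total: Optional[int] = None
    sumsq: Optional[int] = None

    @property
    def mean(self) -> Optional[float]:
        if self.total is None or self.samples == 0:
            return None
        return self.total / self.samples

    @property
    def stddev(self) -> Optional[float]:
        """Population standard deviation from the running sum and sum of squares."""
        if self.total is None or self.sumsq is None or self.samples == 0:
            return None
        mean = self.total / self.samples
        variance = self.sumsq / self.samples - mean * mean
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


def parse_stats(text: str) -> dict[str, StatsCounter]:
    """Parse a stats dump into {counter name: StatsCounter}.

    Lines that do not look like "name N samples [unit] ..." (e.g. snapshot_time)
    are skipped; carriage returns are treated as line breaks/whitespace.
    """
    counters: dict[str, StatsCounter] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "samples":
            continue
        values = [int(v) for v in fields[4:8]]  # min, max, sum, sumsq when present
        values += [None] * (4 - len(values))
        counters[fields[0]] = StatsCounter(
            fields[0], int(fields[1]), fields[3].strip("[]"), *values
        )
    return counters


if __name__ == "__main__":
    dump = (
        "snapshot_time 1391204667.36216 secs.usecs\r\n"
        "ost_read 6145 samples [usec] 69 6604 1436882 1097380908\r\n"
    )
    rpc = parse_stats(dump)["ost_read"]
    # average and spread of OST_READ RPC time, in the counter's unit (usec)
    print(rpc.unit, rpc.mean, rpc.stddev)
```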
[3:47:21 PM] John Hammond: The osc_read counter from llite should be the sum of all read_bytes stats over all OSCs associated with that superblock.
[3:47:51 PM] John Hammond: The ost_read stat is about how long the RPCs took (its samples are in usec).
[4:09:15 PM] Andreas Dilger: John: your supposition is correct:
$ lctl get_param llite.*.stats | grep read
read_bytes 1025 samples [bytes] 0 1048576 1073741824
osc_read 1029 samples [bytes] 4096 1048576 1073762304
$ lctl get_param osc.*.stats | grep read
read_bytes 1029 samples [bytes] 4096 1048576 1073762304 1125899990728704
ost_read 1029 samples [usec] 1391 595570 347839127 120942246579015
AD: llite.*.stats:osc_read == osc.*.stats:read_bytes
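This identity can be checked mechanically. A minimal sketch, assuming you have captured the llite stats of one mount and the stats of every OSC belonging to that same mount at roughly the same moment (`lctl get_param osc.*.stats` covers every mount on the client, so filter first), and treating a counter that is absent from a dump as zero. `counter_samples_and_sum` and `osc_read_matches` are hypothetical helpers.

```python
def counter_samples_and_sum(stats_text: str, name: str) -> tuple[int, int]:
    """Return (samples, sum) for counter `name` in one stats dump, or (0, 0)."""
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] == name and fields[2] == "samples":
            return int(fields[1]), int(fields[6])
    return 0, 0


def osc_read_matches(llite_stats: str, osc_stats: list[str]) -> bool:
    """True if llite osc_read equals read_bytes summed over the mount's OSCs."""
    expected = counter_samples_and_sum(llite_stats, "osc_read")
    samples = total = 0
    for text in osc_stats:
        s, t = counter_samples_and_sum(text, "read_bytes")
        samples += s
        total += t
    return expected == (samples, total)


if __name__ == "__main__":
    # Values from the lctl output above (a single OSC in that example)
    llite = "osc_read 1029 samples [bytes] 4096 1048576 1073762304\n"
    oscs = ["read_bytes 1029 samples [bytes] 4096 1048576 1073762304 1125899990728704\n"]
    print(osc_read_matches(llite, oscs))
```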
AD: it would be great if this were all clearly written in the manual (hint, hint)
JH: That guy from TACC should have updated the manual when he added the osc_{read,write} stats to llite. But I don't think he works there anymore.
JH: Richard has an office at TACC though and knows the manual pretty well. So I nominate him.
JH: The problem with documenting Lustre stats is the sense of sorrow and horror one feels:
JH: t:~# find /proc/fs/lustre/ -name '*stats*' -exec basename {} \; | sort | uniq -c
5 brw_stats
2 extents_stats
2 extents_stats_per_process
2 hash_stats
3 job_stats
4 ldlm_stats
5 md_stats
2 offset_stats
4 osc_stats
2 read_ahead_stats
1 rename_stats
4 rpc_stats
1 site_stats
2 statahead_stats
53 stats
2 stats_track_gid
2 stats_track_pid
2 stats_track_ppid
6 unstable_stats
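For anyone scripting an inventory of these files while documenting them, a rough Python counterpart of the find pipeline above. It assumes the same /proc/fs/lustre root and *stats* pattern (both are parameters), matches directory names as well as file names the way find -name does, and does not follow symlinks, which is also find's default. `count_stats_names` is a hypothetical helper.

```python
import fnmatch
import os
from collections import Counter


def count_stats_names(root: str = "/proc/fs/lustre", pattern: str = "*stats*") -> Counter:
    """Count entry basenames under `root` that match `pattern`, as find -name would."""
    counts: Counter = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        # find -name tests directories as well as files, so check both lists
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Roughly the shape of `sort | uniq -c`: count, then name, sorted by name
    for name, n in sorted(count_stats_names().items()):
        print(f"{n:7d} {name}")
```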