Details
-
Improvement
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
12488
Description
From discussion today: the osc_read stat is not documented in the manual.
JH: Consider the read() syscall, the OST_READ RPC, and a mmapped region.
JH: Read moves some number of bytes from the page cache to a userspace buffer.
JH: That number is used to tally the read_bytes counter.
JH: OST_READ moves some number of bytes from memory on an OST to memory on a client.
JH: That number of bytes is used to tally the osc_read stat.
JH: With mmap, a memory access that faults triggers an OST_READ.
JH: An access that doesn't fault tallies nothing.
JH: As Andreas pointed out, read_bytes is ignorant of caching.
JH: It doesn't give you an idea of the work required to satisfy the read.
JH: But it's interesting anyway. And the ratio of these two stats is also interesting.
JH: Perhaps confusingly, there are also counters named read_bytes and ost_read in the osc stats.
JH:
    t:~# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff88016dcde800/stats
    snapshot_time 1391204667.36216 secs.usecs
    req_waittime 178231 samples [usec] 32 29256 90673666 227929045180
    req_active 178231 samples [reqs] 1 9 237854 447998
    read_bytes 6145 samples [bytes] 0 55696 10481241 152202899719
    write_bytes 23414 samples [bytes] 5 91736 74789821 1631652972043
    ost_read 6145 samples [usec] 69 6604 1436882 1097380908
    ost_write 23414 samples [usec] 149 29256 27478013 141084890911
[3:47:21 PM] John Hammond: The osc_read counter from llite should be the sum of all read_bytes stats over all oscs associated with that superblock.
[3:47:51 PM] John Hammond: The ost_read stat is about how long the RPCs took.
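To make the distinction between the two osc counters concrete, here is a minimal Python sketch that parses the read_bytes and ost_read lines from the output above and derives the mean bytes per OST_READ and the mean RPC latency. The field layout (name, sample count, unit, min, max, sum, optional sum of squares) is an assumption inferred from the sample output, and `Stat` and `parse_stat` are hypothetical helper names, not part of any Lustre tool.

```python
from typing import NamedTuple, Optional


class Stat(NamedTuple):
    name: str
    count: int
    unit: str
    minimum: int
    maximum: int
    total: int
    sumsq: Optional[int]  # not every stats file prints this column


def parse_stat(line: str) -> Optional[Stat]:
    """Parse one stats line; return None for lines in another format."""
    fields = line.split()
    # Assumed layout: name count "samples" [unit] min max sum [sumsq].
    # Lines such as snapshot_time do not match and are skipped.
    if len(fields) < 7 or fields[2] != "samples":
        return None
    sumsq = int(fields[7]) if len(fields) > 7 else None
    return Stat(fields[0], int(fields[1]), fields[3].strip("[]"),
                int(fields[4]), int(fields[5]), int(fields[6]), sumsq)


# Lines copied from the osc stats output quoted in the discussion.
SAMPLE = """\
snapshot_time 1391204667.36216 secs.usecs
read_bytes 6145 samples [bytes] 0 55696 10481241 152202899719
ost_read 6145 samples [usec] 69 6604 1436882 1097380908
"""

stats = {s.name: s for s in map(parse_stat, SAMPLE.splitlines()) if s}

# read_bytes counts bytes moved per RPC; ost_read counts time per RPC.
mean_bytes = stats["read_bytes"].total / stats["read_bytes"].count
mean_usec = stats["ost_read"].total / stats["ost_read"].count

print(f"mean bytes per OST_READ: {mean_bytes:.1f}")
print(f"mean OST_READ latency:   {mean_usec:.1f} usec")
```

Both counters have the same sample count (6145) because each OST_READ RPC contributes one sample to each; only the quantity being tallied differs.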
[4:09:15 PM] Andreas Dilger: John: your supposition is correct:
    $ lctl get_param llite.*.stats | grep read
    read_bytes 1025 samples [bytes] 0 1048576 1073741824
    osc_read 1029 samples [bytes] 4096 1048576 1073762304
    $ lctl get_param osc.*.stats | grep read
    read_bytes 1029 samples [bytes] 4096 1048576 1073762304 1125899990728704
    ost_read 1029 samples [usec] 1391 595570 347839127 120942246579015
AD: llite.*.stats:osc_read == osc.*.stats:read_bytes
AD: it would be great if this was all clearly written in the manual (hint, hint)
JH: That guy from TACC should have updated the manual when he added the osc_{read,write} stats to llite. But I don't think he works there anymore.
JH: Richard has an office at TACC though and knows the manual pretty well. So I nominate him.
JH: The problem with documenting lustre stats is the sense of sorrow and horror one feels:
JH:
    t:~# find /proc/fs/lustre/ -name '*stats*' -exec basename {} \; | sort | uniq -c
    5 brw_stats
    2 extents_stats
    2 extents_stats_per_process
    2 hash_stats
    3 job_stats
    4 ldlm_stats
    5 md_stats
    2 offset_stats
    4 osc_stats
    2 read_ahead_stats
    1 rename_stats
    4 rpc_stats
    1 site_stats
    2 statahead_stats
    53 stats
    2 stats_track_gid
    2 stats_track_pid
    2 stats_track_ppid
    6 unstable_stats
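The identity AD states (llite osc_read equals the sum of read_bytes over the OSCs) and the ratio JH calls interesting can both be checked mechanically. The sketch below is a rough illustration in Python that works on text captured from the two `lctl get_param ... | grep read` commands above; `stat_sum` is a hypothetical helper, and treating the seventh field as the running byte total is an assumption based on the sample output.

```python
def stat_sum(output: str, name: str) -> int:
    """Add up the assumed 'sum' column for every line whose counter is `name`.

    With several OSCs, osc.*.stats yields one read_bytes line per OSC,
    so the loop accumulates across all of them.
    """
    total = 0
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] == name and fields[2] == "samples":
            total += int(fields[6])
    return total


# Output of: lctl get_param llite.*.stats | grep read  (from the discussion)
LLITE = """\
read_bytes 1025 samples [bytes] 0 1048576 1073741824
osc_read 1029 samples [bytes] 4096 1048576 1073762304
"""

# read_bytes line of: lctl get_param osc.*.stats | grep read  (single OSC here)
OSC = """\
read_bytes 1029 samples [bytes] 4096 1048576 1073762304 1125899990728704
"""

app_bytes = stat_sum(LLITE, "read_bytes")  # bytes copied to userspace by read()
rpc_bytes = stat_sum(LLITE, "osc_read")    # bytes fetched by OST_READ RPCs
osc_bytes = stat_sum(OSC, "read_bytes")    # same quantity, tallied per OSC

print("llite osc_read == sum of osc read_bytes:", rpc_bytes == osc_bytes)
print(f"read_bytes / osc_read = {app_bytes / rpc_bytes:.6f}")
```

In this sample the ratio is just under 1, since the client fetched slightly more over the wire than the application read. A ratio well above 1 would point to reads being served from the page cache, while mmap accesses would add to osc_read without touching read_bytes at all.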