[LU-11407] Improve stats data Created: 19/Sep/18 Updated: 01/Aug/23 Resolved: 26/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.2 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
It would be useful to store and report the "job start" time for the JobStats. Currently we show in the obdfilter.*.job_stats file:
- job_id: mythbackend.0
  snapshot_time: 1537384753
  read_bytes: { samples: 321, unit: bytes, min: 4096, max: 4194304, sum: 1025404928 }
  write_bytes: { samples: 12656, unit: bytes, min: 22028, max: 919476, sum: 5413800656 }
  sync: { samples: 11168, unit: reqs }
  statfs: { samples: 31249, unit: reqs }
but this doesn't tell us anything about when this job started, so we can't find the throughput or IOPS rates. It should be simple to store the first time this job reported IO so that we can have some idea about the rate. A further enhancement would be to store the full brw_stats into the job_stats file, but that is a more complex change. |
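As a rough illustration of why a start time matters, the sketch below parses one job_stats entry shaped like the sample above and derives average rates from it. The helper names (`parse_job`, `rates`) and the `start_time` value are hypothetical; the original output has no start time, so one is supplied by hand here. This is only one possible way a collector might do the arithmetic, not how any existing Lustre tool works.

```python
import re

# Sample entry copied from the description above.
SAMPLE = """\
- job_id: mythbackend.0
  snapshot_time: 1537384753
  read_bytes: { samples: 321, unit: bytes, min: 4096, max: 4194304, sum: 1025404928 }
  write_bytes: { samples: 12656, unit: bytes, min: 22028, max: 919476, sum: 5413800656 }
  sync: { samples: 11168, unit: reqs }
  statfs: { samples: 31249, unit: reqs }
"""

# Matches "name: 123" pairs inside the braces; non-numeric fields such as
# "unit: bytes" are skipped on purpose.
NUMERIC_FIELD = re.compile(r"(\w+):\s*(\d+)")


def parse_job(text):
    """Parse one job entry into a dict (hypothetical helper)."""
    job = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, rest = line.strip().lstrip("- ").partition(":")
        rest = rest.strip()
        if rest.startswith("{"):
            job[name] = {k: int(v) for k, v in NUMERIC_FIELD.findall(rest)}
        elif rest.isdigit():
            job[name] = int(rest)
        else:
            job[name] = rest
    return job


def rates(job, start_time):
    """Average rates over the job lifetime; start_time is supplied by the caller."""
    elapsed = job["snapshot_time"] - start_time
    if elapsed <= 0:
        raise ValueError("snapshot_time must be later than start_time")
    return {
        "read_bytes_per_sec": job["read_bytes"]["sum"] / elapsed,
        "write_bytes_per_sec": job["write_bytes"]["sum"] / elapsed,
        "write_reqs_per_sec": job["write_bytes"]["samples"] / elapsed,
    }


if __name__ == "__main__":
    job = parse_job(SAMPLE)
    # Assumed start time: 1000 seconds before the snapshot.
    print(rates(job, start_time=job["snapshot_time"] - 1000))
```

Without a stored start time, the `start_time` argument has no source other than guesswork, which is the gap this ticket describes.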
| Comments |
| Comment by Andreas Dilger [ 19/Sep/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33201 |
| Comment by Gerrit Updater [ 02/Oct/18 ] |
|
|
| Comment by Joe Grund [ 08/Oct/18 ] |
|
What release(s) is this enhancement planning to land in? |
| Comment by Joe Grund [ 08/Oct/18 ] |
|
Is there a sample of how the new output will look? |
| Comment by Li Xi [ 09/Oct/18 ] |
|
We might need to create separate tickets, but we have some requirements for the stats improvement:
|
| Comment by James A Simmons [ 09/Oct/18 ] |
|
Some time back a patch for the kernel code was pushed for 1) and it was rejected. Now if you really want it we could make "lctl get_param **.*stats" a wrapper around a function in liblustreapi that does these calculations for you. |
| Comment by Andreas Dilger [ 12/Oct/18 ] |
|
I was hoping to include it in 2.12 as a very minor enhancement, but if there is a significant issue affecting the parser then I could wait. I've also changed the job stats to have a sec.nsec timestamp to make it consistent with other stats. I've refreshed the patch (will push soon) to include an elapsed_time: field, which is the difference between the start time and the current time, so Li Xi's parser doesn't need to compute that. Doing all of the division in the kernel is problematic because the kernel does not support floating-point math. The output for any stat that has snapshot_time: at the start will get two additional lines (start_time: and elapsed_time:):
snapshot_time: 123456789.123456789 (secs.nsec)
start_time: 123456678.012345678
elapsed_time: 111.111111111 |
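The elapsed-time calculation described in this comment needs no floating point: both timestamps can be held as whole nanoseconds and subtracted as integers. The sketch below shows that idea in Python. The function names are hypothetical, and it assumes the snapshot is never earlier than the start; it is not taken from the patch itself.

```python
NSEC_PER_SEC = 1_000_000_000


def parse_ts(text):
    """Convert a 'sec.nsec' string to whole nanoseconds using integers only."""
    sec, _, frac = text.partition(".")
    # Pad or truncate the fraction to exactly nine digits (nanoseconds).
    nsec = int(frac.ljust(9, "0")[:9]) if frac else 0
    return int(sec) * NSEC_PER_SEC + nsec


def format_ts(total_ns):
    """Format whole nanoseconds back into 'sec.nsec' with nine fractional digits."""
    sec, nsec = divmod(total_ns, NSEC_PER_SEC)
    return f"{sec}.{nsec:09d}"


def elapsed(snapshot, start):
    """Elapsed time between two 'sec.nsec' strings; assumes snapshot >= start."""
    return format_ts(parse_ts(snapshot) - parse_ts(start))


if __name__ == "__main__":
    print(elapsed("1537384753.250000000", "1537384700.750000000"))  # 52.500000000
```

Keeping the value in nanoseconds until it is formatted avoids the manual borrow that would otherwise be needed when the nsec part of the start time is larger than that of the snapshot.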
| Comment by Joe Grund [ 12/Oct/18 ] |
|
No issue on my end, just want to know where I need to target. |
| Comment by Li Xi [ 19/Oct/18 ] |
|
I don't think floating-point math is necessary, since a 64-bit integer should be enough for most of the collectors. A high-precision rate doesn't help much for analysis. Anyway, the elapsed_time helps a lot. Thanks. |
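To make the integer-only point concrete, here is a minimal sketch of a truncating rate calculation that divides a counter by whole elapsed seconds. The function name and the choice to round short intervals up to one second are assumptions made for illustration. Because it never multiplies the counter, the intermediate values stay within the range of the inputs, which matters in a language with fixed 64-bit integers.

```python
NSEC_PER_SEC = 1_000_000_000


def rate_per_sec(total, elapsed_ns):
    """Integer-only average rate in units per second, truncated toward zero."""
    if elapsed_ns <= 0:
        raise ValueError("elapsed time must be positive")
    # Treat anything under one second as one second to avoid dividing by zero.
    elapsed_sec = max(1, elapsed_ns // NSEC_PER_SEC)
    return total // elapsed_sec


if __name__ == "__main__":
    # write_bytes sum from the description, over roughly 111 seconds.
    print(rate_per_sec(5413800656, 111_111_111_111))
```

The result loses the sub-second part of the interval, which fits the observation that a high-precision rate adds little for analysis.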
| Comment by Gerrit Updater [ 01/Mar/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37764 |
| Comment by Gerrit Updater [ 27/Oct/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/33201/ |
| Comment by Gerrit Updater [ 26/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/37764/ |
| Comment by Peter Jones [ 26/Jul/22 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 13/Sep/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48539 |
| Comment by Gerrit Updater [ 26/Sep/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/48539/ |
| Comment by Gerrit Updater [ 25/Apr/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50734 |
| Comment by Gerrit Updater [ 01/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50734/ |