[LU-12585] obdfilter read_bytes stat reports requested read bytes and not actual read bytes Created: 24/Jul/19  Updated: 12/Mar/22  Resolved: 07/Feb/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.7, Lustre 2.12.1
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: Patrick Farrell
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15642 restore server read/write latency mea... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

obdfilter.*.stats has a read_bytes metric. This metric reports the number of bytes requested, not, as I expected, the number of bytes actually read from the OST.

For example, if you have a 1MB file and you try to read 16MB from it, read_bytes will likely report 16MB and not 1MB.

To reproduce:

# llmount.sh

# dd if=/dev/zero of=/mnt/lustre/dump bs=1M count=1
# lctl set_param -n ldlm.namespaces.*.lru_size=clear

# dd if=/mnt/lustre/dump of=/dev/null bs=16M count=1
# lctl get_param obdfilter.*.stats | grep read_bytes
read_bytes 4 samples [bytes] 4194304 4194304 16777216
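
To make the discrepancy in the output above explicit, here is a minimal Python sketch that parses the read_bytes line and compares its total with the size of the file. It assumes the usual stats column order (name, sample count, unit, min, max, sum). The names StatLine and parse_stat_line are illustrative and not part of any Lustre tool.

```python
from typing import NamedTuple


class StatLine(NamedTuple):
    name: str
    samples: int
    unit: str
    minimum: int
    maximum: int
    total: int


def parse_stat_line(line: str) -> StatLine:
    """Parse one counter line, assuming: name count 'samples' [unit] min max sum."""
    fields = line.split()
    if len(fields) < 7 or fields[2] != "samples":
        raise ValueError(f"unexpected stats line: {line!r}")
    return StatLine(
        name=fields[0],
        samples=int(fields[1]),
        unit=fields[3].strip("[]"),
        minimum=int(fields[4]),
        maximum=int(fields[5]),
        total=int(fields[6]),
    )


# Line taken verbatim from the reproducer output.
stat = parse_stat_line("read_bytes 4 samples [bytes] 4194304 4194304 16777216")

# The file was written with bs=1M count=1, so it holds 1 MiB.
file_size = 1 * 1024 * 1024
over_reported = stat.total - file_size

print(f"{stat.name}: {stat.samples} samples, total {stat.total} bytes")
print(f"file size: {file_size} bytes, over-reported by {over_reported} bytes")
```

Under that column-order assumption, the line reads as four samples of 4194304 bytes each, totalling 16777216 bytes for a file that only contains 1048576 bytes.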
 

I think this comes from the fact that the stat is updated at I/O preparation, and not at I/O completion.
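
The difference between the two accounting points can be illustrated with a toy model in Python. This is not Lustre code: the function names are hypothetical, and it assumes the 16MB read reaches the OST as four 4 MiB chunks, as the sample output suggests.

```python
MIB = 1024 * 1024


def bytes_counted_at_preparation(chunks):
    """Count what was asked for: every chunk is charged at its full length."""
    return sum(length for _offset, length in chunks)


def bytes_counted_at_completion(file_size, chunks):
    """Count what can actually be returned: each chunk is clipped at end of file."""
    total = 0
    for offset, length in chunks:
        available = max(0, file_size - offset)
        total += min(length, available)
    return total


# A 16 MiB read split into four 4 MiB chunks (assumed split), on a 1 MiB file.
chunks = [(i * 4 * MIB, 4 * MIB) for i in range(4)]
file_size = 1 * MIB

requested = bytes_counted_at_preparation(chunks)
actual = bytes_counted_at_completion(file_size, chunks)

print(f"counted at preparation: {requested} bytes")
print(f"counted at completion:  {actual} bytes")
```

In this model, counting at preparation gives 16 MiB, which matches what the reproducer shows, while counting at completion gives the 1 MiB that the file really contains.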

Is this intended? I think we should either:

  • Fix the stat to store the actual read bytes
  • Or, add a second stat to store the actual read bytes if we don't want to change this one


 Comments   
Comment by Andreas Dilger [ 25/Feb/21 ]

In newer Lustre releases, there is:

osd-ldiskfs.myth-OST0000.stats=
snapshot_time             1614211618.617166153 secs.nsecs
get_page                  92759 samples [usec] 0 103 426203 4154059
cache_access              20606 samples [pages] 1 1024 16346835
cache_hit                 4869 samples [pages] 1 1024 342396
cache_miss                20005 samples [pages] 1 1024 16004439

which shows whether pages are coming from cache or from disk.
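
As a rough illustration of how these counters could be used, the Python sketch below derives a cache hit ratio from the sample above. It assumes the last column shown for each cache_* counter is the summed page count. The helper name stat_sums is made up for this example.

```python
SAMPLE = """\
cache_access              20606 samples [pages] 1 1024 16346835
cache_hit                 4869 samples [pages] 1 1024 342396
cache_miss                20005 samples [pages] 1 1024 16004439
"""


def stat_sums(text):
    """Map each counter name to its sum column (assumed to be the 7th field)."""
    sums = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[2] == "samples":
            sums[fields[0]] = int(fields[6])
    return sums


sums = stat_sums(SAMPLE)
hit_ratio = sums["cache_hit"] / sums["cache_access"]

print(f"pages accessed:   {sums['cache_access']}")
print(f"pages from cache: {sums['cache_hit']} ({hit_ratio:.1%})")
print(f"pages from disk:  {sums['cache_miss']}")
```

For this sample, hits plus misses add up to the access total, and only about 2% of the pages were served from cache.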

That said, with patch https://review.whamcloud.com/38816 "LU-13597 ofd: add more information to job_stats" the read stats are now stored after all of the pages have been read, so it would be trivial to replace "tot_bytes" with an actual counter of the bytes read.

Comment by Gerrit Updater [ 12/Jan/22 ]

"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46075
Subject: LU-12585 obdfilter: Use actual I/O bytes in stats
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34cbc31a9e73498d7f9324e942c85fbc44791c83

Comment by Gerrit Updater [ 20/Jan/22 ]

"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46229
Subject: LU-12585 mdt: Add read/write latency to MDT stats
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6af42177e90e323d8d53878db3e2f56dd096b0ab

Comment by Gerrit Updater [ 07/Feb/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46075/
Subject: LU-12585 obdfilter: Use actual I/O bytes in stats
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1c98e950fa7c9fac3d7494278abebee7c64c5397

Comment by Gerrit Updater [ 07/Feb/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46229/
Subject: LU-12585 mdt: Add read/write latency to MDT stats
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a942fc916871eebe6615fe8e29471e2386d46f1d

Comment by Peter Jones [ 07/Feb/22 ]

Landed for 2.15

Generated at Sat Feb 10 02:53:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.