[LU-9683] Negative value reported for req_waittime mdt stat Created: 19/Jun/17  Updated: 10/Aug/17  Resolved: 29/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Tom Nabarro (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9019 Migrate lustre to standard 64 bit tim... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On creating a lustre filesystem (in IML automated test) with lustre version 2.9.58_86_g2383a62, the following was reported indicating a negative value for req_waittime stat (causing IML to not report the stat):

[root@lotus-58vm16 ~]# less /proc/fs/lustre/mds/MDS/mdt/stats
snapshot_time             1497888436.757961280 secs.nsecs
req_waittime              54892 samples [usec] -8497 0 -4659765 1053607811
req_qdepth                54892 samples [reqs] 0 0 0 0
req_active                54892 samples [reqs] 1 1 54892 54892
req_timeout               54892 samples [sec] 1 10 54910 55090
reqbuf_avail              150546 samples [bufs] 63 64 9634690 616604158
ldlm_ibits_enqueue        13 samples [reqs] 1 1 13 13
mds_reint_setattr         1 samples [reqs] 1 1 1 1
mds_reint_open            8 samples [reqs] 1 1 8 8
mds_getattr               1 samples [usec] 44 44 44 1936
mds_connect               5 samples [usec] 27 16179 22451 297791095
mds_get_root              1 samples [usec] 15 15 15 225
mds_statfs                1 samples [usec] 26 26 26 676
mds_getxattr              2 samples [usec] 31 3259 3290 10622042
obd_ping                  54868 samples [usec] 12 548 2165145 95976761


 Comments   
Comment by Alex Zhuravlev [ 12/Jul/17 ]

getting quite similar issue with sanity/127a:
osc.lustre-OST0000-osc-MDT0000.stats=
snapshot_time 1499863534.716076693 secs.nsecs
osc.lustre-OST0000-osc-ffff8800c24ba800.stats=
snapshot_time 1499863534.716169621 secs.nsecs
req_waittime 5 samples [usec] -980 0 -3637 2803447
req_active 5 samples [reqs] 1 1 5 5
ost_punch 1 samples [usec] -725 0 -725 525625
ldlm_cancel 1 samples [usec] -980 0 -980 960400

Comment by Mikhail Pershin [ 12/Jul/17 ]

I want just confirm that this is not rare issue or something that happens time to time. It is always so and time stats are broken at the moment

Comment by James A Simmons [ 12/Jul/17 ]

I can take a look at this when I port the code to 64 bit time handling

Comment by Alex Zhuravlev [ 13/Jul/17 ]

https://review.whamcloud.com/#/c/28027/

Comment by Gerrit Updater [ 29/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28027/
Subject: LU-9683 ptlrpc: fix argument misorder
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 61c48e79fdfb825ea1ab2649cdadaccfb863155c

Comment by Peter Jones [ 29/Jul/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 31/Jul/17 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/28290
Subject: LU-9683 ptlrpc: fix argument misorder
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: dde04af12cddd05e5414184f428a303b911b40a9

Comment by Gerrit Updater [ 10/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28290/
Subject: LU-9683 ptlrpc: fix argument misorder
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 907f35db14e5b796bc4d722e16dc206d759e4a9a

Generated at Sat Feb 10 02:28:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.