[LU-7733] ptlrpc_lprocfs_svc_req_history_show (Seq_printf) time resolution should be higher than 1 second. Created: 02/Feb/16  Updated: 19/Sep/16  Resolved: 16/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Major
Reporter: Mahmoud Hanafi Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Rank (Obsolete): 9223372036854775807

 Description   

We would like to see a higher time resolution in ptlrpc_lprocfs_svc_req_history_show output.

Here we see it is only printing up to a second.

seq_printf(s, LPD64":%s:%s:x"LPU64":%d:%s:%ld:%lds(%+lds) ",
                           req->rq_history_seq, libcfs_nid2str(req->rq_self),
                           libcfs_id2str(req->rq_peer), req->rq_xid,
                           req->rq_reqlen, ptlrpc_rqphase2str(req),
                           req->rq_arrival_time.tv_sec,
                           req->rq_sent - req->rq_arrival_time.tv_sec,
                           req->rq_sent - req->rq_deadline);


 Comments   
Comment by Oleg Drokin [ 03/Feb/16 ]

its trivial to add extra resolution, the biggest question is that of compatibility. While we d o not have any tools that parse it, there might be something in the community that is not expecting anything but an integer there.

Perhaps we need to ask around in lustre-devel?

Comment by Peter Jones [ 03/Feb/16 ]

Bob

Can you push a provisional change here while we assess the impact of making this change?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 03/Feb/16 ]

not at all clear what is wanted here. I can easily print a higher resolution value in place of req->rq_arrival_time.tv_sec, just convert to jiffies or microseconds or something. not very human readable compared to seconds IMHO. don't know what to do about the diff values, req->rq_sent - req->rq_arrival_time.tv_sec and req->rq_sent - req->rq_deadline. rq_sent is in seconds so I think only seconds diffs are easy to come by.

Comment by Mahmoud Hanafi [ 04/Feb/16 ]

We would be interested at lest microseconds. In the rpctrace debug dump the completion times are in micro-seconds.

Comment by Gerrit Updater [ 05/Feb/16 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/18335
Subject: LU-7733 ptlrpc: print times in microseconds
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4343fc1738f22b24042b12616b5ba16567ff30fe

Comment by Bob Glossman (Inactive) [ 05/Feb/16 ]

still not sure I'm really addressing the request exactly, but have pushed a mod with my best guess. it's at least a starting point for discussion.

Comment by Mahmoud Hanafi [ 11/Feb/16 ]

I agree with Andreas comment, what we want is seconds.microsecond.

I did try out the patch. One the MDS it is working but not the output we were looking for...

6250131278572683267:10.151.26.117@o2ib:12345-10.151.26.120@o2ib:x1525908159598524:224:Complete:1455222088536099:-536099us(-192000000us) opc 400
6250131278572683271:10.151.26.117@o2ib:12345-10.151.26.120@o2ib:x1525908159598504:224:Complete:1455222088536106:-536106us(-192000000us) opc 400
6250127494744440832:10.151.26.117@o2ib:12345-10.151.27.19@o2ib:x1525733058816312:264:Complete:1455221207545364:-545364us(-192000000us) opc 700

There is something wrong on the OSS as it doesn't print anything.

nbp1-oss1 /proc/fs/lustre/ost/OSS/ost_io # cat req_buffer_history_max
5000
nbp1-oss1 /proc/fs/lustre/ost/OSS/ost_io # cat req_buffer_history_len
1250
nbp1-oss1 /proc/fs/lustre/ost/OSS/ost_io # cat req_history
nbp1-oss1 /proc/fs/lustre/ost/OSS/ost_io # 
Comment by Mahmoud Hanafi [ 25/Mar/16 ]

I don't understand the negative numbers here....

6266059553153941504:10.151.27.60@o2ib:12345-10.151.7.174@o2ib:x1529670166131456:328:Complete:1458930678.979710:-1.020290s(-192.0s) 
6266059553091026945:10.151.27.60@o2ib:12345-10.151.53.147@o2ib:x1529728924728644:328:Complete:1458930678.964339:-1.035661s(-192.0s) 
6266059553440268289:10.151.27.60@o2ib:12345-10.151.54.82@o2ib:x1529750902839948:328:Complete:1458930679.001036:-1.998964s(-192.0s) 
6266059553444003841:10.151.27.60@o2ib:12345-10.151.54.82@o2ib:x1529750902839960:328:Complete:1458930679.001944:-1.998056s(-192.0s) 
6266059553454292994:10.151.27.60@o2ib:12345-10.151.40.144@o2ib:x1529023858938888:328:Complete:1458930679.004457:-1.995543s(-192.0s) 
6266059553458880514:10.151.27.60@o2ib:12345-10.151.40.144@o2ib:x1529023858938904:328:Complete:1458930679.005578:-1.994422s(-192.0s) 
6266059553481883650:10.151.27.60@o2ib:12345-10.151.40.144@o2ib:x1529023858938924:328:Complete:1458930679.011186:-1.988814s(-192.0s) 
6266059553533132803:10.151.27.60@o2ib:12345-10.153.10.116@o2ib233:x1529703480218204:328:Complete:1458930679.023701:-1.976299s(-192.0s) 
Comment by Bob Glossman (Inactive) [ 25/Mar/16 ]

the negative numbers are because the values shown are deltas, not absolute times. if I'm not mistaken the values shown without the patch were always 0 or negative too.

Comment by Jay Lan (Inactive) [ 19/Apr/16 ]

Bob, the review #18335 has conflict with b2_7_fe. 'Git blame' points to LU-6070.
Do we need LU-6070 review #907a321c? Or can you provide a b2_7_fe back port?

Comment by Bob Glossman (Inactive) [ 19/Apr/16 ]

Jay,
There is no current plan to back port to b2_7_fe. It can probably be done if thought important. It's a priority call, I'm not the decider.

Comment by Jay Lan (Inactive) [ 19/Apr/16 ]

Hi Bob, I can do the back port myself, given that Intel tells me that this patch would still work as expected without LU-6070.

Comment by Bob Glossman (Inactive) [ 19/Apr/16 ]

Jay,
As far as I can tell LU-6070 changes have nothing whatsoever to do with this mod. changes from both happen to overlap.

I can't promise 100% that LU-6070 will never be going into b2_7_fe too, but I don't think it's a requirement for making changes from this mod apply there.

We don't usually port to branches until after a mod lands in master. This one hasn't. Hasn't even gotten any feedback from the originator of the request if it meets his needs.

Comment by Jay Lan (Inactive) [ 19/Apr/16 ]

Oops, sorry, Bob. I did not read the contents of this ticket. Yes the mod was simply to increase the time resolution in print messages. Actually I already cherry-picked the mod to b2_5_fe before.

Comment by Gerrit Updater [ 16/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18335/
Subject: LU-7733 ptlrpc: print times in microseconds
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1929bfe81379a9c986bfa4202265f76d201a49bc

Comment by Peter Jones [ 16/May/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:11:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.