[LU-445] Send timestamps with LNet counters Created: 22/Jun/11  Updated: 01/Aug/12  Resolved: 12/Jul/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Improvement Priority: Blocker
Reporter: Wally Wang (Inactive) Assignee: Doug Oucharek (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Bugzilla ID: 22,639
Rank (Obsolete): 4569

 Description   

We should send the timestamp data from the node along with the performance
counters. This will give us much more accurate performance data on a per-node
and per-group basis.

This is originally discussed in Oracle Bug 22639.



 Comments   
Comment by Wally Wang (Inactive) [ 22/Jun/11 ]

This is a change for lst:

http://review.whamcloud.com/#change,1000

We use a millisecond timer on a per-session basis. Each node then sends the milliseconds since session start over with the LNet counters. With 32bits and millisecond resolution, this should allow ~49 days of session age before the timers would wrap. There might be a bit of an issue with using 'jiffies_to_msecs()' - but I didn't find anything in cfs_time_XXX land that'd map to it easily.

This will change the wire attribute and is not backward compatible.

Comment by Andreas Dilger [ 22/Jun/11 ]

I commented on this patch in bugzilla, since I want to make it clear to Oracle that this patch should not land for b1_8.

Comment by Liang Zhen (Inactive) [ 19/Jan/12 ]

I think the simplest way is reuse sfw_counters_t::active_tests because nobody does/could really use it at all, for example, change it to active_tests::running.
Add a new member to sfw_counters_t will make srpc_msg_t larger (unfortunately srpc_stat_reply_t is the biggest wired structure in srpc_msg_t), and we didn't consider about we can have larger message while we implement LST, , which means if we have a bigger srpc_msg_t in new version, it will be silently dropped by old version. Anyway, this should be fixed in a different patch and it requires new LST message version, not LST session version (see LU-521).

I will post a patch for this, also I will file a new bug for larger LST message.

Liang

Comment by Bruce Korb (Inactive) [ 25/Jan/12 ]

One of our (Xyratex's) customers is bumping into this limitation. Silently losing data is not nice, but they have a higher priority in being able to do the testing with a homogeneously versioned cluster. I upgraded your old patch to current source and see you've upgraded it yourself. I'll take a peek at it. Thank you!

Comment by Alexey Lyashkov [ 25/Jan/12 ]

Liang,

what about a fix let to avoid to use "sizeof(srpc_msg_t)" as send size in LST code?
that will allow to send a hello message without had a -EPROTO error with expanded srpc_msg_t.
after hello message was accessed - we will switch a operations vectors to correct one.
it's looks flexible and anyway we need need to change a version for that case, because we have change a logic to time calculation and not able to use mixed (old and new messages) from same client.

Comment by Bruce Korb (Inactive) [ 26/Jan/12 ]

As a "hint", this is what I already implemented.

SRPC_STRUCT_TYPE_TABLE is a #define of SRpc() elements that
define all the different substructures that sent around for srpc.
Since this is the third table that would need to be kept
in sync with the enumeration and the union et al., and these
tables have two entries for each type, it seemed
to me to cross the threshold of what one ought to do by hand.

Anyway, this seems to work okay and would mean that the version
of an individual message would actually be the SRPC_MSG_VERSION
(basic protocol version number), plus the message length. The
upshot is that one could grow messages without rattling the
basic version number and the unused bits in the alternate
parts of the union would not get sent around either.

#define SRPC_STRUCT_TYPE_TABLE \
SRpc(0, MKSN, mksn, mksn) \
SRpc(1, RMSN, rmsn, rmsn) \
SRpc(2, BATCH, batch, bat) \
SRpc(3, STAT, stat, stat) \
SRpc(4, TEST, test, tes) \
SRpc(5, DEBUG, debug, dbg) \
SRpc(6, BRW, brw, brw) \
SRpc(7, PING, ping, ping) \
SRpc(8, JOIN, join, join)

diff --git a/lnet/selftest/rpc.c b/lnet/selftest/rpc.c
index 570e0e2..00435ea 100644
— a/lnet/selftest/rpc.c
+++ b/lnet/selftest/rpc.c
@@ -659,12 +659,47 @@ srpc_send_request (srpc_client_rpc_t *rpc)
return rc;
}

-int
+/**
+ * Let message lengths vary, depending on message type.
+ * If individual messages need to vary in size, more magic needs to be added.
+ *
+ * @param msg the selftest Lustre net RPC message
+ * @returns the size to transmit
+ */
+static inline int
+srpc_msg_len(srpc_msg_t * msg)
+{
+ static int const size_table[] =

{ +#define _SRpc_(_v, _N, _s, _f) \ + [SRPC_MSG_##_N##_REQST] = sizeof(srpc_##_s##_reqst_t), \ + [SRPC_MSG_##_N##_REPLY] = sizeof(srpc_##_s##_reply_t), + SRPC_STRUCT_TYPE_TABLE +#undef _SRpc_ + sizeof(srpc_generic_reqst_t), sizeof(srpc_generic_reply_t) + }

;
+ static int const type_ct = sizeof(size_table) / sizeof(size_table[0]);
+
+ int ix = msg->msg_type;
+ int len;
+
+ /*
+ * If out of range, presume the generic request/reply layout
+ * and presume odd numbered types are replies.
+ */
+ if (ix >= type_ct)
+ ix = (type_ct - 2) + (ix & 0x01);
+
+ len = offsetof(srpc_msg_t, msg_body) + size_table[ix];
+ return len;
+}
+
+static int
srpc_prepare_reply (srpc_client_rpc_t *rpc)
{
srpc_event_t *ev = &rpc->crpc_replyev;
__u64 *id = &rpc->crpc_reqstmsg.msg_body.reqst.rpyid;
int rc;
+ int msg_len = srpc_msg_len(&rpc->crpc_replymsg);

ev->ev_fired = 0;
ev->ev_data = rpc;
@@ -673,7 +708,7 @@ srpc_prepare_reply (srpc_client_rpc_t *rpc)
*id = srpc_next_id();

rc = srpc_post_passive_rdma(SRPC_RDMA_PORTAL, *id,

  • &rpc->crpc_replymsg, sizeof(srpc_msg_t),
    + &rpc->crpc_replymsg, msg_len,
    LNET_MD_OP_PUT, rpc->crpc_dest,
    &rpc->crpc_replymdh, ev);
    if (rc != 0) {
    @@ -1037,14 +1072,8 @@ srpc_send_rpc (swi_workitem_t *wi)
    LASSERT (rpc != NULL);
    LASSERT (wi == &rpc->crpc_wi);
  • cfs_spin_lock(&rpc->crpc_lock);
    -
  • if (rpc->crpc_aborted) { - cfs_spin_unlock(&rpc->crpc_lock); + if (rpc->crpc_aborted) goto abort; - }

    -

  • cfs_spin_unlock(&rpc->crpc_lock);

switch (wi->swi_state) {
default:

Comment by Liang Zhen (Inactive) [ 31/Jan/12 ]

Shadow, yes it's exactly what I want to do in the future, this might require quite some changes to LNet to avoid silent message dropping on old versions nodes because LST doesn't have handshake. As I mentioned in previous comment, we might want to create another ticket for this change, and just reuse old field to satisfy requirement on this ticket.

Liang

Comment by Alexey Lyashkov [ 27/Feb/12 ]

Hm.. we have a version field in any LST message - so we able to have several handlers for one message type.
But my change is very simple and will able to setup session.

Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = FAILURE
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
  • lnet/utils/lst.c
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,client,el5,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = FAILURE
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/utils/lst.c
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = FAILURE
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = FAILURE
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,server,el5,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
Comment by Peter Jones [ 06/Mar/12 ]

Landed for 2.3

Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,client,el6,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
  • lnet/utils/lst.c
Comment by Build Master (Inactive) [ 06/Mar/12 ]

Integrated in lustre-master » i686,server,el6,ofa #506
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
Comment by Alexander Boyko [ 28/Mar/12 ]

Can this be land to b1_8?

Comment by Liang Zhen (Inactive) [ 30/Mar/12 ]

Sure, I will post a patch for b1_8

Comment by Alexander Boyko [ 12/Apr/12 ]

Liang, can you tell LU number or gerrit request for b1_8 patch?

Comment by Liang Zhen (Inactive) [ 12/Apr/12 ]

I just posted a patch at here: http://review.whamcloud.com/#change,2522

Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el5,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el6,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,server,el5,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/selftest.h
  • lnet/selftest/framework.c
  • lnet/include/lnet/lnetst.h
  • lnet/utils/lst.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el6,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/selftest/selftest.h
  • lnet/utils/lst.c
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el5,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/include/lnet/lnetst.h
  • lnet/selftest/framework.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el5,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/selftest/framework.c
  • lnet/utils/lst.c
  • lnet/selftest/selftest.h
  • lnet/include/lnet/lnetst.h
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el6,inkernel #340
LU-445 lnet: Send timestamps with LNet counters (Revision bc5b01bba4a6d934bb1092fab37adc9295a98487)

Result = SUCCESS
Oleg Drokin : bc5b01bba4a6d934bb1092fab37adc9295a98487
Files :

  • lnet/include/lnet/lnetst.h
  • lnet/utils/lst.c
  • lnet/selftest/framework.c
  • lnet/selftest/selftest.h
Comment by Doug Oucharek (Inactive) [ 26/Jun/12 ]

With Liang's recommendation, I am reopening this ticket to add a patch to make the change previously done backwards compatible.

With the change as is, a 2.3 system running against a 2.2 or 2.1 system will have an invalid timestamp for doing bandwidth calculations.

I plan to add a new flag to "lst stat" which will trigger the use of the remote timestamps. If the flag is not given, then the previous behaviour, using the local timestamp, will be done.

This change will be set up to change the default from using local timestamps to using remote timestamp when the Lustre version hits 2.8.

Comment by Doug Oucharek (Inactive) [ 26/Jun/12 ]

The patch for making the original work backward compatible is: http://review.whamcloud.com/#change,3192

Comment by Andreas Dilger [ 27/Jun/12 ]

Doug,
how are the old timestamps invalid? Looking at the old patch it seems it would be possible to check this programattically. The number of tests is always going to be some small number, but the milliseconds will typically be larger values. Is it reasonable to say if they are so small as to be indistinguishable from the test count it doesn't really matter?

Comment by Liang Zhen (Inactive) [ 27/Jun/12 ]

Yes we can distinguish this by value, the minimum interval of stat request would be 1 second which is 1000 milliseconds, but number of running tests on a node is almost impossible to be larger than 100.
I thought about this at the beginning, but think that could be confusing for code maintainer (OK, we are the maintainers), that's the reason I suggest Doug to fix by current way.
However, I think user will like it more if it can automatically decide which timestamp to choose, so if you think it's kind of acceptable style, I will not object to choose the easier way.

Comment by Liang Zhen (Inactive) [ 27/Jun/12 ]

Yes we can distinguish this by value, the minimum interval of stat request would be 1 second which is 1000 milliseconds, but number of running tests on a node is almost impossible to be larger than 100.

I thought about this at the beginning, but felt it could be confusing for code maintainers (OK, we are the maintainers), that's the reason I suggested Doug to fix by current way.

However, I agree user will like it more if it can automatically decide which timestamp to choose, so if you think it's kind of acceptable style, I will not object to choose the easier way.

Comment by Doug Oucharek (Inactive) [ 27/Jun/12 ]

Ok, checking programmatically makes sense. Based on Liang's comment, I'm assuming I can use 100 as the value to check the timestamp against to determine whether to use local or remote timestamps.

Comment by Liang Zhen (Inactive) [ 27/Jun/12 ]

Please be more cautious about this and use at least 500 as the value, we can't use interval less than one second, even milliseconds stamp could not be so accurate but it shouldn't be less than 500.

Comment by Liang Zhen (Inactive) [ 01/Jul/12 ]

Change it to blocker

Comment by Doug Oucharek (Inactive) [ 12/Jul/12 ]

Change http://review.whamcloud.com/#change,3192 has landed thereby addressing the backwards compatibility issue.

Comment by Wally Wang (Inactive) [ 01/Aug/12 ]

http://review.whamcloud.com/#change,3514 is the 1st patch for b2_2. Will submit the 2nd patch after landing this one.

Generated at Sat Feb 10 01:07:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.