[LU-5557] enqueue and reint RPC are not tracked in MDS stats Created: 28/Aug/14 Updated: 06/Oct/14 Resolved: 06/Oct/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.5.2, Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Aurelien Degremont (Inactive) | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | llnl | ||
| Rank (Obsolete): | 15496 |
| Description |
|
The MDS stats proc file /proc/fs/lustre/mds/MDS/mdt/stats does not track information about LDLM_ENQUEUE and MDS_REINT RPCs.

$ cat /proc/fs/lustre/mds/MDS/mdt/stats
snapshot_time 1409239309.161365 secs.usecs
req_waittime 182 samples [usec] 17 420 19191 2604647
req_qdepth 182 samples [reqs] 0 1 3 3
req_active 182 samples [reqs] 1 3 251 403
req_timeout 182 samples [sec] 1 10 209 479
reqbuf_avail 463 samples [bufs] 64 64 29632 1896448
ldlm_ibits_enqueue 5 samples [reqs] 1 1 5 5
mds_getattr 1 samples [usec] 83 83 83 6889
mds_connect 6 samples [usec] 20 197 439 54031
mds_getstatus 1 samples [usec] 76 76 76 5776
mds_statfs 2 samples [usec] 74 95 169 14501
obd_ping 167 samples [usec] 12 130 5875 249977

This class of RPCs has been explicitly blacklisted in the code for a very long time.

+++ b/lustre/ptlrpc/service.c
@@ -2110,7 +2110,7 @@ put_conn:
if (likely(svc->srv_stats != NULL && request->rq_reqmsg != NULL)) {
__u32 op = lustre_msg_get_opc(request->rq_reqmsg);
int opc = opcode_offset(op);
if (opc > 0 && !(op == LDLM_ENQUEUE || op == MDS_REINT)) {
LASSERT(opc < LUSTRE_MAX_OPCODES);
lprocfs_counter_add(svc->srv_stats,
opc + EXTRA_MAX_OPCODES,
Is there a specific reason to prevent that? Could we consider enabling them? |
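To make the effect of the quoted condition concrete, here is a minimal, self-contained sketch. It is not Lustre code: the `demo_` names, the opcode values, and the counter array are all assumptions made for illustration, and the real code additionally maps the opcode through `opcode_offset()` before indexing the stats. It only shows that with the exclusion in place, reint and enqueue requests never reach the counter, while dropping the exclusion records them like any other opcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins only; these are NOT the real Lustre opcode values. */
enum demo_opcode {
	DEMO_MDS_GETATTR  = 1,
	DEMO_MDS_REINT    = 2,
	DEMO_LDLM_ENQUEUE = 3,
	DEMO_MAX_OPCODES  = 4,
};

/* Hypothetical per-opcode sample counters. */
struct demo_stats {
	uint64_t samples[DEMO_MAX_OPCODES];
};

/* Mirrors the shape of the quoted check: enqueue and reint are skipped. */
bool demo_tracked_with_exclusion(int op)
{
	return op > 0 && !(op == DEMO_LDLM_ENQUEUE || op == DEMO_MDS_REINT);
}

/* The variant asked about in the description: every valid opcode counts. */
bool demo_tracked_without_exclusion(int op)
{
	return op > 0;
}

/* Record one sample for op if the given policy says it is tracked. */
void demo_record(struct demo_stats *stats, int op, bool (*tracked)(int))
{
	if (!tracked(op))
		return;
	assert(op < DEMO_MAX_OPCODES);
	stats->samples[op]++;
}
```

Feeding the same requests through both policies leaves the reint and enqueue counters at zero under the first and non-zero under the second, which matches the missing mds_reint line in the stats output above.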
| Comments |
| Comment by Andreas Dilger [ 28/Aug/14 ] |
|
I think John already has a patch to fix this. |
| Comment by John Hammond [ 29/Aug/14 ] |
|
I did. I'll restore it and look at addressing your comments. On this subject, is it in our long-term interest to replace these jumbo opcodes (MDS_REINT and LDLM_ENQUEUE) with specific opcodes (MDS_OPEN, MDS_CREATE, MDS_UNLINK, ...)? It has been pointed out that this would make RPC traces much more useful. I'm not sure what "reint" means, and I don't think knowing would help anything. |
| Comment by Andreas Dilger [ 02/Sep/14 ] |
|
Once upon a time, there was a filesystem named Intermezzo that allowed clients to disconnect from the server while using and optionally modifying their locally cached copy of the data. When the client reconnected to the server, it would reintegrate the log of changes that it had made locally to get the server copy back in sync with the client.

The thought for Lustre was to allow clients to eventually do the same thing. Initially, Lustre clients would only send individual reintegration records to the MDT to change the metadata, but in the future it would be possible to reintegrate a series of changes efficiently, allowing writeback caching (WBC) clients and/or disconnected operation. In that case, the type of any individual operation isn't known in advance, and there may in fact be multiple different operations sent in the same RPC. Hence, there is only the MDS_REINT RPC type instead of separate RPC handlers for each update type.

That said, it would be possible to send different RPC types for statistical purposes, and have all of the RPC handlers be the same piece of code.

Similarly, while LDLM_ENQUEUE today is commonly used for open (along with an open intent), it may be used for other kinds of locking operations on the MDS (e.g. re-enqueueing a lock in revalidate after it has been cancelled due to a conflict) as well as for extent locks on the OSS. I don't think it would be possible to change LDLM_ENQUEUE to MDS_OPEN as a result. |
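The suggestion of distinguishing update types for statistics while keeping a single handler can be sketched roughly as follows. This is only an illustration under stated assumptions: the `demo_reint_*` names, the three sub-operation types, and the stats layout are hypothetical and are not taken from Lustre or from any patch referenced in this ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reint sub-operation types, for illustration only. */
enum demo_reint_type {
	DEMO_REINT_SETATTR,
	DEMO_REINT_CREATE,
	DEMO_REINT_UNLINK,
	DEMO_REINT_MAX,
};

/* One counter per sub-operation, plus a total for the shared handler. */
struct demo_reint_stats {
	uint64_t per_type[DEMO_REINT_MAX];
	uint64_t handled;
};

/*
 * Single handler for every reint record: the sub-operation type is used
 * only to pick the counter, while the processing path stays shared.
 */
int demo_reint_handle(struct demo_reint_stats *stats, int type)
{
	assert(type >= 0 && type < DEMO_REINT_MAX);
	stats->per_type[type]++;	/* statistics are split by type */
	stats->handled++;		/* common handling, modelled as a count */
	return 0;
}
```

The design point is that the accounting granularity and the handler granularity are independent, so finer-grained stats do not require splitting the handler itself.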
| Comment by Andreas Dilger [ 03/Sep/14 ] |
|
I also recently found http://review.whamcloud.com/342 which fixes up some of this same code. |
| Comment by John Hammond [ 15/Sep/14 ] |
|
> Similarly, while LDLM_ENQUEUE today is commonly used for open (along with an open intent), it may be used for other kinds of locking operations on the MDS (e.g re-enqueue a lock in revalidate after it has been cancelled due to conflict) as well as extent locks on the OSS. I don't think it would be possible to change LDLM_ENQUEUE to MDS_OPEN as a result.

Then MDS_ENQUEUE_OPEN. |
| Comment by John Hammond [ 15/Sep/14 ] |
|
Please see http://review.whamcloud.com/11924 for the reint stats. |
| Comment by Peter Jones [ 06/Oct/14 ] |
|
Landed for 2.7 |
| Comment by Aurelien Degremont (Inactive) [ 06/Oct/14 ] |
|
Could we consider this for 2.5.4?