[LU-11422] Make LNet Selftest post Health backward compatible Created: 24/Sep/18  Updated: 10/Oct/18  Resolved: 10/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Blocker
Reporter: Sonia Sharma (Inactive) Assignee: Sonia Sharma (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9120 LNet Network Health Feature Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LNet Selftest post LNet Health landing loses backward compatibility, which means lnet-selftest cannot be run between cross-version peers (Lustre 2.12 and pre Lustre 2.12). We should fix that.

In LNet Health feature, new health related stats have been added which changes the struct lnet_counters that we previously had (patch https://review.whamcloud.com/32949 "LU-9120 lnet: add global health statistics"). Due to this, struct srpc_stat_reply is changed as it looks like this -

 struct srpc_stat_reply {
        __u32                   str_status;
        struct lst_sid          str_sid;
        struct sfw_counters     str_fw; 
        struct srpc_counters    str_rpc;
        struct lnet_counters    str_lnet;
 } WIRE_ATTR;

 struct lnet_counters {
        __u32   msgs_alloc;
        __u32   msgs_max;
+       __u32   rst_alloc;
        __u32   errors;
        __u32   send_count;
        __u32   recv_count;
        __u32   route_count;
        __u32   drop_count;
+       __u32   resend_count;
+       __u32   response_timeout_count;
+       __u32   local_interrupt_count;
+       __u32   local_dropped_count;
+       __u32   local_aborted_count;
+       __u32   local_no_route_count;
+       __u32   local_timeout_count;
+       __u32   local_error_count;
+       __u32   remote_dropped_count;
+       __u32   remote_error_count;
+       __u32   remote_timeout_count;
+       __u32   network_timeout_count;
        __u64   send_length;
        __u64   recv_length;
        __u64   route_length;
        __u64   drop_length; 
} WIRE_ATTR;

 

Amir's idea - 
"What we can do is make a copy of the structure which is similar to the older one. And in the post health selftest we can have a translation function which takes the new structure and copies the relevant fields to the old one. This way selftest remains backwards compatible"



 Comments   
Comment by Andreas Dilger [ 24/Sep/18 ]

To understand this issue better, it would be useful to firstly describe which structure is affected, whether this compatibility is between userspace and the kernel, or between peers over the network. Please link this ticket to the ticket/patch that introduced the compatibility issue. That would make it more clear between which versions the incompatibility exists, and how much effort is needed to maintain compatibility.

Comment by Andreas Dilger [ 24/Sep/18 ]

Also, please fill in the Affects Version and Fix Version fields. If this incompatibility is introduced in current master, and we don't land the fix for this issue before 2.12 is released, does that mean cross-version LST usage is broken? That would make this issue a must-fix for 2.12 (ie. Critical or Blocker)

Comment by Sonia Sharma (Inactive) [ 25/Sep/18 ]

Updated

Comment by Andreas Dilger [ 25/Sep/18 ]

Having looked at this change, I'd say the current implementation is quite problematic.  It will require field-by-field data copying, and there isn't an easy way for older tools/clients to handle this easily.  Conversely, if all of the new fields are added to the end of the data structure, then it would be trivial for old clients to ignore the remaining fields, and new clients to just assume the new fields are zero of they are not present in the old struct. Is there any reason these fields were added in the middle of the struct, even being added into two disjoint places in the struct?

I see in this patch that lnet_selftest_structure_assertion() had the code commented out that was telling you that this change should not be done in this manner. The whole point of these assertions is to warn the developer that they are making a change that will break the userspace API or the network protocol, or both. That kind of change might be temporarily OK during testing, but should basically never be landed.

It appears that struct srpc_msg already has a mechanism to handle changes in a better manner - it has both a msg_magic and a msg_version field in the structure, that could be changed to indicate the presence of additional fields in the struct. New clients/tools would understand both the old/new version of the struct, and be able to (mostly) ignore the differences, except when accessing the stat_reply member. In the latter case, the new client would just avoid to access the added fields when the msg_version = 0x3[*], and when communicating with an old client/tool it just copies the smaller struct and drops the rest of the stats.

[*] NB - for the msg_version field, it is much better to make this a bitmap of features rather than a strict enumeration of versions. Having 2^32 versions is not useful as you will never have so many different features, and it is not clear to anyone what the difference between "v4" and "v7" of a struct/protocol would be. Conversely, having a SRPC_MSG_FEAT_LOCAL_COUNTER = 0x0002 (or similar) feature flag is easy to check, and it makes it clear to any reader what feature is be present in the struct. If this flag is not set (for unrelated messages), or there is no need to send the extra stats (for older clients/tools), then there is no reason to break compatibility. If some other flag is set that the tool doesn't understand, then either it can return an error, or (better) parse the parts of the struct it understands, and ignore the rest.

The best mechanism is if there are "compat" and "incompat" regions of the version (e.g. __u16 compat; __u16 incompat) and where a particular feature flag goes depends on what kind of changes are being made. Just having additional fields added to the end of the struct is typically a harmless "compat" change and is preferable. Changing the meaning of the fields (as was done in the original patch) would be an "incompat" change and should be avoided if at all possible (as it likely can be in this case).

I also now see that there is also a msg_ses_feats field that appears to allow peers to negotiate which mutually compatible features they support, but I'm not sure if this would allow anything other than returning an error if an unknown feature is advertised. For Lustre RPC connections, the client sends a bitmask of features that it supports, and the server masks out any feature bits that it doesn't understand (using its own "feature supported" bitmask) and returns it to the client rather than returning an error. This feature mask mechanism has served us very well for about 15 years, and it was inherited from ext2/3/4 which had another 10 years of use for managing on-disk vs. user tools vs. kernel compatibility. It would be much better if this were changed to have a permissive connection that accepted any features in the request, but masked off the unknown ones and returned it the the client. Otherwise, we just have a gratuitous "request with all known features, get an error reply with supported features, retry with supported features, continue as if it had done it right the first time" dance.

As for fixing this particular problem, I see some possible solutions, depending on how this is handled in the code:

  • if the addition of the extra fields to lnet_counters makes the message size grow (i.e. it is (now) the largest member of msg_body) then this might need additional handling to maintain compatibility
    • if the tools/peer return an error for larger srpc_msg then that should be fixed in its own right
      • allowing a message size >= the current sizeof(srpc_msg), at least to avoid such issues in the future
      • split the stats message into "old_stats" and "local_stats" and send them in two RPCs, assuming they are not constantly used)
  • if this is accepted, then these flags should only be set in stats messages when they are needed
  • if it is just a matter of determining whether the extra stat fields are present, then the srpc_msg version == feature field may be enough
Comment by Amir Shehata (Inactive) [ 25/Sep/18 ]

My main problem with lnet_selftest's use of the lnet_counters structure is that it creates a coupling where none should exist. We're taking an LNet internal structure and creating a wire protocol dependency. Now when we want to add extra statistics, then we have to jump through unnecessary hoops. IMO, there shouldn't have been this coupling between a test tool and LNet. That's why I'm proposing we break up this dependency in the first place. selftest wire protocol shouldn't be dependent on lnet's internal structure.

I think it makes more sense to me to separate selftest from LNet by the use of APIs. selftest shouldn't be accessing LNet structures directly. There should be an API that pulls whatever information selftest needs. And then selftest can store this data in whatever form it needs.

Comment by Gerrit Updater [ 26/Sep/18 ]

Sonia Sharma (sharmaso@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33242
Subject: LU-11422 lnet: Fix selftest backward compatibility post health
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a1dae49acc92b48b1b3c463065a8d8ed0f30b155

Comment by Gerrit Updater [ 10/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33242/
Subject: LU-11422 lnet: Fix selftest backward compatibility post health
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 60f6f2b480b482f2022cbea416d8bea87f848bec

Comment by Peter Jones [ 10/Oct/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:43:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.