Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Description
Since the LNet Health feature landed, LNet selftest is no longer backward compatible: lnet-selftest cannot be run between peers of different versions (Lustre 2.12 and pre-2.12 Lustre). We should fix that.
The LNet Health feature added new health-related statistics, which changed struct lnet_counters (patch https://review.whamcloud.com/32949 "LU-9120 lnet: add global health statistics"). Because struct srpc_stat_reply embeds struct lnet_counters as its str_lnet member, the layout of the stat reply changed as well. The structures now look like this (fields marked + are new):
struct srpc_stat_reply {
        __u32                   str_status;
        struct lst_sid          str_sid;
        struct sfw_counters     str_fw;
        struct srpc_counters    str_rpc;
        struct lnet_counters    str_lnet;
} WIRE_ATTR;

struct lnet_counters {
        __u32 msgs_alloc;
        __u32 msgs_max;
+       __u32 rst_alloc;
        __u32 errors;
        __u32 send_count;
        __u32 recv_count;
        __u32 route_count;
        __u32 drop_count;
+       __u32 resend_count;
+       __u32 response_timeout_count;
+       __u32 local_interrupt_count;
+       __u32 local_dropped_count;
+       __u32 local_aborted_count;
+       __u32 local_no_route_count;
+       __u32 local_timeout_count;
+       __u32 local_error_count;
+       __u32 remote_dropped_count;
+       __u32 remote_error_count;
+       __u32 remote_timeout_count;
+       __u32 network_timeout_count;
        __u64 send_length;
        __u64 recv_length;
        __u64 route_length;
        __u64 drop_length;
} WIRE_ATTR;
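To make the incompatibility concrete, the following standalone sketch compares the two layouts of the counters structure listed above. It is an illustration only, not Lustre source: the name struct lnet_counters_legacy is hypothetical, uint32_t/uint64_t stand in for __u32/__u64 so it builds in userspace, and WIRE_ATTR is assumed to expand to the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: WIRE_ATTR means "packed", as is usual for wire structures. */
#define WIRE_ATTR __attribute__((packed))

/* Hypothetical name: the counters as they were before the health patch. */
struct lnet_counters_legacy {
	uint32_t msgs_alloc;
	uint32_t msgs_max;
	uint32_t errors;
	uint32_t send_count;
	uint32_t recv_count;
	uint32_t route_count;
	uint32_t drop_count;
	uint64_t send_length;
	uint64_t recv_length;
	uint64_t route_length;
	uint64_t drop_length;
} WIRE_ATTR;

/* The counters after the health patch (13 additional 32-bit fields). */
struct lnet_counters {
	uint32_t msgs_alloc;
	uint32_t msgs_max;
	uint32_t rst_alloc;
	uint32_t errors;
	uint32_t send_count;
	uint32_t recv_count;
	uint32_t route_count;
	uint32_t drop_count;
	uint32_t resend_count;
	uint32_t response_timeout_count;
	uint32_t local_interrupt_count;
	uint32_t local_dropped_count;
	uint32_t local_aborted_count;
	uint32_t local_no_route_count;
	uint32_t local_timeout_count;
	uint32_t local_error_count;
	uint32_t remote_dropped_count;
	uint32_t remote_error_count;
	uint32_t remote_timeout_count;
	uint32_t network_timeout_count;
	uint64_t send_length;
	uint64_t recv_length;
	uint64_t route_length;
	uint64_t drop_length;
} WIRE_ATTR;

/* 7 x 4 bytes + 4 x 8 bytes versus 20 x 4 bytes + 4 x 8 bytes. */
static_assert(sizeof(struct lnet_counters_legacy) == 60, "legacy size");
static_assert(sizeof(struct lnet_counters) == 112, "post-health size");

/* Fields shared by both versions no longer sit at the same offset. */
static_assert(offsetof(struct lnet_counters_legacy, errors) == 8, "legacy errors");
static_assert(offsetof(struct lnet_counters, errors) == 12, "new errors");
static_assert(offsetof(struct lnet_counters_legacy, send_length) == 28, "legacy send_length");
static_assert(offsetof(struct lnet_counters, send_length) == 80, "new send_length");
```

Under these assumptions the reply grows by 52 bytes and every field from errors onward moves, so a peer that expects the older layout would misread the counters even if it accepted the larger message.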
Amir's idea:
"What we can do is make a copy of the structure which is similar to the older one. And in the post health selftest we can have a translation function which takes the new structure and copies the relevant fields to the old one. This way selftest remains backwards compatible"
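A minimal sketch of the translation approach described in the quote, under stated assumptions: the names struct lnet_counters_legacy and lnet_counters_to_legacy() are hypothetical and are not taken from the eventual patch, fixed-width userspace types replace __u32/__u64, and WIRE_ATTR is assumed to mean packed. It shows only the field copy, not how selftest would decide which layout a peer expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: WIRE_ATTR means "packed". */
#define WIRE_ATTR __attribute__((packed))

/* Hypothetical copy of the pre-health layout, kept for older peers. */
struct lnet_counters_legacy {
	uint32_t msgs_alloc;
	uint32_t msgs_max;
	uint32_t errors;
	uint32_t send_count;
	uint32_t recv_count;
	uint32_t route_count;
	uint32_t drop_count;
	uint64_t send_length;
	uint64_t recv_length;
	uint64_t route_length;
	uint64_t drop_length;
} WIRE_ATTR;

/* Post-health layout, as listed in the description. */
struct lnet_counters {
	uint32_t msgs_alloc;
	uint32_t msgs_max;
	uint32_t rst_alloc;
	uint32_t errors;
	uint32_t send_count;
	uint32_t recv_count;
	uint32_t route_count;
	uint32_t drop_count;
	uint32_t resend_count;
	uint32_t response_timeout_count;
	uint32_t local_interrupt_count;
	uint32_t local_dropped_count;
	uint32_t local_aborted_count;
	uint32_t local_no_route_count;
	uint32_t local_timeout_count;
	uint32_t local_error_count;
	uint32_t remote_dropped_count;
	uint32_t remote_error_count;
	uint32_t remote_timeout_count;
	uint32_t network_timeout_count;
	uint64_t send_length;
	uint64_t recv_length;
	uint64_t route_length;
	uint64_t drop_length;
} WIRE_ATTR;

/*
 * Hypothetical translation helper: copy the fields that exist in both
 * layouts and drop the health-only statistics.
 */
static void lnet_counters_to_legacy(struct lnet_counters_legacy *dst,
				    const struct lnet_counters *src)
{
	assert(dst != NULL && src != NULL);

	memset(dst, 0, sizeof(*dst));
	dst->msgs_alloc   = src->msgs_alloc;
	dst->msgs_max     = src->msgs_max;
	dst->errors       = src->errors;
	dst->send_count   = src->send_count;
	dst->recv_count   = src->recv_count;
	dst->route_count  = src->route_count;
	dst->drop_count   = src->drop_count;
	dst->send_length  = src->send_length;
	dst->recv_length  = src->recv_length;
	dst->route_length = src->route_length;
	dst->drop_length  = src->drop_length;
}
```

In this sketch the stat reply sent to an older peer would carry the legacy structure filled in by the helper, so the bytes on the wire match what that peer already expects. The health counters are simply not reported in that case.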
Issue Links
- is related to: LU-9120 LNet Network Health Feature (Resolved)