[LU-12041] Fail to set global value with lnetctl import Created: 04/Mar/19 Updated: 16/Oct/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | lnet, medium | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The global tunable "retry_count" has a dependency on the global tunable "transaction_timeout". I noticed that when using lnetctl import to configure LNet, retry_count would sometimes fail to be set because it needs to be less than or equal to "transaction_timeout". Here's the lnet.conf:
sles15build01:~ # cat /tmp/lnet.conf
net:
    - net type: tcp
      local NI(s):
        - interfaces:
              0: eth0
        - interfaces:
              0: eth1
route:
    - net: o2ib
      gateway: 192.168.2.24@tcp
global:
    health_sensitivity: 70
    transaction_timeout: 70
    retry_count: 70
    recovery_interval: 70
    router_sensitivity: 70
Here are the module parameter values before import:
sles15build01:/sys/module/lnet/parameters # lnetctl lnet unconfigure; lustre_rmmod; modprobe lnet; lnetctl lnet configure
sles15build01:/sys/module/lnet/parameters # cd $PWD; for i in lnet_health_sensitivity lnet_recovery_interval lnet_retry_count lnet_transaction_timeout router_sensitivity_percentage; do echo "$i: $(cat $i)"; done
lnet_health_sensitivity: 1
lnet_recovery_interval: 1
lnet_retry_count: 3
lnet_transaction_timeout: 10
router_sensitivity_percentage: 100
And here are the values after import. Note that lnet_retry_count is unchanged:
sles15build01:/sys/module/lnet/parameters # lnetctl import /tmp/lnet.conf
sles15build01:/sys/module/lnet/parameters # cd $PWD; for i in lnet_health_sensitivity lnet_recovery_interval lnet_retry_count lnet_transaction_timeout router_sensitivity_percentage; do echo "$i: $(cat $i)"; done
lnet_health_sensitivity: 70
lnet_recovery_interval: 70
lnet_retry_count: 3
lnet_transaction_timeout: 70
router_sensitivity_percentage: 70
sles15build01:/sys/module/lnet/parameters #
And the following is logged to dmesg:
[257406.875289] LNetError: 11708:0:(api-ni.c:513:retry_count_set()) Invalid value for lnet_retry_count (70). Has to be smaller than lnet_transaction_timeout (10)
Note that while the error message says "Has to be smaller", the code actually allows values less than or equal.
static int
retry_count_set(const char *val, cfs_kernel_param_arg_t *kp)
{
	...
	if (value > lnet_transaction_timeout) {
		mutex_unlock(&the_lnet.ln_api_mutex);
		CERROR("Invalid value for lnet_retry_count (%lu). "
		       "Has to be smaller than lnet_transaction_timeout (%u)\n",
		       value, lnet_transaction_timeout);
		return -EINVAL;
	}
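To make the mismatch between the message and the condition concrete, here is a small user-space sketch. It is not LNet code: `retry_count_valid` and `format_retry_count_error` are hypothetical helpers, and the "less than or equal to" wording is only one possible way to make the message match the check quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space mirror of the quoted check in retry_count_set():
 * the kernel code rejects only value > lnet_transaction_timeout, so a value
 * equal to the timeout is accepted. */
static bool retry_count_valid(unsigned long value,
			      unsigned int transaction_timeout)
{
	return value <= transaction_timeout;
}

/* Hypothetical helper: formats an error whose wording matches the actual
 * condition. Returns whatever snprintf() returns. */
static int format_retry_count_error(char *buf, size_t len,
				    unsigned long value,
				    unsigned int transaction_timeout)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"Invalid value for lnet_retry_count (%lu). "
			"Has to be less than or equal to "
			"lnet_transaction_timeout (%u)",
			value, transaction_timeout);
}
```

With these helpers, a retry count of 70 against a timeout of 70 is valid, while 70 against the default timeout of 10 is not, which matches the dmesg output above.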
|
| Comments |
| Comment by Chris Horn [ 04/Mar/19 ] |
|
As an aside, I think it would be better if these variables had consistent naming between the yaml and the actual module parameters.

YAML Name             Mod Param Name
health_sensitivity    lnet_health_sensitivity
transaction_timeout   lnet_transaction_timeout
retry_count           lnet_retry_count
recovery_interval     lnet_recovery_interval
router_sensitivity    router_sensitivity_percentage
numa_range            lnet_numa_range
max_intf              lnet_interfaces_max
discovery             lnet_peer_discovery_disabled (has an inverse relationship!?)
|
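For reference, the name pairs listed in this comment can be captured in a lookup table. This is only an illustrative sketch: `struct lnet_name_map`, `lnet_name_table`, and `lookup_yaml_name` are hypothetical names and do not exist in lnetctl or LNet. The `inverted` flag records the inverse relationship noted for `discovery`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record pairing a YAML key with its module parameter. */
struct lnet_name_map {
	const char *yaml_name;
	const char *mod_param;
	bool inverted;	/* true when the YAML value is the logical inverse */
};

/* Pairs taken from the table in the comment above. */
static const struct lnet_name_map lnet_name_table[] = {
	{ "health_sensitivity",  "lnet_health_sensitivity",       false },
	{ "transaction_timeout", "lnet_transaction_timeout",      false },
	{ "retry_count",         "lnet_retry_count",              false },
	{ "recovery_interval",   "lnet_recovery_interval",        false },
	{ "router_sensitivity",  "router_sensitivity_percentage", false },
	{ "numa_range",          "lnet_numa_range",               false },
	{ "max_intf",            "lnet_interfaces_max",           false },
	{ "discovery",           "lnet_peer_discovery_disabled",  true  },
};

/* Returns the matching entry, or NULL if the YAML key is not listed. */
static const struct lnet_name_map *lookup_yaml_name(const char *yaml_name)
{
	size_t i;

	assert(yaml_name != NULL);
	for (i = 0; i < sizeof(lnet_name_table) / sizeof(lnet_name_table[0]); i++) {
		if (strcmp(lnet_name_table[i].yaml_name, yaml_name) == 0)
			return &lnet_name_table[i];
	}
	return NULL;
}
```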
| Comment by Amir Shehata (Inactive) [ 05/Mar/19 ] |
|
I see the problem. The import should take the dependency between retry_count and transaction_timeout into account: when the values are configured through YAML, it should configure transaction_timeout first and then retry_count.

Regarding the YAML vs Mod Param name, the only concern I have is that the module param names are longer and might be "too much" to type. If the consensus is that this is not a problem, then I have no objection to changing the names in YAML. However, note that LNet Health is out in 2.12. I don't think it's widely used yet, so I'm not sure whether backwards compatibility is going to be a problem.
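A minimal user-space sketch of the proposed ordering follows. It models only the retry_count check quoted in the description, starting from the defaults shown there (transaction_timeout 10, retry_count 3). `struct tunables`, `set_transaction_timeout`, `set_retry_count`, and `apply_in_dependency_order` are hypothetical names, not the actual lnetctl or LNet implementation, and any other constraints the real setters enforce are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two dependent module parameters. */
struct tunables {
	unsigned int transaction_timeout;
	unsigned int retry_count;
};

/* Simplified setter: this sketch applies no constraint here. */
static int set_transaction_timeout(struct tunables *t, unsigned int value)
{
	assert(t != NULL);
	t->transaction_timeout = value;
	return 0;
}

/* Mirrors the quoted check: reject values above the current timeout. */
static int set_retry_count(struct tunables *t, unsigned int value)
{
	assert(t != NULL);
	if (value > t->transaction_timeout)
		return -EINVAL;
	t->retry_count = value;
	return 0;
}

/* Apply the dependency first, then the dependent value. */
static int apply_in_dependency_order(struct tunables *t,
				     unsigned int transaction_timeout,
				     unsigned int retry_count)
{
	int rc = set_transaction_timeout(t, transaction_timeout);

	if (rc != 0)
		return rc;
	return set_retry_count(t, retry_count);
}
```

Starting from the defaults, calling `set_retry_count` with 70 before the timeout is raised returns `-EINVAL` and leaves retry_count at 3, as in the report; `apply_in_dependency_order` with 70 for both values succeeds.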
|