Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.11.0
- 3
- 9223372036854775807
Description
Recent test logs contain several messages of the form
[15236.586059] LNet: 2816:0:(router.c:1822:lnet_notify()) Ignoring prediction from 10.9.5.183@tcp of 10.9.5.186@tcp down 15180764 seconds in the future
See for example https://testing.hpdd.intel.com/test_logs/58f7b44e-1224-11e8-a10a-52540065bddc/show_text
lnet_notify() expects callers to pass an absolute time in seconds for its "when" parameter, but it appears to be getting a relative value from LNetCtl():
case IOC_LIBCFS_NOTIFY_ROUTER: {
	time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0];

	return lnet_notify(NULL, data->ioc_nid, data->ioc_flags, deadline);
}
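The contract described above can be sketched in user space as follows. This is not the LNet code: sketch_when_from_age() and sketch_seconds_in_future() are hypothetical names, and the sketch assumes the check behind the "seconds in the future" message is a plain comparison of "when" against the current time. It only illustrates that the caller's value and the callee's notion of "now" have to be on the same clock and in the same unit.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_t;	/* user-space stand-in for the kernel type */

/*
 * Hypothetical helper: turn "the event happened `age` seconds ago" into
 * an absolute timestamp on the same clock that produced `now`.
 */
static time64_t sketch_when_from_age(time64_t now, time64_t age)
{
	assert(age >= 0);
	return now - age;
}

/*
 * Hypothetical stand-in for the sanity check: returns how many seconds
 * `when` lies after `now`, or 0 if `when` is not in the future.
 */
static time64_t sketch_seconds_in_future(time64_t when, time64_t now)
{
	return when > now ? when - now : 0;
}
```

With both values in seconds on one clock, sketch_seconds_in_future(sketch_when_from_age(now, age), now) is 0 for any non-negative age. A non-zero result means the caller's value was not produced on that clock or in that unit.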
And it is getting a timestamp in jiffies in ksocknal_peer_failed():
if (notify)
	lnet_notify(peer_ni->ksnp_ni, peer_ni->ksnp_id.nid, 0,
		    cfs_time_seconds(last_alive)); /* to jiffies */
The other call sites should be audited as well.
This seems to be partially due to LU-9019.
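As a rough cross-check of the jiffies call site against the log line quoted above, the arithmetic can be sketched in user space. The assumptions are loud ones: SKETCH_HZ = 1000 is a guess at the tick rate on the test node, the dmesg timestamp (15236) is taken as an approximation of the "now" that lnet_notify() compares against, and all sketch_* names are hypothetical. Under those assumptions, a last_alive of 15196 seconds multiplied into ticks lands exactly 15180764 "seconds" in the future, which is the value in the log.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_t;	/* user-space stand-in for the kernel type */

/* Assumed tick rate; the real HZ is a kernel build-time constant. */
#define SKETCH_HZ 1000

/* Mimics a seconds-to-jiffies helper: multiplies by ticks per second. */
static int64_t sketch_seconds_to_ticks(time64_t seconds)
{
	/* Guard against negative input and multiplication overflow. */
	assert(seconds >= 0 && seconds <= INT64_MAX / SKETCH_HZ);
	return seconds * SKETCH_HZ;
}

/* Seconds by which `when` exceeds `now`, or 0 if it is not in the future. */
static time64_t sketch_future_by(time64_t when, time64_t now)
{
	return when > now ? when - now : 0;
}
```

Passing the value in seconds, sketch_future_by(15196, 15236), gives 0, while passing it through the tick conversion first, sketch_future_by(sketch_seconds_to_ticks(15196), 15236), gives 15180764.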
Issue Links
- is related to LU-9019 Migrate lustre to standard 64 bit time kernel API (Resolved)