[LU-10672] lnet_notify() called incorrectly
Created: 15/Feb/18 Updated: 16/Aug/22 Resolved: 03/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Recent test logs contain several messages of the form

    [15236.586059] LNet: 2816:0:(router.c:1822:lnet_notify()) Ignoring prediction from 10.9.5.183@tcp of 10.9.5.186@tcp down 15180764 seconds in the future

See for example https://testing.hpdd.intel.com/test_logs/58f7b44e-1224-11e8-a10a-52540065bddc/show_text

lnet_notify() expects callers to pass an absolute time in seconds for its when parameter. But it looks like it's getting a relative value from LNetCtl():

    case IOC_LIBCFS_NOTIFY_ROUTER: {
            time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0];

            return lnet_notify(NULL, data->ioc_nid, data->ioc_flags,
                               deadline);
    }

And it's getting a timestamp in jiffies in ksocknal_peer_failed():

    if (notify)
            lnet_notify(peer_ni->ksnp_ni, peer_ni->ksnp_id.nid, 0,
                        cfs_time_seconds(last_alive)); /* to jiffies */

The other call sites should be audited as well. This seems to be partially due to |
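To make the unit mismatch concrete, here is a minimal user-space sketch of the conversions a caller would need before handing a time to a function that expects absolute wall-clock seconds. It is not the patch from https://review.whamcloud.com/31339 and does not use kernel APIs: every name prefixed with sketch_ and the SKETCH_HZ tick rate are hypothetical, and the "not in the future" check is only inferred from the log message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick rate for illustration; real kernels are configured differently. */
#define SKETCH_HZ 1000

/*
 * Stand-in for the sanity check implied by the "seconds in the future"
 * log message: an absolute "when" must not lie after the current time.
 */
static int sketch_when_is_plausible(int64_t when_sec, int64_t now_sec)
{
	return when_sec <= now_sec;
}

/*
 * Convert a timestamp recorded in jiffies into absolute wall-clock
 * seconds: work out how long ago the stamp was taken, then subtract
 * that age from the current wall-clock time.
 */
static int64_t sketch_jiffies_to_abs_sec(uint64_t stamp_jiffies,
					 uint64_t now_jiffies,
					 int64_t now_sec)
{
	uint64_t age_sec = (now_jiffies - stamp_jiffies) / SKETCH_HZ;

	return now_sec - (int64_t)age_sec;
}

/*
 * Convert a relative "seconds ago" value into absolute wall-clock
 * seconds. A negative age would describe the future, so reject it.
 */
static int64_t sketch_age_to_abs_sec(int64_t age_sec, int64_t now_sec)
{
	assert(age_sec >= 0);
	return now_sec - age_sec;
}
```

With these helpers, a stamp taken 5000 ticks ago maps to a time 5 seconds before now and passes the plausibility check, while a value 15180764 seconds past now fails it, as in the log line quoted above.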
| Comments |
| Comment by Gerrit Updater [ 16/Feb/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/31339 |
| Comment by Gerrit Updater [ 03/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31339/ |
| Comment by Peter Jones [ 03/Mar/18 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 16/Aug/22 ] |
|
"Akash B <akash-b@hpe.com>" uploaded a new patch: https://review.whamcloud.com/48226 |