Lustre / LU-10672

lnet_notify() called incorrectly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.11.0
    • Fix Version/s: Lustre 2.11.0
    • Labels:
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      Recent test logs contain several messages of the form:

      [15236.586059] LNet: 2816:0:(router.c:1822:lnet_notify()) Ignoring prediction from 10.9.5.183@tcp of 10.9.5.186@tcp down 15180764 seconds in the future
      

      See for example https://testing.hpdd.intel.com/test_logs/58f7b44e-1224-11e8-a10a-52540065bddc/show_text

      lnet_notify() expects callers to pass an absolute time in seconds for its 'when' parameter, but it appears to receive a relative value from LNetCtl():

              case IOC_LIBCFS_NOTIFY_ROUTER: {
                      time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0];
      
                      return lnet_notify(NULL, data->ioc_nid, data->ioc_flags,
                                         deadline);
              }
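
To illustrate why the subtraction above is suspicious, here is a minimal user-space sketch, not the Lustre code. It assumes that data->ioc_u64[0] carries a wall-clock timestamp in seconds, so that subtracting it from the current time yields an elapsed interval rather than a point in time. The helper names elapsed_since() and plausible_absolute_time() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors the subtraction in the quoted
 * IOC_LIBCFS_NOTIFY_ROUTER case. Assumes both arguments are
 * wall-clock seconds since the epoch. */
static int64_t elapsed_since(int64_t now_sec, int64_t event_sec)
{
	return now_sec - event_sec;
}

/* Hypothetical sanity check: an absolute timestamp should not lie
 * in the future and should be no older than max_age_sec. A small
 * relative interval fails this check because it looks like a time
 * shortly after the epoch. */
static int plausible_absolute_time(int64_t now_sec, int64_t when_sec,
				   int64_t max_age_sec)
{
	return when_sec <= now_sec && now_sec - when_sec <= max_age_sec;
}
```

With an event 30 seconds in the past, elapsed_since() returns 30. Passing that 30 where an absolute time is expected fails the plausibility check, while the original event timestamp passes it.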
      

      And it is getting a timestamp in jiffies in ksocknal_peer_failed():

              if (notify)
                      lnet_notify(peer_ni->ksnp_ni, peer_ni->ksnp_id.nid, 0,
                                  cfs_time_seconds(last_alive)); /* to jiffies */
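
The unit mismatch can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: SKETCH_HZ stands in for the kernel tick rate (assumed here to be 1000; the real HZ depends on kernel configuration), and seconds_to_jiffies() and seconds_in_future() are hypothetical helpers. The sketch shows how a value in jiffies, when read as seconds, appears to lie far in the future.

```c
#include <stdint.h>

/* Assumed tick rate for illustration only; the real HZ is set by
 * the kernel configuration. */
#define SKETCH_HZ 1000

/* Hypothetical stand-in for a seconds-to-jiffies conversion. */
static int64_t seconds_to_jiffies(int64_t sec)
{
	return sec * SKETCH_HZ;
}

/* How far ahead of "now" a timestamp appears when both values are
 * interpreted as seconds. A positive result means "in the future". */
static int64_t seconds_in_future(int64_t now_sec, int64_t when)
{
	return when - now_sec;
}
```

For example, with a current time of 15236 seconds and a last-alive time of 15196 seconds (inputs chosen for illustration), converting the last-alive time to jiffies and comparing it against the current time in seconds gives 15180764, the same magnitude as in the log message above. Comparing the two values in the same unit gives -40, that is, 40 seconds in the past.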
      

      The other call sites should be audited as well.

      This seems to be partially due to LU-9019.

              People

              Assignee:
              simmonsja James A Simmons
              Reporter:
              jhammond John Hammond