Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9397

Inconsistence use of cfs_time_current() and ktime_get_real_seconds()

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.10.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      "LU-6245 libcfs: add ktime_get_real_seconds support" introduced a problem where sometimes we use cfs_time_current() and others ktime_get_real_seconds(). This leads to wrong comparison, leading to strange timing issues.

      One behavior observed were routers considering peers dead, due to these inconsistent comparison.

      We'll need to make sure that the LNet and LND code follows a consistent and correct approach.

      Attachments

        Issue Links

          Activity

            [LU-9397] Inconsistence use of cfs_time_current() and ktime_get_real_seconds()
            pjones Peter Jones added a comment -

            Landed for 2.10

            pjones Peter Jones added a comment - Landed for 2.10

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26813/
            Subject: LU-9397 ksocklnd: move remaining time handling to 64 bits
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 59c25356208706b4dd0920bc7754f21e2db14a0f

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26813/ Subject: LU-9397 ksocklnd: move remaining time handling to 64 bits Project: fs/lustre-release Branch: master Current Patch Set: Commit: 59c25356208706b4dd0920bc7754f21e2db14a0f
            simmonsja James A Simmons added a comment - - edited

            Okay just wanted to verify that was the case for jiffies. The issue with jiffies they can also vary between nodes (LU-5443) depending on what the user selected for HZ so I like to migrate away as much as possible from them. Currently I'm working on the patch but I have to go over it very carefully. To easy to break things.

            simmonsja James A Simmons added a comment - - edited Okay just wanted to verify that was the case for jiffies. The issue with jiffies they can also vary between nodes ( LU-5443 ) depending on what the user selected for HZ so I like to migrate away as much as possible from them. Currently I'm working on the patch but I have to go over it very carefully. To easy to break things.

            As far as I can tell, the main argument for jiffies is that they're cheaper to get and use. The extra resolution doesn't matter for these cases.

            Using ktime_get_real_seconds() in particular should get you (assuming proper time sync in place) time stamp values that can (easily) be compared between nodes. That by itself might justify this.

            olaf Olaf Weber (Inactive) added a comment - As far as I can tell, the main argument for jiffies is that they're cheaper to get and use. The extra resolution doesn't matter for these cases. Using ktime_get_real_seconds() in particular should get you (assuming proper time sync in place) time stamp values that can (easily) be compared between nodes. That by itself might justify this.

            I have been looking at the socklnd code and it looks like everything is pretty much needs only second precision. It looks like everything could be changed into time64_t. Olaf do you see cases where we will need higher resolution? When we want to go to jiffies I can use cfs_time_seconds() as the case for the call to lnet_notify() and ksocknal_query(). Other than that I don't see the need for ktime_t.

            simmonsja James A Simmons added a comment - I have been looking at the socklnd code and it looks like everything is pretty much needs only second precision. It looks like everything could be changed into time64_t. Olaf do you see cases where we will need higher resolution? When we want to go to jiffies I can use cfs_time_seconds() as the case for the call to lnet_notify() and ksocknal_query(). Other than that I don't see the need for ktime_t.

            Good we are on the right track. I have to look over the ksocklnd driver and do a proper 64 bit time port.

            simmonsja James A Simmons added a comment - Good we are on the right track. I have to look over the ksocklnd driver and do a proper 64 bit time port.

            aside from Olaf's comments, I tested the fix and I don't see the issue any longer

            ashehata Amir Shehata (Inactive) added a comment - aside from Olaf's comments, I tested the fix and I don't see the issue any longer
            olaf Olaf Weber (Inactive) added a comment - - edited

            And in ksocknal_destroy_conn() the calculation in the CERROR() involving last_rcv is wrong. last_rcv is in jiffies, is subtracted from ktime_get_real_seconds(), with the result then being converted from jiffies to seconds.

            olaf Olaf Weber (Inactive) added a comment - - edited And in ksocknal_destroy_conn() the calculation in the CERROR() involving last_rcv is wrong. last_rcv is in jiffies, is subtracted from ktime_get_real_seconds() , with the result then being converted from jiffies to seconds.

            It looks like a similar issue exists for ksock_conn::ksnc_tx_last_post.

            olaf Olaf Weber (Inactive) added a comment - It looks like a similar issue exists for ksock_conn::ksnc_tx_last_post .

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/26813
            Subject: LU-9397 ksocklnd: make last_alive in units of jiffies
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7006890fcc484a8074a033517a6607e6ace15b8b

            gerrit Gerrit Updater added a comment - James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/26813 Subject: LU-9397 ksocklnd: make last_alive in units of jiffies Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 7006890fcc484a8074a033517a6607e6ace15b8b

            People

              simmonsja James A Simmons
              ashehata Amir Shehata (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: