
nanosecond timestamp support for Lustre

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.4.0
    • 3
    • 4515

    Description

      The current Lustre network protocol has support for a 64-bit timestamp of seconds, but does not have a field for passing the nanosecond timestamp from clients to servers and back again.

      It would be relatively straightforward to put 3x __u32 nanosecond timestamps in the reserved fields in struct obdo and struct mdt_body. These fields are currently always initialized to 0, so no protocol change or feature flag would be needed to begin using them for nanoseconds. New code would just copy them in and out of the RPC structures, while old clients and servers would keep storing 0 there and ignore any nanosecond timestamps sent to them (no differently than they do today).
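A minimal sketch of the reserved-field idea described above. The structure `demo_body`, its field names, and the helper functions are hypothetical stand-ins chosen for illustration; they do not reproduce the real `struct obdo` or `struct mdt_body` layouts. The sketch only shows that a sender which zero-fills the formerly reserved words is read by a newer peer as sending 0 ns.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified wire structure (NOT the real obdo/mdt_body). */
struct demo_body {
    int64_t  mtime;     /* existing 64-bit seconds fields */
    int64_t  atime;
    int64_t  ctime;
    uint32_t mtime_ns;  /* formerly reserved words, always 0 from old peers */
    uint32_t atime_ns;
    uint32_t ctime_ns;
    uint32_t padding;
};

/* Hypothetical in-memory timestamp: seconds plus nanoseconds. */
struct demo_time {
    int64_t  sec;
    uint32_t nsec;
};

/* What an old sender does: fill the seconds, leave reserved words zeroed. */
static inline void demo_pack_old(struct demo_body *b, int64_t mtime_sec)
{
    memset(b, 0, sizeof(*b));
    b->mtime = mtime_sec;
}

/* What a new sender would do: also copy the nanoseconds in. */
static inline void demo_pack_new(struct demo_body *b, struct demo_time mtime)
{
    memset(b, 0, sizeof(*b));
    b->mtime = mtime.sec;
    b->mtime_ns = mtime.nsec;
}

/* A new receiver reads both fields; an old sender simply yields nsec == 0. */
static inline struct demo_time demo_unpack_mtime(const struct demo_body *b)
{
    struct demo_time t = { b->mtime, b->mtime_ns };
    return t;
}
```

In this sketch an old receiver would read only `mtime` and never look at `mtime_ns`, which matches the "ignore any nanosecond timestamps" behaviour the description relies on.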

      It is more complex to add the nanosecond timestamps to struct ost_lvb, which is most commonly used for glimpse locks (stat) on OST objects. Fitting the extra 3x __u32 nanosecond timestamps into ost_lvb requires a structure change, which may in turn require a protocol change. If this structure is passed in a separate ptlrpc message buffer, older clients may simply ignore the larger size, which would avoid additional interoperability complexity.
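The size-based interoperability idea can be sketched as follows, assuming the new structure keeps the old one as an unchanged prefix. Both layouts (`demo_lvb_v1`, `demo_lvb_v2`) and the function `demo_lvb_unpack` are hypothetical; whether real ptlrpc buffers allow an older peer to ignore the trailing bytes is exactly the open question raised above, and this sketch does not answer it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: seconds only. */
struct demo_lvb_v1 {
    uint64_t size;
    int64_t  mtime;
    int64_t  atime;
    int64_t  ctime;
    uint64_t blocks;
};

/* Hypothetical "new" layout: old layout as a prefix, nanoseconds appended. */
struct demo_lvb_v2 {
    struct demo_lvb_v1 v1;
    uint32_t mtime_ns;
    uint32_t atime_ns;
    uint32_t ctime_ns;
    uint32_t padding;
};

/*
 * Decode a received buffer by its length.
 * A short (old-format) buffer leaves all nanosecond fields at 0.
 * Returns 0 on success, -1 if the buffer is too small for either layout.
 */
static inline int demo_lvb_unpack(const void *buf, size_t len,
                                  struct demo_lvb_v2 *out)
{
    memset(out, 0, sizeof(*out));
    if (len >= sizeof(struct demo_lvb_v2))
        memcpy(out, buf, sizeof(struct demo_lvb_v2));
    else if (len >= sizeof(struct demo_lvb_v1))
        memcpy(&out->v1, buf, sizeof(struct demo_lvb_v1));
    else
        return -1;
    return 0;
}
```

An old receiver in this model would copy only the first `sizeof(struct demo_lvb_v1)` bytes of a longer buffer, which is the behaviour the description hopes for.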

          Activity

            [LU-1158] nanosecond timestamp support for Lustre
            gerrit Gerrit Updater added a comment (edited):

            "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55849
            Subject: LU-1158 general: interop of nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c8f3fc718664abbc56eb432ecaca7c2faea00942

            gerrit Gerrit Updater added a comment:

            "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53313
            Subject: LU-1158 general: support nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3dc43cd08efc8347cdceed3939be7c33256a081c

            adilger Andreas Dilger added a comment:

            Clearly I wasn't thinking of only a __u32 nanoseconds timestamp, since that won't even last until the end of this comment, let alone to Y2038, but rather of one in addition to the existing 64-bit seconds field, as was discussed in the original description.

            On the other hand, passing the entire timestamp as a 64-bit nanosecond timestamp gives us roughly 292 years before signed overflow, and could simplify the wire protocol change (no need to change the size of the fields). The added complexity is that there would need to be a connection flag to indicate whether this client is sending timestamps in seconds or nanoseconds. Potentially we could just assume any value larger than 2^32 is going to be nanoseconds (it is unlikely that any current Lustre releases would still be running in 20 years, and conversely 2^60 ns is needed to get to 2006, so they are very unlikely to conflict), but using an OBD_CONNECT_NANOSECONDS flag is not too hard. The main complexity is that this flag would need to be checked in many places, possibly places where the client export is not easily available, so just making a decision based on the size of the timestamp is relatively safe, and the old seconds format could eventually be deprecated with little effort.
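The size-based decision proposed in this comment can be sketched as below. The names `demo_ts`, `DEMO_SECONDS_LIMIT`, and `demo_decode_wire_time` are hypothetical, and the 2^32 threshold is taken directly from the comment; this is an illustration of the heuristic, not code from any Lustre patch.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC  1000000000LL
/* Threshold from the comment: values above 2^32 are treated as nanoseconds. */
#define DEMO_SECONDS_LIMIT (1LL << 32)

/* Hypothetical decoded timestamp. */
struct demo_ts {
    int64_t sec;
    int64_t nsec;
};

/*
 * Decode a 64-bit wire value that may hold either seconds (old peers)
 * or nanoseconds since the epoch (new peers), judging only by magnitude.
 */
static inline struct demo_ts demo_decode_wire_time(int64_t wire)
{
    struct demo_ts t;

    if (wire > DEMO_SECONDS_LIMIT) {
        /* Too large to be a plausible seconds value: nanoseconds. */
        t.sec = wire / DEMO_NSEC_PER_SEC;
        t.nsec = wire % DEMO_NSEC_PER_SEC;
    } else {
        /* Small enough to be seconds since the epoch. */
        t.sec = wire;
        t.nsec = 0;
    }
    return t;
}
```

The sketch also shows the limits of the heuristic: a nanosecond value at or below 2^32 (the first 4.29 seconds after the epoch) and any negative value would be read as seconds, which is the kind of ambiguity an explicit connection flag would avoid.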

            simmonsja James A Simmons added a comment:

            2^32 - 1 nanoseconds gives us 4.294967295 seconds until overflow, so a 32-bit nanosecond timestamp on its own is not very useful. As Artem pointed out, LNet already sends 64-bit time in seconds. We can use a 32-bit field to carry the nanosecond value alongside the seconds value that is already sent. That is why I compared it to struct timespec64 = { time64_t tv_sec; long tv_nsec }

            Artem, all the infrastructure needed to support the Linux kernel's 64-bit time handling has been merged into the latest Lustre. Just don't use the cfs time wrappers, since they will be going away.
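The two representations discussed in the thread, a single 64-bit count of nanoseconds since the epoch and a seconds-plus-nanoseconds pair in the style of struct timespec64, can be converted into each other as sketched below. The type `demo_wire_time` and both helpers are hypothetical names for illustration; the struct is an in-memory model, not a proposed wire layout.

```c
#include <stdint.h>

#define DEMO_NS_PER_SEC 1000000000LL

/* Hypothetical pair: 64-bit seconds plus a 32-bit nanosecond remainder. */
struct demo_wire_time {
    int64_t  sec;
    uint32_t nsec;  /* always in the range 0..999999999 */
};

/* Split nanoseconds-since-epoch into the pair, flooring for negative times. */
static inline struct demo_wire_time demo_from_ns(int64_t ns)
{
    struct demo_wire_time t;
    int64_t sec = ns / DEMO_NS_PER_SEC;
    int64_t rem = ns % DEMO_NS_PER_SEC;

    if (rem < 0) {
        /* C division truncates toward zero; borrow one second instead. */
        sec -= 1;
        rem += DEMO_NS_PER_SEC;
    }
    t.sec = sec;
    t.nsec = (uint32_t)rem;
    return t;
}

/* Recombine the pair; only valid while the result fits in a signed 64-bit value. */
static inline int64_t demo_to_ns(struct demo_wire_time t)
{
    return t.sec * DEMO_NS_PER_SEC + (int64_t)t.nsec;
}
```

For example, `demo_from_ns(4294967295)` yields 4 seconds and 294967295 nanoseconds, which is the 4.294967295-second limit of a lone 32-bit nanosecond counter mentioned in the comment.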

            adilger Andreas Dilger added a comment:

            Hmm, when does ns-since-epoch overflow? Maybe 2 extra bits from the 2^30 ns in a 32-bit field... That would simplify the protocol change, and give us 4*140 years extra?
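The overflow horizons behind this question can be worked out directly, taking a year as 365.25 days (about 3.156 × 10^7 seconds). This is plain arithmetic on the field widths mentioned in the thread, not a statement about any particular Lustre structure.

```latex
\begin{align*}
2^{32}\ \text{ns} &\approx 4.29\ \text{s} \\
2^{63}\ \text{ns} &\approx 9.22 \times 10^{9}\ \text{s} \approx 292\ \text{years} \\
2^{32}\ \text{s}  &\approx 4.29 \times 10^{9}\ \text{s} \approx 136\ \text{years} \\
10^{9} &< 2^{30} = 1\,073\,741\,824
\end{align*}
```

The last line is why a per-second nanosecond count needs only 30 bits, leaving 2 spare bits in a 32-bit field.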

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment:

            simmonsja, so do you think LU-9019 somehow helps to "transmit struct timespec64 over the wire"?

            simmonsja James A Simmons added a comment:

            Oh I see. You want to basically transmit struct timespec64 over the wire. I was thinking in terms of nanoseconds since the epoch being transmitted.

            adilger Andreas Dilger added a comment:

            James, I don't understand your comment. Why would we ever want more than a 32-bit field for nanoseconds? Surely there can't be more than 2^32 nanoseconds in a second? I understand that there can be leap seconds and other time adjustments that might result in over 10^9 nanoseconds in a second, but using a full 64-bit field for nanoseconds in the network protocol is just a waste of space.

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment:

            simmonsja, thanks a lot for the answer! I am going to look at LU-9019 now.

            simmonsja James A Simmons added a comment:

            This overlaps with the 64-bit time work I have been doing. Now that we support ktime_t this can easily be handled. Just as a note: DO NOT use 32-bit fields for nanoseconds. This will not work after 2038, and because of that upstream will reject the patch.

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment:

            Are there currently any plans to rewrite the patch against the variable-sized LVB patch?

            People

              flei Feng Lei
              adilger Andreas Dilger
              Votes: 0
              Watchers: 18
