
nanosecond timestamp support for Lustre

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.4.0
    • 3
    • 4515

    Description

      The current Lustre network protocol has support for a 64-bit timestamp of seconds, but does not have a field for passing the nanosecond timestamp from clients to servers and back again.

      It would be relatively straight-forward to put 3x __u32 nanosecond timestamps in the reserved fields in struct obdo and struct mdt_body. These fields are currently always initialized to 0, so there wouldn't even need to be a protocol change or feature to begin using these fields for nanoseconds - just copy them in/out of the RPC structures, and old clients/servers will just store 0 there, and ignore any nanosecond timestamps that are sent to them (no differently than they do today).
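      A minimal sketch of the reserved-field idea, in C. The struct `demo_wire_attr` and all `demo_*` names are hypothetical stand-ins, not the real struct obdo or struct mdt_body layout. It only illustrates that a field which old peers always zero can carry nanoseconds without changing the message size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire struct (assumption): NOT the real struct obdo. */
struct demo_wire_attr {
	int64_t  wa_mtime;       /* epoch seconds, as sent today */
	uint32_t wa_mtime_nsec;  /* formerly a reserved, always-zero field */
	uint32_t wa_padding;     /* still reserved */
};

/* Reusing a reserved field must not change the on-wire size. */
static_assert(sizeof(struct demo_wire_attr) == 16, "wire size changed");

/* Old sender: zeroes the whole struct and fills in seconds only. */
void demo_pack_old(struct demo_wire_attr *wa, int64_t sec)
{
	memset(wa, 0, sizeof(*wa));
	wa->wa_mtime = sec;
}

/* New sender: also stores the nanosecond part in the former reserved field. */
void demo_pack_new(struct demo_wire_attr *wa, int64_t sec, uint32_t nsec)
{
	memset(wa, 0, sizeof(*wa));
	wa->wa_mtime = sec;
	wa->wa_mtime_nsec = nsec;
}

/* Old receiver: reads seconds and never looks at the reserved field. */
int64_t demo_unpack_old(const struct demo_wire_attr *wa)
{
	return wa->wa_mtime;
}

/* New receiver: reads both; nsec is simply 0 when the sender was old. */
void demo_unpack_new(const struct demo_wire_attr *wa,
		     int64_t *sec, uint32_t *nsec)
{
	*sec = wa->wa_mtime;
	*nsec = wa->wa_mtime_nsec;
}
```

      In this sketch, a message packed by `demo_pack_new()` is read by `demo_unpack_old()` exactly as before, and a message packed by `demo_pack_old()` yields a nanosecond value of 0 in `demo_unpack_new()`, which matches the "no differently than they do today" behaviour described above.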

      It is more complex to add the nanosecond timestamps to struct ost_lvb, which is most commonly used for glimpse locks (stat) on OST objects. This will require a structure change to fit the extra 3x __u32 nanosecond timestamps into ost_lvb, which may require a protocol change. It may be possible if this structure is passed in a separate ptlrpc message buffer that the larger size will be ignored by older clients, which would avoid the need for additional complexity for interoperability.

          Activity

            [LU-1158] nanosecond timestamp support for Lustre

            gerrit Gerrit Updater added a comment

            "Sohei Koyama <skoyama@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58094
            Subject: LU-1158 general: support nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: baf2c85fd0f9f51f93bd6903dd7e871f20904a67

            adilger Andreas Dilger added a comment

            Note that the POSIX compliance test lustre/tests/pjdfstests.sh is skipping the utimensat_08 subtest because of a lack of nanosecond timestamp support. That subtest should be removed from the always_except list with patch 56236 or a follow-on patch.
            gerrit Gerrit Updater added a comment - edited

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56362
            Subject: LU-1158 general: support nanosec timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f737aef240075fe2933336a4cbd510e69a2d5507

            gerrit Gerrit Updater added a comment

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56236
            Subject: LU-1158 general: support nanosec timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8d3821e8eda1882ff961f49b864f4f4710b08e18
            flei Feng Lei added a comment - edited

            Here is an example of a bug caused by converting obdo.o_a/m/ctime to obdo.o_a/m/ctime_ns:

            In the previous version (PatchSet#37), obdo.o_mtime was changed to obdo.o_mtime_ns, and the functions lustre_set_wire_obdo()/lustre_get_wire_obdo() were extended to convert timestamps between seconds and nanoseconds. Looks perfect?

            But sanity/39c fails with a new client and an old server after cancelling the osc LRU locks. The bug is in osc_build_rpc(). That function first generates the RPC request message with osc_brw_prep_request(), in which all the timestamps are converted correctly. But then it fetches the OST body again with body = req_capsule_client_get(&req->rq_pill, &RMF_OST_BODY) and refreshes the timestamps in the message with cl_req_attr_set(env, osc2cl(obj), crattr), so the refreshed timestamps never pass through the conversion. Call sites like this are hard for me to find, which makes it hard to insert the conversion function again in every place that needs it.

            Grepping for RMF_OST_BODY in the source code turns up tens of occurrences. Even if I check all of them and insert the conversion function properly today, I'm not confident that future patches will remember to do the conversion correctly. So personally I believe it's better to add new nsec fields to this struct.

            The same reasoning applies to mdt_body and ost_lvb.
            flei Feng Lei added a comment - edited

            The new design points:

            • Convert the in-memory s64 timestamps from epoch seconds to epoch nanoseconds. The variable names change from a/m/ctime to a/m/ctime_ns to indicate this.
            • The timespec64 variables do not change.
            • Naming convention: a/m/ctime is epoch seconds, a/m/ctime_ns is epoch nanoseconds, and a/m/ctime_nsec is the nanosecond part of a timestamp.
            • Enable the OBD_CONNECT_NANOSEC flag for the connection. The connection has this flag only if both client and server support it, so if either side is an old version without nanosecond timestamp support, the flag is not set.
            • If the connection has OBD_CONNECT_NANOSEC, both client and server are new versions and the timestamps on the wire are also in nanoseconds, so the client/server can simply copy the data to/from the wire.
            • If the connection does not have OBD_CONNECT_NANOSEC, all timestamps on the wire stay in seconds. In this case an old client/server works as before; a new client/server needs to convert the timestamps between seconds (on the wire) and nanoseconds (in memory).
            • The conversion between seconds and nanoseconds applies to most structs, with 3 exceptions: ost_lvb, mdt_body and obdo.
              • These 3 structs do not convert the original timestamps from seconds to nanoseconds, but add additional time_nsec fields, which hold the nanosecond part of the timestamps.
              • Whether or not the connection has the OBD_CONNECT_NANOSEC flag, these 3 structs do no conversion. The a/m/ctime_nsec fields may be filled in by a new client/server and are ignored by an old client/server.
              • They are treated differently because:
                • They don't have regular pack/unpack functions.
                • Existing code may dereference timestamps in the message body directly, without any packing/unpacking. It is hard to find all such places, check the OBD_CONNECT_NANOSEC flag, and convert the timestamps correctly.
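            The flag-dependent conversion in the design points above can be sketched in C as follows. This is an illustration under assumptions, not code from the patches: the `demo_*` names are hypothetical, the negotiated OBD_CONNECT_NANOSEC state is reduced to a plain bool, and timestamps are assumed to be non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* The flag is set on the connection only if both sides support it. */
bool demo_negotiate_nanosec(bool client_supports, bool server_supports)
{
	return client_supports && server_supports;
}

/*
 * In-memory value is epoch nanoseconds (the *_ns naming above).
 * With the flag, copy it to the wire unchanged; without it, the peer
 * expects whole seconds. C division truncates toward zero, so pre-1970
 * values would need floor division; that case is ignored in this sketch.
 */
int64_t demo_time_to_wire(int64_t time_ns, bool conn_has_nanosec)
{
	if (conn_has_nanosec)
		return time_ns;
	return time_ns / DEMO_NSEC_PER_SEC;
}

/* Inverse direction: a seconds value from an old peer is scaled up. */
int64_t demo_time_from_wire(int64_t wire_time, bool conn_has_nanosec)
{
	if (conn_has_nanosec)
		return wire_time;
	/* Guard against signed overflow when scaling seconds to ns. */
	assert(wire_time <= INT64_MAX / DEMO_NSEC_PER_SEC &&
	       wire_time >= INT64_MIN / DEMO_NSEC_PER_SEC);
	return wire_time * DEMO_NSEC_PER_SEC;
}
```

            Note that a round trip through a connection without the flag loses the sub-second part, which is the expected behaviour when talking to an old peer. The three excepted structs (ost_lvb, mdt_body, obdo) would not go through helpers like these at all.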
            gerrit Gerrit Updater added a comment - edited

            "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55849
            Subject: LU-1158 general: interop of nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c8f3fc718664abbc56eb432ecaca7c2faea00942

            gerrit Gerrit Updater added a comment

            "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53313
            Subject: LU-1158 general: support nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3dc43cd08efc8347cdceed3939be7c33256a081c

            adilger Andreas Dilger added a comment

            Clearly I wasn't thinking of only a __u32 nanoseconds timestamp, since that won't even last until the end of this comment, let alone to Y2038, but rather in addition to the existing 64-bit seconds field, as was discussed in the original description.

            On the other hand, passing the entire timestamp as a 64-bit nanosecond timestamp gives us roughly 292 years before signed overflow, and could simplify the wire protocol change (no need to change the size of the fields) with the added complexity that there would need to be a connection flag to indicate if this client is sending timestamps in seconds or nanoseconds. Potentially we could just assume any value larger than 2^32 is going to be nanoseconds (it is unlikely that any current Lustre releases would still be running in 20 years, and conversely 2^60 ns is needed to get to 2006, so they are very unlikely to conflict), but using an OBD_CONNECT_NANOSECONDS flag is not too hard. The main complexity is that this flag would need to be checked in many places, possibly places where the client export is not easily available, so just making a decision based on the size of the timestamp is relatively safe, and the old seconds format could eventually be deprecated with little effort.
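            The size-based heuristic from this comment can be sketched in C as below. The `demo_*` names are hypothetical and this is only one possible reading of the idea: any value above 2^32 is taken to be nanoseconds, anything else is taken to be seconds. The compile-time checks restate the arithmetic in the comment (roughly 292 years of signed 64-bit nanoseconds, and 2^60 ns falling in 2006).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL
#define DEMO_SEC_LIMIT    (1LL << 32)  /* threshold suggested in the comment */

/* A signed 64-bit nanosecond count covers about 292 years (365-day years). */
static_assert((INT64_MAX / DEMO_NSEC_PER_SEC) / (365LL * 24 * 3600) == 292,
	      "expected roughly 292 years of range");

/* 2^60 ns after the epoch lies between 2006-01-01 and 2007-01-01 UTC. */
static_assert((1LL << 60) / DEMO_NSEC_PER_SEC >= 1136073600LL &&
	      (1LL << 60) / DEMO_NSEC_PER_SEC <  1167609600LL,
	      "2^60 ns should fall in 2006");

/* Guess the unit from the magnitude alone, with no connection flag. */
bool demo_looks_like_nsec(int64_t t)
{
	return t > DEMO_SEC_LIMIT;
}

/*
 * Normalize a wire value of unknown unit to epoch nanoseconds.
 * Values at or below the threshold (including negative ones) are
 * treated as seconds in this sketch.
 */
int64_t demo_normalize_to_ns(int64_t t)
{
	if (demo_looks_like_nsec(t))
		return t;
	return t * DEMO_NSEC_PER_SEC;
}
```

            The trade-off is the one described in the comment: the heuristic needs no access to the client export, but it misreads nanosecond values within the first few seconds after the epoch and seconds values beyond 2^32.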

            simmonsja James A Simmons added a comment

            2^32 - 1 nanoseconds gives us 4.294967295 seconds until overflow, so a 32-bit nanosecond timestamp on its own is not very useful. As Atrem pointed out, LNet already sends 64-bit time in seconds. We can use a 32-bit field to add a nanosecond value alongside the seconds value that is already sent. That is why I compared it to struct timespec64 = { time64_t tv_sec; long tv_nsec }

            Atrem, all the needed infrastructure to support the Linux kernel 64-bit time handling has been merged into the latest Lustre. Just don't use the cfs time wrappers, since they will be going away.
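            A short C sketch of the arithmetic in this comment and of the seconds-plus-nanoseconds pairing it proposes. The struct `demo_wire_time` and the `demo_*` functions are hypothetical; the struct only mirrors the shape of timespec64 with a 32-bit nanosecond part and is not an existing Lustre or kernel type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* 2^32 - 1 ns is 4 whole seconds plus 294967295 ns, i.e. 4.294967295 s. */
static_assert(UINT32_MAX / DEMO_NSEC_PER_SEC == 4, "whole seconds");
static_assert(UINT32_MAX % DEMO_NSEC_PER_SEC == 294967295ULL, "remainder");

/* The nanosecond part of a timestamp is below 10^9, so 32 bits suffice. */
static_assert(DEMO_NSEC_PER_SEC - 1 <= UINT32_MAX, "nsec fits in 32 bits");

/* Hypothetical pair: existing 64-bit seconds plus a 32-bit nsec field. */
struct demo_wire_time {
	int64_t  wt_sec;
	uint32_t wt_nsec;
};

/* A well-formed pair keeps the nanosecond part below one second. */
bool demo_wire_time_valid(const struct demo_wire_time *wt)
{
	return wt->wt_nsec < DEMO_NSEC_PER_SEC;
}

/* Order two timestamps: seconds first, nanoseconds as the tie-breaker. */
int demo_wire_time_cmp(const struct demo_wire_time *a,
		       const struct demo_wire_time *b)
{
	if (a->wt_sec != b->wt_sec)
		return a->wt_sec < b->wt_sec ? -1 : 1;
	if (a->wt_nsec != b->wt_nsec)
		return a->wt_nsec < b->wt_nsec ? -1 : 1;
	return 0;
}
```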

            People

              flei Feng Lei
              adilger Andreas Dilger