
nanosecond timestamp support for Lustre

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.4.0
    • 3
    • 4515

    Description

      The current Lustre network protocol has support for a 64-bit timestamp of seconds, but does not have a field for passing the nanosecond timestamp from clients to servers and back again.

      It would be relatively straightforward to put 3x __u32 nanosecond timestamps in the reserved fields of struct obdo and struct mdt_body. These fields are currently always initialized to 0, so no protocol change or feature flag would be needed to begin using them for nanoseconds: just copy them in and out of the RPC structures. Old clients and servers will store 0 there and ignore any nanosecond timestamps that are sent to them (no differently than they do today).
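
A minimal sketch of the reserved-field idea, assuming a simplified wire body. The struct, its field names, and the demo_* helpers are hypothetical and do not reproduce the real struct obdo or struct mdt_body layout. Only mtime is shown, and atime and ctime would follow the same pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NSEC_PER_SEC 1000000000u

/* Hypothetical, simplified wire body; NOT the real struct obdo layout. */
struct demo_wire_body {
	int64_t  mtime;       /* seconds since the epoch */
	int64_t  atime;
	int64_t  ctime;
	uint32_t mtime_nsec;  /* carved out of formerly reserved space */
	uint32_t atime_nsec;
	uint32_t ctime_nsec;
	uint32_t reserved;    /* still unused, still sent as 0 */
};

struct demo_time {
	int64_t  sec;
	uint32_t nsec;
};

/* What an old peer effectively does: zero everything, fill seconds only. */
static void demo_pack_old(struct demo_wire_body *b, int64_t mtime_sec)
{
	memset(b, 0, sizeof(*b));
	b->mtime = mtime_sec;
}

/* A new peer also fills the nanosecond part. */
static void demo_pack_new(struct demo_wire_body *b, struct demo_time mtime)
{
	assert(mtime.nsec < DEMO_NSEC_PER_SEC);
	memset(b, 0, sizeof(*b));
	b->mtime = mtime.sec;
	b->mtime_nsec = mtime.nsec;
}

/* A new peer reads both parts; a body from an old peer yields nsec == 0. */
static struct demo_time demo_unpack_mtime(const struct demo_wire_body *b)
{
	struct demo_time t = { b->mtime, b->mtime_nsec };

	return t;
}
```

Because the old packer zeroes the whole body, a new receiver reading mtime_nsec from an old sender sees 0, which matches the whole-second behaviour that exists today.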

      It is more complex to add the nanosecond timestamps to struct ost_lvb, which is most commonly used for glimpse locks (stat) on OST objects. Fitting the extra 3x __u32 nanosecond timestamps into ost_lvb requires a structure change, which may in turn require a protocol change. If this structure is passed in a separate ptlrpc message buffer, older clients may simply ignore the larger size, which would avoid additional interoperability complexity.
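
One way a receiver could cope with both structure sizes is to decide by buffer length. The sketch below assumes two hypothetical layouts, demo_lvb_v1 and demo_lvb_v2, and a hypothetical demo_lvb_unpack() helper. None of these names come from the Lustre source, and byte-order handling is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: whole seconds only. */
struct demo_lvb_v1 {
	uint64_t size;
	int64_t  mtime;
	int64_t  atime;
	int64_t  ctime;
	uint64_t blocks;
};

/* Hypothetical "new" layout: same prefix, nanoseconds appended. */
struct demo_lvb_v2 {
	uint64_t size;
	int64_t  mtime;
	int64_t  atime;
	int64_t  ctime;
	uint64_t blocks;
	uint32_t mtime_nsec;
	uint32_t atime_nsec;
	uint32_t ctime_nsec;
	uint32_t padding;
};

/*
 * Unpack a received buffer of either size into the new in-memory form.
 * A short (v1) buffer leaves the nanosecond fields at 0.
 * Returns 0 on success, -1 if the buffer is too small for either layout.
 */
static int demo_lvb_unpack(const void *buf, size_t len, struct demo_lvb_v2 *out)
{
	assert(buf && out);
	memset(out, 0, sizeof(*out));

	if (len >= sizeof(struct demo_lvb_v2)) {
		memcpy(out, buf, sizeof(struct demo_lvb_v2));
		return 0;
	}
	if (len >= sizeof(struct demo_lvb_v1)) {
		/* v1 is a strict prefix of v2, so copy only that prefix. */
		memcpy(out, buf, sizeof(struct demo_lvb_v1));
		return 0;
	}
	return -1;
}
```

An old receiver that copies only sizeof(struct demo_lvb_v1) bytes from a larger buffer would behave like the second branch, which is the "larger size is ignored" case described above.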

          Activity

            [LU-1158] nanosecond timestamp support for Lustre

            "Sohei Koyama <skoyama@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58094
            Subject: LU-1158 general: support nanosecond timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: baf2c85fd0f9f51f93bd6903dd7e871f20904a67

            mrasobarnett Matt Rásó-Barnett made changes -
            Link New: This issue is related to EXR-574 [ EXR-574 ]
            adilger Andreas Dilger made changes -
            Labels New: always_except

            adilger Andreas Dilger added a comment -

            Note that the POSIX compliance test lustre/tests/pjdfstests.sh is skipping the utimensat_08 subtest because of a lack of nanosecond timestamp support. That subtest should be removed from the always_except list with patch 56236 or a follow-on patch.
            adilger Andreas Dilger made changes -
            Labels Original: always_except
            adilger Andreas Dilger made changes -
            Labels New: always_except
            gerrit Gerrit Updater added a comment - - edited

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56362
            Subject: LU-1158 general: support nanosec timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f737aef240075fe2933336a4cbd510e69a2d5507 (struck through in the edited comment)


            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56236
            Subject: LU-1158 general: support nanosec timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8d3821e8eda1882ff961f49b864f4f4710b08e18

            flei Feng Lei added a comment - - edited

            Here is an example of a bug caused by converting obdo.o_a/m/ctime to obdo.o_a/m/ctime_ns:

            In the previous version (PatchSet#37), obdo.o_mtime was changed to obdo.o_mtime_ns, and the functions lustre_set_wire_obdo()/lustre_get_wire_obdo() were enhanced to convert timestamps between seconds and nanoseconds. Looks perfect?

            But sanity/39c fails with a new client + old server after cancelling osc lru locks. The bug is in osc_build_rpc(). The function first generates the RPC request message with osc_brw_prep_request(), in which all the timestamps are converted correctly. But then it grabs the ost body again with body = req_capsule_client_get(&req->rq_pill, &RMF_OST_BODY) and refreshes the timestamps in the message with cl_req_attr_set(env, osc2cl(obj), crattr). A place like this is hard for me to find, and the converting function has to be inserted again there.

            Grepping for RMF_OST_BODY in the source code turns up tens of uses. Even if I check all of them and insert the converting function properly today, I'm not confident that future patches will remember to do the conversion correctly. So personally I believe it's better to add new nsec fields to this struct.

            The same reasoning applies to mdt_body and ost_lvb.
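
A minimal sketch of this failure mode, using hypothetical demo_* types and functions rather than the real osc_build_rpc() code path. It shows how a second, direct write into the wire body can bypass the conversion done by the packing function:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Hypothetical wire body as an old server expects it: whole seconds. */
struct demo_wire {
	int64_t mtime;
};

/* Hypothetical in-memory attributes after the rename: epoch nanoseconds. */
struct demo_attr {
	int64_t mtime_ns;
};

/* The "official" packing path: converts nanoseconds to seconds. */
static void demo_pack(struct demo_wire *w, const struct demo_attr *a)
{
	assert(w && a);
	w->mtime = a->mtime_ns / DEMO_NSEC_PER_SEC;
}

/*
 * A later code path that fetches the body again and refreshes the
 * timestamp directly. Nothing forces it through demo_pack(), so the
 * seconds field silently receives a nanosecond value.
 */
static void demo_refresh_direct(struct demo_wire *w, const struct demo_attr *a)
{
	assert(w && a);
	w->mtime = a->mtime_ns; /* BUG: wrong unit for an old peer */
}
```

With separate nsec fields, the existing seconds field keeps its unit, so a direct write like the one in demo_refresh_direct() would stay correct as long as it copies seconds.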

            flei Feng Lei added a comment - - edited

            The new design points:

            • Convert the in-memory s64 timestamps from epoch seconds to epoch nanoseconds. At the same time, the variable names are changed from a/m/ctime to a/m/ctime_ns to indicate this change.
            • The timespec64 variables won't change.
            • If you see a variable named like a/m/ctime, it holds epoch seconds. If it is named a/m/ctime_ns, it holds epoch nanoseconds. If it is named a/m/ctime_nsec, it holds the nanosecond part of a timestamp.
            • Enable the OBD_CONNECT_NANOSEC flag for the connection. The connection will have this flag only if both client and server support it. So if either client or server is an old one which does not support nanosecond timestamps, the connection won't have this flag.
            • If the connection does have OBD_CONNECT_NANOSEC, both client and server are new versions and the timestamps on the wire are also in nanoseconds, so the client/server can simply copy the data from/to the wire.
            • If the connection does not have the OBD_CONNECT_NANOSEC flag, all the timestamps on the wire stay in seconds. In this case, an old client/server works as before; a new client/server needs to convert the timestamps between seconds (on the wire) and nanoseconds (in memory).
            • The conversion between seconds and nanoseconds applies to most structs, except 3 structs: ost_lvb, mdt_body and obdo.
              • These 3 structs do not convert the original timestamps from seconds to nanoseconds, but add additional time_nsec fields, which hold the nanosecond part of the timestamps.
              • Whether or not the connection has the OBD_CONNECT_NANOSEC flag, these 3 structs won't do any converting. The a/m/ctime_nsec fields may be filled by a new client/server and can be ignored by an old client/server.
              • They are treated differently because:
                • They don't have regular pack/unpack functions.
                • The existing code may dereference timestamps in the message body directly without any packing/unpacking. It's hard to find these places, check the OBD_CONNECT_NANOSEC flag, and convert the timestamps correctly.
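
The design points above can be sketched roughly as follows. Everything named demo_* is hypothetical: struct demo_conn stands in for the negotiated OBD_CONNECT_NANOSEC state, and the helpers are illustrations, not functions from the patch. Rounding toward negative infinity for pre-epoch times is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the negotiated OBD_CONNECT_NANOSEC state. */
struct demo_conn {
	bool nanosec;
};

/* The flag is set only when both ends support nanosecond timestamps. */
static bool demo_negotiate(bool client_supports, bool server_supports)
{
	return client_supports && server_supports;
}

/*
 * Split epoch nanoseconds (a/m/ctime_ns) into epoch seconds (a/m/ctime)
 * and the nanosecond part (a/m/ctime_nsec, 0..999999999), rounding
 * toward negative infinity so pre-epoch times stay valid.
 */
static void demo_split(int64_t time_ns, int64_t *sec, uint32_t *nsec)
{
	int64_t s = time_ns / DEMO_NSEC_PER_SEC;

	if (time_ns % DEMO_NSEC_PER_SEC < 0)
		s--;
	*sec = s;
	*nsec = (uint32_t)(time_ns - s * DEMO_NSEC_PER_SEC);
}

/* Inverse of demo_split(); overflows int64_t for years beyond about 2262. */
static int64_t demo_join(int64_t sec, uint32_t nsec)
{
	assert(nsec < DEMO_NSEC_PER_SEC);
	return sec * DEMO_NSEC_PER_SEC + nsec;
}

/* "Most structs" case: in-memory nanoseconds to the wire value. */
static int64_t demo_time_to_wire(const struct demo_conn *c, int64_t time_ns)
{
	int64_t sec;
	uint32_t nsec;

	if (c->nanosec)
		return time_ns;     /* both ends are new: copy as-is */
	demo_split(time_ns, &sec, &nsec);
	return sec;                 /* old peer: sub-second part is dropped */
}

/* "Most structs" case: wire value back to in-memory nanoseconds. */
static int64_t demo_time_from_wire(const struct demo_conn *c, int64_t wire)
{
	return c->nanosec ? wire : demo_join(wire, 0);
}
```

For the three special structs, a new peer would use something like demo_split() and demo_join() unconditionally, since the seconds field and the separate nsec field are filled the same way regardless of the connection flag.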

            People

              flei Feng Lei
              adilger Andreas Dilger