sanity test_39r: atime on client 1699192823 != ost 0x65479ff6

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/873c927b-d963-40fa-830f-f3caea45a955

      test_39r failed with the following error:

      atime on client 1699192823 != ost 0x65479ff6
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/100162 - 4.18.0-477.21.1.el8_8.aarch64
      servers: https://build.whamcloud.com/job/lustre-reviews/100162 - 4.18.0-477.21.1.el8_lustre.x86_64


      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_39r - atime on client 1699192823 != ost 0x65479ff6

          Activity

            [LU-17265] sanity test_39r: atime on client 1699192823 != ost 0x65479ff6
            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53035/
            Subject: LU-17265 tests: allow margin for sanity/39r
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c5aa16db172afc9cbf0d4fd2c85261fef1a40d7b

            gerrit Gerrit Updater added a comment -

            "Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53035
            Subject: LU-17265 tests: Add 2s (leniency) in comparing timestamps
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a166c6aed4dc3b1114969d1b3eb8ce5ce26a4370

            arshad512 Arshad Hussain added a comment -

            Sure. I have assigned it to myself, and thanks for the explanation of the problem.

            adilger Andreas Dilger added a comment -

            Arshad, any chance you can push a fix for this? It should just be a one-line test script fix.

            adilger Andreas Dilger added a comment -

            The comparisons are done within "$((...))", so the hex value is converted to decimal before comparing, but the mixed bases are a bit confusing in the error message.

            Lustre should never be using the server timestamps when storing attributes on the servers. That avoids problems like those seen with NFS, where the client gets tons of errors during a build when the client and server have different timestamps.

            That said, it seems possible that there is a race between when the client is sending the read RPCs for "dd" and when the client VFS updates the attributes on the local inode. We do not send another RPC to the OST to update the timestamps, so the OST value might be a second outdated, but it shouldn't be 10s outdated (i.e. not updated at all).

            So it seems the test should be lenient for a 1s (or 2s?) difference in the timestamps, and that will still achieve the goals of the test.
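
            A minimal sketch of the leniency suggested above, assuming a 2s margin (the figure named in the patch subject). The helper name atime_within_margin is hypothetical and is not taken from sanity.sh; the actual change in the merged patch may look different. Arithmetic expansion accepts the hex OST value directly, so no separate conversion step is needed.

```shell
# Hypothetical helper (not from sanity.sh): succeed when two timestamps
# differ by at most MARGIN seconds. Either value may be decimal or 0x-hex.
atime_within_margin() {
    # $1 = client atime, $2 = OST atime, $3 = allowed margin in seconds
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$3" ]
}

# Values taken from the failure message in this ticket.
if atime_within_margin 1699192823 0x65479ff6 2; then
    echo "atime OK (within margin)"
else
    echo "atime on client and ost differ by more than the margin"
fi
```

            With a margin of 2, a one-second lag on the OST passes, while a timestamp that was never updated (for example, 10s stale) would still fail.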
            arshad512 Arshad Hussain added a comment - edited

            atime on client 1699192823 != ost 0x65479ff6

            Two observations here:

            • The comparison is between a decimal value and a hex value. That is not wrong, but printing both in the same base might be clearer.
            • The drift is just 1 second. (The time on the server is 1 second earlier than what was read on the client.)
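
            The two observations can be checked by hand with shell arithmetic. This is only an illustrative sketch using the values from the error message; the variable names are made up for the example.

```shell
# Values copied from the error message; variable names are illustrative only.
client_atime=1699192823
ost_atime=0x65479ff6

# Arithmetic expansion converts the hex value to decimal.
ost_dec=$(( $ost_atime ))
drift=$(( $client_atime - $ost_atime ))

printf 'ost atime in decimal: %d\n' "$ost_dec"   # 1699192822
printf 'client - ost drift:   %d s\n' "$drift"   # 1
```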

            People

              arshad512 Arshad Hussain
              maloo Maloo
              Votes: 0
              Watchers: 5
