[LU-17265] sanity test_39r: atime on client 1699192823 != ost 0x65479ff6 Created: 06/Nov/23  Updated: 20/Dec/23  Resolved: 29/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Arshad Hussain
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-16421 sanity-flr test_61a: atime: old '1670... Resolved
Related
is related to LU-13578 sanity test_39r: atime on client != ost Resolved
is related to LU-14091 sanity test_39r: 'atime on client 160... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/873c927b-d963-40fa-830f-f3caea45a955

test_39r failed with the following error:

atime on client 1699192823 != ost 0x65479ff6

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/100162 - 4.18.0-477.21.1.el8_8.aarch64
servers: https://build.whamcloud.com/job/lustre-reviews/100162 - 4.18.0-477.21.1.el8_lustre.x86_64


VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_39r - atime on client 1699192823 != ost 0x65479ff6



 Comments   
Comment by Arshad Hussain [ 06/Nov/23 ]

atime on client 1699192823 != ost 0x65479ff6

Two observations here:

  • The comparison is between a decimal value and a hex value. That is not wrong, but printing both in the same base would be clearer.
  • The drift is just 1 second (0x65479ff6 = 1699192822, so the time on the server is 1 second earlier than what was read on the client).
Comment by Andreas Dilger [ 07/Nov/23 ]

The comparisons are done within "$((...))" so there is hex-to-decimal conversion, but it is a bit confusing in the error message.
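As a minimal sketch of that conversion, using only the two values from the error message (the variable names are illustrative and not taken from the test script), shell arithmetic reads a 0x-prefixed constant as hexadecimal:

```shell
#!/bin/bash
# Values copied from the error message in this ticket.
# The variable names are assumptions made for this sketch.
client_atime=1699192823
ost_atime=0x65479ff6

# Inside $(( )), a 0x-prefixed constant is evaluated as hexadecimal,
# so both operands end up as plain integers.
ost_dec=$((ost_atime))
diff=$((client_atime - ost_atime))

echo "ost atime in decimal: $ost_dec"
echo "client - ost: ${diff}s"
```

Printing the OST value in decimal this way would make the 1s drift visible directly in the error message.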

Lustre should never be using the server timestamps when storing attributes on the servers. That avoids problems like NFS where the client gets tons of errors during a build when the client and server have different timestamps.

That said, it seems possible that there is a race between when the client is sending the read RPCs for "dd" and when the client VFS updates the attributes on the local inode. We do not send another RPC to the OST to update the timestamps, so the OST timestamp might be a second out of date, but it shouldn't be 10s out of date (i.e. not updated at all).

So it seems the test should be lenient for a 1s (or 2s?) difference in the timestamps, and that will still achieve the goals of the test.
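A lenient check along these lines could look like the sketch below. The helper name atime_within_margin and its arguments are hypothetical; this is not the code from the patch that landed, only an illustration of comparing two timestamps with a small allowed margin:

```shell
#!/bin/bash
# Hypothetical helper, not the code from the landed patch.
# Succeeds when the two timestamps differ by at most $3 seconds (default 2).
atime_within_margin() {
	local client=$1 ost=$2 margin=${3:-2}
	# Arithmetic expansion also accepts a 0x-prefixed (hex) timestamp.
	local delta=$((client - ost))

	# Use the absolute difference so either side may be the later one.
	if [ "$delta" -lt 0 ]; then
		delta=$((-delta))
	fi
	[ "$delta" -le "$margin" ]
}

# The values from this ticket differ by 1s, so the check passes.
if atime_within_margin 1699192823 0x65479ff6; then
	echo "atime within margin"
else
	echo "atime on client and ost differ by more than the margin"
fi
```

With a 2s margin, a timestamp that is 10s behind (one that was never updated) still fails the check, which keeps the original goal of the test.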

Comment by Andreas Dilger [ 07/Nov/23 ]

Arshad, any chance you can push a fix for this? It should just be a one-line test script fix.

Comment by Arshad Hussain [ 08/Nov/23 ]

Sure. I have assigned it to myself. Thanks for the explanation of the problem.

Comment by Gerrit Updater [ 08/Nov/23 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53035
Subject: LU-17265 tests: Add 2s (leniency) in comparing timestamps
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a166c6aed4dc3b1114969d1b3eb8ce5ce26a4370

Comment by Gerrit Updater [ 29/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53035/
Subject: LU-17265 tests: allow margin for sanity/39r
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c5aa16db172afc9cbf0d4fd2c85261fef1a40d7b

Comment by Peter Jones [ 29/Nov/23 ]

Landed for 2.16
