[LU-15728] 'relatime' is not working properly Created: 08/Apr/22  Updated: 21/Aug/23  Resolved: 23/Feb/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.7, Lustre 2.15.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: Aurelien Degremont (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Related
is related to LU-1783 sanity test_39l failed: atime is not ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

relatime behavior is properly managed by the VFS. However, Lustre also stores atime/ctime/mtime (acmtime) on OST objects, and atime updates for OST objects should honor relatime behavior as well.

The ci_noatime feature was introduced by LU-3832 to properly honor the noatime option for OST objects. However, relatime handling was not added at the same time, even though it uses the same function.

relatime could be fixed by adding the missing parts from touch_atime() so that relatime is supported as well.
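
For reference, the relatime decision that the Linux VFS makes in touch_atime() can be modeled roughly as below. This is only an illustrative sketch in Python, not the Lustre patch itself: the names Timestamps, relatime_needs_update and should_update_atime are hypothetical, and the model assumes the usual kernel rule (update atime if it is not newer than mtime or ctime, or if it is at least one day old).

```python
from dataclasses import dataclass

# One day, the window used by the Linux VFS relatime check.
RELATIME_WINDOW_SECONDS = 24 * 60 * 60


@dataclass
class Timestamps:
    """Hypothetical container for an object's timestamps, in seconds."""
    atime: float
    mtime: float
    ctime: float


def relatime_needs_update(ts: Timestamps, now: float) -> bool:
    """Return True if relatime semantics call for an atime update."""
    # The file was modified at or after the last recorded access.
    if ts.mtime >= ts.atime:
        return True
    # The inode was changed at or after the last recorded access.
    if ts.ctime >= ts.atime:
        return True
    # The recorded atime is at least one day old.
    if now - ts.atime >= RELATIME_WINDOW_SECONDS:
        return True
    return False


def should_update_atime(ts: Timestamps, now: float,
                        noatime: bool, relatime: bool) -> bool:
    """Combine the noatime and relatime checks (illustrative only)."""
    if noatime:
        return False          # noatime: never update on access
    if not relatime:
        return True           # strict atime: always update on access
    return relatime_needs_update(ts, now)
```

In this model, an object with atime=3000 and mtime=ctime=2000 is not updated on a read at now=3001 under relatime, but is updated once mtime or ctime catches up with atime or a full day has passed.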



Comments
Comment by Gerrit Updater [ 08/Apr/22 ]

"Aurelien Degremont <degremoa@amazon.com>" uploaded a new patch: https://review.whamcloud.com/47017
Subject: LU-15728 llite: fix relatime support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 88c75bdb30025744ca1cb90a94677015ab7adcdd

Comment by Gerrit Updater [ 08/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47017/
Subject: LU-15728 llite: fix relatime support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c10c6eeb37dd553166367b96369dca25183ace3b

Comment by Peter Jones [ 08/Feb/23 ]

Landed for 2.16

Comment by Alex Zhuravlev [ 09/Feb/23 ]

with this patch landed sanity/56oc started to fail sometimes:

54eb6da1f8	1	0	1	BAD	LU-16477 ldiskfs: Add ext4-enc-flag patch for RHEL9
c10c6eeb37	6	5	1	BAD	LU-15728 llite: fix relatime support
0c05dc21ab	6	6	0	GOOD	LU-6142 ldlm: minor list_entry improvements in ldlm_request.c
685fb4b17f	6	6	0	GOOD	LU-6142 ldlm: use list_for_each_entry in ldlm_lock.c
93230059ab	6	6	0	GOOD	LU-12275 tests: skip new nodemap params on old MGS
2f8f38effa	6	6	0	GOOD	LU-16412 llite: check read page past requested
4b47c233b3	6	6	0	GOOD	LU-16457 tests: wait for remote sleep in sanity-pcc/101a
5a5bd5b4da	6	6	0	GOOD	LU-16159 osp: destroy should not overtake writes

Comment by Aurelien Degremont (Inactive) [ 14/Feb/23 ]

I'll look at it

Comment by Aurelien Degremont (Inactive) [ 14/Feb/23 ]

Any tips to find all failing occurrences in Maloo with some query? I'm failing to do it.

Comment by Alex Zhuravlev [ 14/Feb/23 ]

subtest search in Maloo for sanity/test_56oc ?

Comment by Aurelien Degremont (Inactive) [ 15/Feb/23 ]

I think I found the reason.

 

 It also forces atime to disk on MDD if ondisk atime is older than ondisk mtime/ctime to match relatime (even if relatime is not enabled)

This code change shows the culprit: https://review.whamcloud.com/c/fs/lustre-release/+/47017/1/lustre/tests/sanity.sh#5123

That's a change that Andreas asked for in https://review.whamcloud.com/c/fs/lustre-release/+/47017/1/lustre/tests/sanity.sh#5123

 

I reproduced the failure, and could not reproduce it without that change, using this Gerrit patch (https://review.whamcloud.com/c/fs/lustre-release/+/50009/2)

 

I need to think about it

 

Comment by Andreas Dilger [ 15/Feb/23 ]

Any tips to find all failing occurrences in Maloo with some query? I'm failing to do it.

Two easy ways to find similar failures. For any subtest failure report in Maloo, you can click on the failed subtest number and it will generate a search for failures of that subtest in the past week. You can then modify the query for specific date ranges, or limit it to specific branches, etc. There is a "link to this page" at the bottom to save the URL for future reference, like the following which searches from 2023-01-15 to 2023-02-15 and shows that the problem was hit once by the patch before landing on 2023-02-01, and then by other patches starting on 2023-02-07 after the patch landed:

https://testing.whamcloud.com/search?client_branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&server_branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&status%5B%5D=FAIL&test_set_script_id=f9516376-32bc-11e0-aaee-52540025f9ae&sub_test_script_id=cb9003f2-f26b-11e9-b62b-52540065bddc&start_date=2023-01-15&end_date=2023-02-15&source=sub_tests#redirect

Alternately, you can get to this search page from the "Results->Search" link at the top of any Maloo page, and then select "Sub tests" to limit the query to specific subtest failures, as opposed to whole "Test sets" failures.
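
As a rough illustration of modifying the query for other date ranges, the saved search URL above can be rebuilt programmatically. This is a sketch only: the function name build_subtest_failure_url is hypothetical, the parameter names and opaque IDs are copied verbatim from the example URL, and the trailing "#redirect" fragment is left out on the assumption that it is not needed.

```python
from urllib.parse import urlencode

MALOO_SEARCH = "https://testing.whamcloud.com/search"

# Opaque IDs copied verbatim from the example URL; their meaning is
# inferred only from the parameter names they were attached to.
BRANCH_TYPE_ID = "24a6947e-04a9-11e1-bb5f-52540025f9af"
TEST_SET_SCRIPT_ID = "f9516376-32bc-11e0-aaee-52540025f9ae"
SUB_TEST_SCRIPT_ID = "cb9003f2-f26b-11e9-b62b-52540065bddc"


def build_subtest_failure_url(start_date: str, end_date: str) -> str:
    """Build a sub-test failure search URL for a YYYY-MM-DD date range."""
    params = [
        ("client_branch_type_id", BRANCH_TYPE_ID),
        ("server_branch_type_id", BRANCH_TYPE_ID),
        ("status[]", "FAIL"),  # urlencode escapes the brackets as %5B%5D
        ("test_set_script_id", TEST_SET_SCRIPT_ID),
        ("sub_test_script_id", SUB_TEST_SCRIPT_ID),
        ("start_date", start_date),
        ("end_date", end_date),
        ("source", "sub_tests"),
    ]
    return f"{MALOO_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Same date range as the example search.
    print(build_subtest_failure_url("2023-01-15", "2023-02-15"))
```

Changing only the two date arguments keeps the rest of the query identical to the saved search.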

Comment by Andreas Dilger [ 15/Feb/23 ]

Reopen to track sanity test_56oc failure triggered by this patch.

Comment by Aurelien Degremont (Inactive) [ 16/Feb/23 ]

Patch to fix the regression: https://review.whamcloud.com/c/fs/lustre-release/+/50009/

Comment by Gerrit Updater [ 23/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50009/
Subject: LU-15728 mdd: fix sanity-56oc failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8beeec77c3b426a01e1f10ca51149c7ca7e01b7e

Comment by Peter Jones [ 23/Feb/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:20:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.