[LU-10496] sanity-dom sanity test_39k: FAIL: mtime is lost on close: 1515705427, should be 1484169395 Created: 12/Jan/18  Updated: 26/Jan/24  Resolved: 09/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.1
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Major
Reporter: Jian Yu Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3285 Data on MDT Resolved
is related to LU-12710 sanity test_36f: FAIL: /mnt/lustre/d3... Open
is related to LU-10589 sanity-dom test_251: test_sanity fail... Resolved
is related to LU-12967 sanity test 80 silently fails to get ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-dom sanity test 39k failed as follows:

== sanity test 39k: write, utime, close, stat ======================================================== 21:17:06 (1515705426)
multiop /mnt/lustre/f39k.sanity voO_RDWR:w2097152_c
TMPPIPE=/tmp/multiop_open_wait_pipe.15091
 sanity test_39k: @@@@@@ FAIL: mtime is lost on close: 1515705427, should be 1484169395 

Maloo reports:
https://testing.hpdd.intel.com/test_sets/90c90676-f716-11e7-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/e1a7ef10-f6e6-11e7-bd00-52540065bddc
https://testing.hpdd.intel.com/test_sets/fc89fcf6-f655-11e7-94c7-52540065bddc



 Comments   
Comment by Bob Glossman (Inactive) [ 21/Mar/18 ]

another on master:
https://testing.hpdd.intel.com/test_sets/d7fa8c90-2d4e-11e8-b3c6-52540065bddc

Comment by Alex Zhuravlev [ 13/Apr/18 ]

I'm able to reproduce this locally with ZFS (no DNE/DOM needed)

 

Comment by Qian Yingjin (Inactive) [ 30/Jul/18 ]

another on master:

https://testing.whamcloud.com/test_sets/9665c524-920c-11e8-b0aa-52540065bddc

Comment by Mikhail Pershin [ 15/Jan/19 ]

The test does the following:

Process #1: multiop oO_RDWR:w2097152_c
Process #2: wait #1 pause; setattr(mtime)

It works as expected when the setattr is issued after the write and before the close, but sometimes the write has not been flushed yet and this bug occurs. It can be reproduced by changing the multiop parameters to oO_RDWR:_w2097152c

I propose making sure the write has finished by doing a data sync with the 'Y' parameter.
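A simplified model of the race Mike describes. It assumes the server applies each RPC's mtime in arrival order: a write stamps the server's current time and a setattr stamps the requested time. `Rpc` and `final_mtime` are hypothetical names for illustration, not Lustre code. The two timestamps are the values from the failure message.

```python
from dataclasses import dataclass


@dataclass
class Rpc:
    kind: str   # "write" or "setattr" (hypothetical, heavily simplified)
    mtime: int  # setattr: mtime requested by utime(); write: server time when processed


def final_mtime(arrival_order):
    """Naive model: every RPC overwrites mtime, so the last one to arrive wins."""
    mtime = None
    for rpc in arrival_order:
        mtime = rpc.mtime
    return mtime


WRITE = Rpc("write", 1515705427)      # values from the test failure message
SETATTR = Rpc("setattr", 1484169395)

# Expected case: the write is flushed before process #2's setattr.
print(final_mtime([WRITE, SETATTR]))  # 1484169395
# Racy case: the cached write reaches the server after the setattr.
print(final_mtime([SETATTR, WRITE]))  # 1515705427, i.e. "mtime is lost on close"
```

This also shows why forcing a sync before the setattr makes the test pass: it pins the arrival order to the expected case without changing how the server resolves the conflict.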

Comment by Gerrit Updater [ 15/Jan/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34029
Subject: LU-10496 test: update sanity tests 39 for DOM testing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a72572846e3461555affe0819c66b20a575d2459

Comment by Mikhail Pershin [ 16/Jan/19 ]

I've abandoned the patch because it is an incorrect replacement. A forced sync removes the write vs. setattr race and hides a possible bug.

Comment by Gerrit Updater [ 26/Jan/19 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34115
Subject: LU-10496 tests: disable 39k for DoM for a while
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f11c35de40d8d5abe1fed303268d61a73b739f0b

Comment by Andreas Dilger [ 26/Jan/19 ]

Based on comments on Alex's last patch, the DoM code does not handle the file modification data (FMD) as is done in the OFD. FMD tracks recent RPCs on each object and drops the modification time from writes that were generated before the setattr (i.e. with a lower XID).
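A minimal sketch of the FMD idea as described above. It assumes each RPC carries a client XID and that only the mtime of an older write is dropped, while the data itself is still written. `Rpc`, `ObjectFmd` and their fields are hypothetical; the real OFD/target FMD code tracks more state than this.

```python
from dataclasses import dataclass


@dataclass
class Rpc:
    kind: str   # "write" or "setattr"
    xid: int    # client request ID: a lower XID means generated earlier
    mtime: int  # setattr: requested mtime; write: server time when processed


class ObjectFmd:
    """Hypothetical per-object FMD record: remembers the XID of the last setattr."""

    def __init__(self):
        self.mtime = None
        self.setattr_xid = 0

    def handle(self, rpc):
        if rpc.kind == "setattr":
            self.mtime = rpc.mtime
            self.setattr_xid = rpc.xid
        elif rpc.kind == "write":
            # The data would still be written (not modelled here); only the
            # mtime update is dropped for writes generated before the setattr.
            if rpc.xid > self.setattr_xid:
                self.mtime = rpc.mtime


fmd = ObjectFmd()
# The write (XID 1) was generated before the setattr (XID 2) but arrives after it.
for rpc in (Rpc("setattr", 2, 1484169395), Rpc("write", 1, 1515705427)):
    fmd.handle(rpc)
print(fmd.mtime)  # 1484169395: the setattr's mtime survives
```

Unlike forcing a sync, this keeps the write vs. setattr race in place and resolves it on the server by request age rather than arrival order.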

Comment by Alex Zhuravlev [ 26/Jan/19 ]

AFAIU, Mike has been integrating the FMD code into the target code.

Comment by Mikhail Pershin [ 26/Jan/19 ]

Yes, I am working on exactly that.

Comment by Gerrit Updater [ 30/Jan/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34141
Subject: LU-10496 tgt: move FMD functionality in target
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3b6b66bc69d934dd794bf6e8529de9477ab982ce

Comment by Gerrit Updater [ 04/Feb/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34176
Subject: LU-10496 ofd: move FMD to the target code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3ec1fcde2ef53e59b03927d0c622be706f7d14cd

Comment by Gerrit Updater [ 05/Feb/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34190
Subject: LU-10496 tgt: move FMD handling from OFD to target
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 70e0e27f10b3e694997dc26095360fde1d7e31f3

Comment by Gerrit Updater [ 06/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34115/
Subject: LU-10496 tests: disable 39k for DoM for a while
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 68dd8a8acff9ad2295a1fcba318fc8ed5f140026

Comment by Gerrit Updater [ 02/Mar/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34364
Subject: LU-10496 tests: stop running sanity-dom/sanity 39k
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8c797e5cb14145ca80013ec9266d7536b01b46df

Comment by Gerrit Updater [ 15/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34176/
Subject: LU-10496 ofd: move FMD to the target code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6459fa2b458612b5213b3b70839e340efff7aebc

Comment by Gerrit Updater [ 21/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34190/
Subject: LU-10496 tgt: move FMD handling from OFD to target
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 52e33c507b84bcaf3af9df010f5de4a282aa3fca

Comment by Peter Jones [ 21/Mar/19 ]

Landed for 2.13

Comment by Alex Zhuravlev [ 02/Apr/19 ]

https://testing.whamcloud.com/test_sets/5f72d680-5546-11e9-b98a-52540065bddc but this is probably due to the new locking. Please ignore it for a while.

Comment by Sarah Liu [ 10/Apr/19 ]

On the b2_12 branch:
https://testing.whamcloud.com/test_sets/53e192d8-5b23-11e9-a256-52540065bddc

Comment by Gerrit Updater [ 16/Apr/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34690
Subject: LU-10496 ofd: move FMD to the target code
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b21aafd60501902dadb6a48e876398b2db671380

Comment by Gerrit Updater [ 16/Apr/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34691
Subject: LU-10496 tgt: move FMD handling from OFD to target
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: c98cd9b0bf0596c7943656008bb2567922ca864c

Comment by Gerrit Updater [ 21/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34690/
Subject: LU-10496 ofd: move FMD to the target code
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 2eb6dee24a9f5804143da6258190387a3c50dee6

Comment by Gerrit Updater [ 21/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34691/
Subject: LU-10496 tgt: move FMD handling from OFD to target
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 5456b1bdbf94bdb1769000d4ca5a8131528ddf5d

Comment by James Nunez (Inactive) [ 09/Sep/19 ]

We are still seeing sanity-dom/sanity test 39k fail with the "mtime is lost on close" error message. A couple of recent failures on master are at
https://testing.whamcloud.com/test_sets/73af0226-d2b5-11e9-9fc9-52540065bddc
https://testing.whamcloud.com/test_sets/1ab44488-d2ce-11e9-a25b-52540065bddc

Comment by James Nunez (Inactive) [ 11/Sep/19 ]

We have seen an uptick in sanity-dom/sanity test 39k failures since moving to ZFS 0.8.1. We are going to revert to ZFS 0.7.13, which may reduce or stop these failures.

We will reopen this ticket if we see this test failing again once we are running ZFS 0.7.13.

Comment by Li Xi [ 14/Sep/19 ]

Another one on master:

https://testing.whamcloud.com/test_sets/c8b9f6de-d609-11e9-a2b6-52540065bddc

Comment by James Nunez (Inactive) [ 16/Sep/19 ]

Reopening the ticket because we are seeing this issue with ZFS 0.7.13-1. Here are a couple of recent failures:
https://testing.whamcloud.com/test_sets/26f6de30-d6b2-11e9-a25b-52540065bddc
https://testing.whamcloud.com/test_sets/c8b9f6de-d609-11e9-a2b6-52540065bddc

Comment by Mikhail Pershin [ 18/Sep/19 ]

James, I wonder about the following. In the test log I see:
excepting tests: 39k 42a 42b 42c 407 312 180

but test 39k is not in the ALWAYS_EXCEPT list, so it looks like it is set by SANITY_EXCEPT. Where is this set, and why is 39k excluded? I suppose that sanity-dom.sh asks to run 39k explicitly, so it is run, but maybe it shouldn't be?

Comment by James Nunez (Inactive) [ 18/Sep/19 ]

I agree that there is something not right with the SANITY_EXCEPT list in sanity-dom.sh. I will look into this issue.

From your comment, it sounds like sanity-dom/sanity test 39k should be skipped and this ticket should remain open until a fix can be found. If this is wrong, please let me know.

Comment by Gerrit Updater [ 02/Oct/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36352
Subject: LU-10496 tests: enable 39k for DoM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 21d4843dec95997008416f26616ea0abb9b26054

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36228/
Subject: LU-10496 ofd: serialize fmd check and set
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 97a10cf9797bbed02fb131f6a205b6a0ceeb0525
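The subject of patch 36228 says only that the FMD check and set are serialized; it does not say how. As an assumption, the hazard is that another RPC on the same object could be applied between reading the stored setattr XID and updating mtime. A hypothetical sketch of one way to close that window, a per-object lock around the whole check-and-set, is below. `SerializedFmd` and its methods are illustrative names, and the actual patch may use a different mechanism.

```python
import threading


class SerializedFmd:
    """Hypothetical per-object FMD state whose check-and-set runs under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.mtime = None
        self.setattr_xid = 0  # XID of the last setattr that set mtime

    def handle_setattr(self, xid, mtime):
        with self._lock:
            self.mtime = mtime
            self.setattr_xid = xid

    def handle_write(self, xid, mtime):
        with self._lock:
            # Check and set form a single critical section: no setattr can be
            # applied between reading setattr_xid and updating mtime.
            if xid > self.setattr_xid:
                self.mtime = mtime


# Usage: race an older write (XID 1) against a newer setattr (XID 2).
fmd = SerializedFmd()
threads = [
    threading.Thread(target=fmd.handle_write, args=(1, 1515705427)),
    threading.Thread(target=fmd.handle_setattr, args=(2, 1484169395)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(fmd.mtime)  # 1484169395 regardless of which thread runs first
```

Either interleaving ends with the setattr's mtime. If the write runs first, the setattr overwrites it. If the setattr runs first, the write's XID check rejects the older timestamp.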

Comment by Gerrit Updater [ 09/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36352/
Subject: LU-10496 tests: enable 39k for DoM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cf9bb79ebc5998c35a208ed011b64f9d4a62e7f3

Comment by Peter Jones [ 09/Oct/19 ]

Looks like everything has landed for 2.13

Generated at Sat Feb 10 02:35:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.