[LU-7310] sanityn test_39a fails with 'mtime is not updated on write: 1444890978 <= 1413354977 <= 1444890979' Created: 16/Oct/15  Updated: 13/Sep/16  Resolved: 13/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

autotest review-zfs-part-1


Issue Links:
Related
is related to LU-6137 Update timestamps arbitrarily on MDS Resolved
is related to LU-7072 sanityn test_78: Expected set_param t... Resolved
is related to LU-7108 Remove sanityn tests 14b, 19, 29, and... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanityn test 39a fails only on review-zfs-part-1 with the error message

'mtime is not updated on write: 1444890978 <= 1413354977 <= 1444890979'

Logs for the most recent failure is at https://testing.hpdd.intel.com/test_sets/560c385e-7325-11e5-ae2e-5254006e85c2.

sanityn test_39a has failed with this error about 18 times in the past two months only on review-zfs-part-1. All of these failures took place when testing patches for LU-7108 or as a results of LU-7072 with patches http://review.whamcloud.com/#/c/16383 and http://review.whamcloud.com/#/c/16164, respectively. Both of these patches remove tests from the sanityn ALWAYS_EXCEPT list; sanityn tests 14b, 19, 29, and 35. The patch for LU-7072 added sanityn test_78 to the ALWAYS_EXCEPT list and, by accident, removed all other tests from the except list allowing these test to run. More recently, a patch for LU-7108, removing sanityn tests 14b, 19, 29, and 35 from the ALWAYS_EXCEPT list is working its way through autotest. What both of these tickets/patch have in common is that their autotest results show that by reintroducing these sanityn tests back into the mix, they are probably causing test 39a to occasionally fail.

Recent logs with sanityn_test39a failures are at
2015-10-10 10:13:27 - https://testing.hpdd.intel.com/test_sets/1ef9bb16-6f56-11e5-b7e0-5254006e85c2
2015-10-06 04:23:35 - https://testing.hpdd.intel.com/test_sets/85fef54c-6c09-11e5-9ae6-5254006e85c2
2015-09-22 10:55:32 - https://testing.hpdd.intel.com/test_sets/5ec3968e-6139-11e5-b495-5254006e85c2
2015-09-17 23:36:18 - https://testing.hpdd.intel.com/test_sets/989a1506-5dd1-11e5-a637-5254006e85c2
2015-09-12 10:40:34 - https://testing.hpdd.intel.com/test_sets/170cc7ba-596a-11e5-88c5-5254006e85c2
2015-09-05 04:32:19 - https://testing.hpdd.intel.com/test_sets/89465b6c-53ba-11e5-83ee-5254006e85c2



 Comments   
Comment by Andreas Dilger [ 19/Oct/15 ]

I suspect that the enabling of these tests may cause the server VM to be slowed down enough cause the clock to skew, and there is a bug in the server code updating some timestamp that uses the server clock instead of the timestamp passed from the client.

This could be exposed by running this test with a wildly different time set on the MDS and OSS (e.g. ahead or behind by 1 day) so that it is clear where the timestamp is coming from. This was partially done in http://review.whamcloud.com/14122 but needs to be done in a better manner to be able to include into our regular testing.

Comment by Saurabh Tandan (Inactive) [ 22/Oct/15 ]

I ran some manual test on single node with various combinations of 14b, 19, 29, 35 in ALWAYS_EXCEPT list and found out whenever test_35 was excluded from ALWAYS_EXCEPT list it produced above mentioned error for test_39a. So there is a high possibility of this issue being related to test_35 in someway.

Comment by James Nunez (Inactive) [ 26/Jan/16 ]

Another failure:
2016-01-25 22:00:44 - https://testing.hpdd.intel.com/test_sets/69450134-c3cb-11e5-b763-5254006e85c2

Comment by Niu Yawei (Inactive) [ 23/Jun/16 ]

https://testing.hpdd.intel.com/test_sets/9f0e675e-3920-11e6-bbf5-5254006e85c2

Comment by Niu Yawei (Inactive) [ 28/Jun/16 ]
00010000:00010000:0.0:1444890979.493775:0:30329:0:(ldlm_lockd.c:1923:ldlm_handle_gl_callback()) ### client glimpse AST callback handler ns: lustre-OST0000-osc-ffff880037e41c00 lock: ffff88007a6fda80/0x4e64250580a5fff6 lrc: 2/0,0 mode: PW/PW res: [0x7f8f:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20000020000 nid: local remote: 0x57fb8313db314742 expref: -99 pid: 1011 timeout: 0 lvb_type: 1
00000020:00010000:0.0:1444890979.493817:0:30329:0:(cl_object.c:290:cl_object_glimpse()) header@ffff88005e632978[0x0, 2, [0x200002b10:0x5e:0x0] hash]

00000020:00010000:0.0:1444890979.493821:0:30329:0:(cl_object.c:290:cl_object_glimpse()) size: 6 mtime: 1413354977 atime: 1444890978 ctime: 1444890978 blocks: 1
00000020:00010000:0.0:1444890979.493828:0:30329:0:(cl_object.c:290:cl_object_glimpse()) header@ffff88007a767090[0x0, 2, [0x100000000:0x7f8f:0x0] hash]

00000020:00010000:0.0:1444890979.493830:0:30329:0:(cl_object.c:290:cl_object_glimpse()) size: 6 mtime: 1413354977 atime: 1444890978 ctime: 1444890978 blocks: 1
00000100:00100000:0.0:1444890979.493881:0:30329:0:(service.c:2121:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cb00_000:LOV_OSC_UUID+4:24810:x1515071670949744:12345-10.1.5.15@tcp:106 Request procesed in 125us (196us total) trans 0 rc 0/0

The client appending file returns wrong mtime on glimpse, looks some defect in clio that can only be triggered in zfs testing?

Comment by Gerrit Updater [ 29/Jun/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/21063
Subject: LU-7310 clio: sync write should update mtime
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d9ebc5916afad38629def87e17e0e363a9d5c027

Comment by Gerrit Updater [ 13/Sep/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21063/
Subject: LU-7310 clio: sync write should update mtime
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e130c75827b2f40d89cb69ac5c64bb94d67c520c

Comment by Peter Jones [ 13/Sep/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:07:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.