[LU-221] sanityN.sh test_4 failed Created: 18/Apr/11  Updated: 03/Dec/14  Resolved: 27/Jul/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Niu Yawei (Inactive) Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

2.6.18-194.17.1.el5


Issue Links:
Duplicate
is duplicated by LU-222 sanityn test_39c failed Resolved
Related
is related to LU-1042 1.8 clients show wrong dates with 2.1... Resolved
is related to LU-5977 Remove correction for bad timestamp Resolved
Severity: 3
Bugzilla ID: 18169
Epic: metadata
Rank (Obsolete): 4066

 Description   

I hit this problem several times while running sanityn:

== sanityn test 4: fstat validation on multiple mount points ========================================= 19:19:59 (1303179599)
Mtimes don't match 1303179601, 1303179727
sanityn test_4: @@@@@@ FAIL: test_4 failed with 1

test result on maloo:
https://maloo.whamcloud.com/test_sets/4d51c720-6a37-11e0-b32b-52540025f9af

This is the same as bug 18169.



 Comments   
Comment by Peter Jones [ 19/Apr/11 ]

HongChao

Could you please look into this failure?

Thanks

Peter

Comment by Hongchao Zhang [ 20/Apr/11 ]

Okay, I'll take this issue.

Comment by Hongchao Zhang [ 22/Apr/11 ]

This issue appears to be caused by the time difference between the client and the OST:
the atime of the file "f4" is even later than the current time on the client.

00000020:00010000:3.0:1303179601.011757:0:21145:0:(cl_object.c:308:cl_object_glimpse()) size: 5 mtime: 1303179601 atime: 1303179727 ctime: 1303179601 blocks: 0

The mtime and ctime are sane, but the atime (1303179727) is about 126 seconds later than the current time (1303179601.011757).

In the filter, the LVB is initialized from the values of its object.
For client1, the ctime is the current time on the client, because it has just written some data and does not need to send a glimpse lock request.
For client2, a glimpse lock request is sent to update the LVB, and the ctime is set to the time on the OST (1303179727).
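
The 126-second figure quoted above can be re-derived from the debug line itself. The following is only a small illustrative sketch: the helper name parse_glimpse_line is hypothetical, and it assumes the fourth colon-separated field of the debug line is the client's wall-clock time when the message was logged.

```python
import re

# Debug line copied verbatim from the comment above.
LOG_LINE = (
    "00000020:00010000:3.0:1303179601.011757:0:21145:0:"
    "(cl_object.c:308:cl_object_glimpse()) "
    "size: 5 mtime: 1303179601 atime: 1303179727 ctime: 1303179601 blocks: 0"
)


def parse_glimpse_line(line):
    """Return (log_time, times) for one cl_object_glimpse() debug line.

    Hypothetical helper. Assumption: the fourth colon-separated field is
    the client's wall-clock time at which the message was logged.
    """
    log_time = float(line.split(":")[3])
    times = {name: int(value)
             for name, value in re.findall(r"\b([amc]time): (\d+)", line)}
    return log_time, times


log_time, times = parse_glimpse_line(LOG_LINE)
# Offset of each timestamp from the client's clock, in whole seconds.
skew = {name: value - int(log_time) for name, value in times.items()}
print(skew)
```

Only the atime is ahead of the client's clock; the mtime and ctime match the second in which the line was logged.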

Comment by Niu Yawei (Inactive) [ 24/Apr/11 ]

Hi, Hongchao

test_4 compares the mtime on two clients, not the ctime or atime. In theory, the mtime seen by client2 should be the mtime on client1, and I don't see why the server time is involved. Did I miss anything?

Comment by Niu Yawei (Inactive) [ 25/Apr/11 ]

After looking more closely at the code, I see that the time difference between the OSS and the client can affect the test result:

  • client 1 creates the file at time_1. The MDS creates the file and sets its mtime to client 1's time_1; the mtime of the object on the OSS is the OSS's local time, time_s (time_s > time_1 if the OSS clock is ahead of client 1's);
  • client 2 opens the file. It gets time_1 from the MDS and glimpses time_s from the OSS; because time_s > time_1, it takes time_s as its mtime;
  • client 1 writes to the file at client 1's time_2;
  • client 1 stats the file and takes its local time_2 as the mtime (since time_s > time_2, the mtime of the object on the OSS should always be time_s, whether or not the dirty data has been flushed);
  • client 2 stats the file. It glimpses the time from the OSS, and the OSS issues a glimpse callback to client 1, but because time_s is even later than client 1's time_2, client 2 still takes the later value, time_s, as its mtime;
  • in the end, client 1's mtime (time_2) != client 2's mtime (time_s).
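
The steps above can be replayed with a toy model. This is a rough sketch, not Lustre code: merged_mtime and run_scenario are hypothetical names, the "latest timestamp wins" rule is taken from the description above, and time_1 is assumed to be the test start time from the log (1303179599).

```python
def merged_mtime(*candidates):
    """Assumed merge rule from the steps above: the latest timestamp wins."""
    return max(candidates)


def run_scenario(time_1, time_2, time_s):
    """Replay the sequence; return (client1_mtime, client2_mtime).

    time_1: client 1's clock at create (stored on the MDS)
    time_2: client 1's clock at write
    time_s: mtime of the OST object, taken from the OSS's own clock
    """
    mds_mtime = time_1
    ost_mtime = time_s
    # Client 1 has just written the file, so it trusts its own clock.
    client1_mtime = time_2
    # Client 2 glimpses: the OSS merges its object time with client 1's time_2.
    glimpsed = merged_mtime(ost_mtime, time_2)
    client2_mtime = merged_mtime(mds_mtime, glimpsed)
    return client1_mtime, client2_mtime


# OSS clock ahead of the client, using the values from the failing run.
print(run_scenario(1303179599, 1303179601, 1303179727))
# -> (1303179601, 1303179727)

# OSS clock in sync with the client: both clients agree.
print(run_scenario(1303179599, 1303179601, 1303179599))
# -> (1303179601, 1303179601)
```

With the skewed OSS clock the two values are exactly the pair reported by "Mtimes don't match" in the test output.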

Hi, Hongchao

Is this what you described in your previous comment? I'm wondering whether it's a bug. In my opinion, Lustre should always take the client's time as the file's a/c/mtime, and the server time should never be involved. But if we do require the clocks to be synced on all nodes (including clients and servers), then it's not necessarily a bug.

If we regard it as a bug, I think the fix could be quite simple: we can just set the objects' a/c/mtime to zero (or a very small time) when pre-creating them. Untouched objects on the OSS will then no longer contribute a time value to the glimpse mechanism until they are written or truncated by some client.

Comment by Niu Yawei (Inactive) [ 25/Apr/11 ]

Hi, Andreas, any comments on this?

Comment by Andreas Dilger [ 25/Apr/11 ]

It should already be possible to handle this today by checking the SUID/SGID flags on the inode, and if both of them are set then the object a/m/ctime should be ignored (return 0 for all of them). I'd prefer not to set those values to 0 on disk because some parts of the code consider ctime = 0 an invalid inode.

It is definitely correct that the times should be based on the client and not the server. While it is desirable to have the clocks in sync on all nodes, this is not required.

Comment by Andreas Dilger [ 06/Jul/11 ]

Any progress on this bug?

I see this is still failing in Maloo, e.g. https://maloo.whamcloud.com/test_sets/fa0a9a24-a78f-11e0-bd2a-52540025f9af. Fixing the failing regression tests makes our testing much more efficient.

I think the proposed fix is relatively straightforward to implement: in filter_commitrw_write(), if both SUID and SGID are still set (i.e. this is the first time this was done), then OBD_MD_FLMTIME|OBD_MD_FLCTIME|OBD_MD_FLATIME should be OR'd into the "i" flag passed to iattr_from_obdo(), and ATTR_MTIME | ATTR_CTIME | ATTR_ATIME should be left in the ia_valid mask, so that the values sent from the client overwrite those on disk in the call to fsfilt_setattr().
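
The SUID/SGID idea can be modelled roughly as follows. This is a sketch of the logic only, not the obdfilter code: OstInode, reported_times and first_write are hypothetical names, and it assumes that the first write clears the SUID/SGID marker and that later writes keep the newest value.

```python
import stat
from dataclasses import dataclass

# Marker assumed to be set on objects that no client has written yet.
FRESH_FLAGS = stat.S_ISUID | stat.S_ISGID


@dataclass
class OstInode:
    """Hypothetical stand-in for an OST object's inode."""
    mode: int
    atime: int
    mtime: int
    ctime: int


def is_untouched(inode):
    """True while both SUID and SGID are still set."""
    return (inode.mode & FRESH_FLAGS) == FRESH_FLAGS


def reported_times(inode):
    """(atime, mtime, ctime) the OST would report; zeros while untouched."""
    if is_untouched(inode):
        return (0, 0, 0)
    return (inode.atime, inode.mtime, inode.ctime)


def first_write(inode, atime, mtime, ctime):
    """Apply the client's timestamps on a write."""
    if is_untouched(inode):
        # Client values overwrite the server-side times set at precreate.
        inode.atime, inode.mtime, inode.ctime = atime, mtime, ctime
        # Assumption: the marker is cleared once a client has written.
        inode.mode &= ~FRESH_FLAGS
    else:
        # Assumption: later writes keep the newest value per field.
        inode.atime = max(inode.atime, atime)
        inode.mtime = max(inode.mtime, mtime)
        inode.ctime = max(inode.ctime, ctime)


# Precreated object stamped with the OSS clock, then written by a client.
obj = OstInode(mode=0o100666 | FRESH_FLAGS,
               atime=1303179727, mtime=1303179727, ctime=1303179727)
before = reported_times(obj)
first_write(obj, 1303179601, 1303179601, 1303179601)
after = reported_times(obj)
print(before, after)
```

In this model the OSS-side timestamp 1303179727 is never visible to a client, either before or after the first write.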

Comment by Hongchao Zhang [ 06/Jul/11 ]

The initial patch has been created (some improvement is still needed), but work on it was suspended because several other
higher-priority bugs needed investigation. I'll complete it soon.

Comment by Peter Jones [ 07/Jul/11 ]

I think that we need to make fixing this issue a priority. As Andreas says, it causes regular autotest failures.

Comment by Hongchao Zhang [ 11/Jul/11 ]

The patch is at http://review.whamcloud.com/#change,1084

Some notes about the patch:
1. S_ISUID and S_ISGID can't be used to determine whether the inode's a/c/m times are valid, because the a/c/m times can be set individually
by users (say, with "touch"); test_39m in sanity.sh tests exactly this case.

In this patch, the a/c/m times are set to the minimal value of inode->i_a(m,c)time (LONG_MIN), which will be ignored
by clients because only the newest time is used.

2. In this patch, the a/c/m times are initialized after creating the inode; this can be moved into ldiskfs if it degrades
creation performance.
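
A small model of the LONG_MIN approach described in these notes, to show why a per-field sentinel still works when a user sets a single timestamp. This is only a sketch: new_object_times, touch and merge_newest are hypothetical names, and LONG_MIN is assumed to be that of a 64-bit signed long.

```python
LONG_MIN = -(2 ** 63)  # assumption: 64-bit signed long


def new_object_times():
    """Timestamps of a freshly created OST object under the patch's scheme."""
    return {"atime": LONG_MIN, "mtime": LONG_MIN, "ctime": LONG_MIN}


def touch(times, field, value):
    """Model of a user setting one timestamp individually (e.g. via "touch")."""
    updated = dict(times)
    updated[field] = value
    return updated


def merge_newest(ost_times, client_times):
    """Per-field merge in which only the newest time is used."""
    return {name: max(ost_times[name], client_times[name])
            for name in ost_times}


client = {"atime": 1303179601, "mtime": 1303179601, "ctime": 1303179601}

# Untouched object: the sentinel never wins, so the client's times are kept.
fresh = new_object_times()
print(merge_newest(fresh, client))

# Only mtime was set explicitly: that field counts, the other two are ignored.
touched = touch(fresh, "mtime", 1303179727)
print(merge_newest(touched, client))
```

Because validity is decided per field rather than by one inode-wide flag, an explicitly set mtime is honoured while the untouched atime and ctime stay invisible.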

Comment by Hongchao Zhang [ 21/Jul/11 ]

There is an error in Maloo's test run for this patch:
https://maloo.whamcloud.com/test_sets/b3babb64-b1fb-11e0-b33f-52540025f9af

There is a bug (https://bugzilla.lustre.org/show_bug.cgi?id=23161) in Bugzilla that tracks this issue.

I'm not sure yet whether the new occurrence is related to the patch.

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Build Master (Inactive) [ 27/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #231
LU-221 don't use a/c/m time for newly allocated object in OST

Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
Files :

  • lustre/obdfilter/filter.c

Comment by Peter Jones [ 27/Jul/11 ]

Landed for 2.1
