[LU-221] sanityN.sh test_4 failed Created: 18/Apr/11 Updated: 03/Dec/14 Resolved: 27/Jul/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Niu Yawei (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.6.18-194.17.1.el5 |
||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Bugzilla ID: | 18169 | ||||||||||||||||||||
| Epic: | metadata | ||||||||||||||||||||
| Rank (Obsolete): | 4066 | ||||||||||||||||||||
| Description |
|
I hit this problem several times while running sanityn: == sanityn test 4: fstat validation on multiple mount points ========================================= 19:19:59 (1303179599) test result on maloo: It's the same as bug 18169. |
| Comments |
| Comment by Peter Jones [ 19/Apr/11 ] |
|
HongChao, could you please look into this failure? Thanks, Peter |
| Comment by Hongchao Zhang [ 20/Apr/11 ] |
|
Okay, I'll take this issue. |
| Comment by Hongchao Zhang [ 22/Apr/11 ] |
|
This issue is probably caused by the time difference between the client and the OST: 00000020:00010000:3.0:1303179601.011757:0:21145:0:(cl_object.c:308:cl_object_glimpse()) size: 5 mtime: 1303179601 atime: 1303179727 ctime: 1303179601 blocks: 0 The mtime and ctime are sane, but the atime (1303179727) is 126 seconds later than the current time (1303179601.011757). In the filter, the LVB will be initialized according to the value of its object, then |
| Comment by Niu Yawei (Inactive) [ 24/Apr/11 ] |
|
Hi, Hongchao. test_4 compares mtime on two clients, not ctime or atime. I think that, in theory, the mtime seen by client2 should be the mtime on client1, and I don't see why server time is involved. Did I miss anything? |
| Comment by Niu Yawei (Inactive) [ 25/Apr/11 ] |
|
After looking closer at the code, I see that the time difference between the OSS and the client could affect the test result:
Hi, Hongchao. Is this what you described in your previous comment? I'm wondering whether it's a bug. In my opinion, Lustre should always take the client's time as the file's a/c/mtime, and server time should never be involved; but if we do require the time to be synced on all nodes (including clients and servers), then it's not necessarily a bug. If we regard it as a bug, I think the fix could be quite simple: we can just set the object's a/c/mtime to zero (or a very small time) when pre-creating objects, so that untouched objects on the OSS no longer contribute time values in the glimpse mechanism until they are written or truncated by some client. |
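The effect described in this comment can be sketched roughly as follows. This is an illustration only: it assumes the merged mtime for a file is the maximum over the mtimes of its objects, and the function name, the `unset` marker, and the use of the two timestamps from the debug log above are all assumptions made for the example, not Lustre code.

```python
# Hypothetical sketch: merging per-object mtimes as seen by a glimpse.
# Assumption: the merged mtime is the maximum over all objects of the file.

def merge_mtime(object_mtimes, ignore_unset=False, unset=0):
    """Return the largest mtime, optionally skipping objects marked as unset."""
    candidates = [t for t in object_mtimes
                  if not (ignore_unset and t == unset)]
    return max(candidates) if candidates else unset


client_write_time = 1303179601      # time set by the writing client
skewed_precreate_time = 1303179727  # untouched object stamped by a server 126 s ahead

# Without the fix: the untouched object wins and the wrong time is reported.
merged = merge_mtime([client_write_time, skewed_precreate_time])

# With the proposed fix: the pre-created object carries an "unset" time
# and no longer contributes to the merged value.
merged_fixed = merge_mtime([client_write_time, 0], ignore_unset=True)
```

In this sketch `merged` is the server-side value, while `merged_fixed` is the value written by the client, which is what a comparison of mtime across two mount points would expect.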
| Comment by Niu Yawei (Inactive) [ 25/Apr/11 ] |
|
Hi, Andreas, any comments on this? |
| Comment by Andreas Dilger [ 25/Apr/11 ] |
|
It should already be possible to handle this today by checking the SUID/SGID flags on the inode, and if both of them are set then the object a/m/ctime should be ignored (return 0 for all of them). I'd prefer not to set those values to 0 on disk because some parts of the code consider ctime = 0 an invalid inode. It is definitely correct that the times should be based on the client and not the server. While it is desirable to have clocks in sync on all nodes, this is not required. |
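A minimal sketch of the check suggested in this comment, assuming that an object with both SUID and SGID still set has never been written by a client. The function name and the tuple layout are hypothetical; only the `stat.S_ISUID` and `stat.S_ISGID` mode bits come from the standard library.

```python
import stat


def object_times_for_client(mode, atime, mtime, ctime):
    """Return (atime, mtime, ctime), or zeros if the object looks untouched.

    Assumption for this sketch: both SUID and SGID set means "pre-created,
    never written", so the on-disk times come from the server and are ignored.
    The on-disk values themselves are left alone.
    """
    untouched = bool(mode & stat.S_ISUID) and bool(mode & stat.S_ISGID)
    if untouched:
        return (0, 0, 0)
    return (atime, mtime, ctime)


precreated_mode = stat.S_IFREG | 0o666 | stat.S_ISUID | stat.S_ISGID
written_mode = stat.S_IFREG | 0o666

ignored = object_times_for_client(precreated_mode, 1303179727, 1303179601, 1303179601)
kept = object_times_for_client(written_mode, 1303179727, 1303179601, 1303179601)
```

Returning zeros at read time rather than storing them keeps the on-disk ctime non-zero, which matches the concern above about code that treats ctime = 0 as an invalid inode.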
| Comment by Andreas Dilger [ 06/Jul/11 ] |
|
Any progress on this bug? I see this is still failing in Maloo, e.g. https://maloo.whamcloud.com/test_sets/fa0a9a24-a78f-11e0-bd2a-52540025f9af. Fixing the failing regression tests makes our testing much more efficient. I think the proposed fix is relatively straightforward to implement: in filter_commitrw_write(), if both SUID and SGID are still set (i.e. this is the first time this was done), then OBD_MD_FLMTIME|OBD_MD_FLCTIME|OBD_MD_FLATIME should be OR'd into the "i" flag passed to iattr_from_obdo(), and ATTR_MTIME | ATTR_CTIME | ATTR_ATIME should be left in the ia_valid mask, so that the values sent from the client overwrite those on disk in the call to fsfilt_setattr(). |
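The flag handling proposed in this comment could look roughly like the sketch below. It is not the actual filter_commitrw_write() code: the helper name is hypothetical, and the numeric values given to the OBD_MD_FL* constants are placeholders chosen for the example (the real definitions live in the Lustre headers).

```python
import stat

# Placeholder bit values for illustration only; not taken from the Lustre headers.
OBD_MD_FLATIME = 0x2
OBD_MD_FLMTIME = 0x4
OBD_MD_FLCTIME = 0x8
TIME_FLAGS = OBD_MD_FLMTIME | OBD_MD_FLCTIME | OBD_MD_FLATIME


def valid_flags_for_write(inode_mode, valid):
    """Return the 'valid' mask to use when applying client attributes on a write.

    Assumption: SUID and SGID both still set means this is the first write to
    the object, so the client's a/m/ctime must overwrite the values on disk.
    """
    first_write = bool(inode_mode & stat.S_ISUID) and bool(inode_mode & stat.S_ISGID)
    if first_write:
        return valid | TIME_FLAGS
    return valid


fresh_mode = stat.S_IFREG | 0o666 | stat.S_ISUID | stat.S_ISGID
used_mode = stat.S_IFREG | 0o666

first = valid_flags_for_write(fresh_mode, 0)
later = valid_flags_for_write(used_mode, 0)
```

On the first write all three time flags end up in the mask, so the client's values would replace the server-stamped ones; on later writes the mask is passed through unchanged.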
| Comment by Hongchao Zhang [ 06/Jul/11 ] |
|
The initial patch has been created (some improvement is still needed), but it was suspended because there are several other high |
| Comment by Peter Jones [ 07/Jul/11 ] |
|
I think that we need to make fixing this issue a priority. As Andreas says, it causes regular autotest failures. |
| Comment by Hongchao Zhang [ 11/Jul/11 ] |
|
The patch is at http://review.whamcloud.com/#change,1084
Some notes about the patch:
1. In this patch, the a/c/m time is set to the minimal value of inode->i_a(m,c)time (LONG_MIN), and it will be ignored
2. In this patch, the a/c/m time is initialized after creating the inode, and it can be moved into ldiskfs if it degrades the |
| Comment by Hongchao Zhang [ 21/Jul/11 ] |
|
There is an error in the Maloo test for this patch. There is a bug (https://bugzilla.lustre.org/show_bug.cgi?id=23161) in Bugzilla that tracked this issue. Not sure yet whether the new occurrence is related to the patch. |
| Comment by Build Master (Inactive) [ 27/Jul/11 ] |
|
Integrated in Oleg Drokin : 414251797ed178eec5d431e1f5aa4a889d2b159f
|
| Comment by Peter Jones [ 27/Jul/11 ] |
|
Landed for 2.1 |