[LU-1042] 1.8 clients show wrong dates with 2.1 servers Created: 26/Jan/12 Updated: 03/Dec/14 Resolved: 15/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 2.2.0, Lustre 2.1.1 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Christopher Morrone | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
1.8.5-6chaos clients, 2.1.0-17chaos servers |
||
| Story Points: | 1 |
| Severity: | 1 |
| Rank (Obsolete): | 4733 |
| Description |
|
With 2.1 servers, our 1.8 clients are showing an incorrect date of: 1901-12-14 12:45:52.000000000 -0800 for newly created files. Andreas has pointed us at a patch for We are in production with some 2.1 servers now, so we will need a quick fix on this. The |
| Comments |
| Comment by Peter Jones [ 26/Jan/12 ] |
|
Oleg, could you please look at this one? Thanks, Peter |
| Comment by Oleg Drokin [ 26/Jan/12 ] |
|
It seems you are running 1.8.5 plus tons of patches. I now vaguely remember that there was a patch, unrelated to compatibility, that changed the way we recalculate times on the clients. |
| Comment by Andreas Dilger [ 26/Jan/12 ] |
|
I think it makes sense to revert 414251797ed178eec5d431e1f5aa4a889d2b159f (http://review.whamcloud.com/1084) for 2.1.1 and 2.2, since this bug is only hit if the clocks on the OSTs are significantly different from the clocks on the clients. This should be done ASAP to ensure it is included in 2.1.1 and 2.2.0, and it will give us more time to figure out the root cause and fix it properly.

The first question is why this problem appeared only on 1.8 clients with 2.1 servers, and not with 2.1 clients. Since this code has been running out in the field for some time, we may also need to add a workaround in the 2.1.1/2.2 OST code to special-case (INT_MIN + 24 * 3600) and convert it to 0 on the fly.

A reasonable and easy fix would be to initialize the timestamps of the files with a small positive value like (24 * 3600), instead of a large negative value like INT_MIN + (24 * 3600). The latter can be confusing between 32-bit and 64-bit values, being treated as either (__u64)INT_MIN - (24 * 3600) or (__u64)(INT_MIN - 24 * 3600) (a large positive 32-bit value) depending on usage. Having a large negative value might also complicate future on-disk compatibility, since the ext4 timestamp may be treated as an unsigned 32-bit value in the future to avoid Y2038 time wrapping problems.

There is a secondary concern that I found when looking at the output from an lfsck run on a 2.1 OST (e2fsck -d):

e2fsck_pass1:1402: increase inode 7296 badness 0 to 1

This is marking the inode a little bit bad (+1) because the atime and mtime are before the filesystem creation time, and a bit more bad (+2) because the ctime (which cannot be set by applications) is also before the filesystem creation time. e2fsck uses a heuristic to determine whether inodes with a number of errors should be considered corrupt, and the default "badness" threshold is 8, of which these negative timestamps are already contributing 3 points. Having a file over 2GB also contributes a point, so this seems to be rapidly approaching a state where some other minor corruption to the inode would cause it to get marked as corrupt.

My question is whether it makes sense to set this timestamp internally to the OSD when the object is first created. For ldiskfs it can use the minimum filesystem creation time (s_mkfs_time) for this, to avoid spurious errors, and ZFS could just use 0 or 1 or (24 * 3600). |
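To make the 32-bit versus 64-bit ambiguity above concrete, here is a minimal sketch in Python (used only for the arithmetic, not Lustre code). It shows how the same sentinel reads as a signed 32-bit value, an unsigned 32-bit value, and a sign-extended 64-bit value, and what an on-the-fly special case converting it to 0 might look like. The names `as_s32`, `as_u32`, `as_u64` and `normalize_object_time` are hypothetical and do not come from the Lustre tree.

```python
INT_MIN = -2**31                      # minimum value of a signed 32-bit integer
SECONDS_PER_DAY = 24 * 3600
SENTINEL = INT_MIN + SECONDS_PER_DAY  # the initial timestamp discussed in this ticket


def as_u32(value):
    """Reinterpret an integer as an unsigned 32-bit value (keep the low 32 bits)."""
    return value & 0xFFFFFFFF


def as_u64(value):
    """Reinterpret an integer as an unsigned 64-bit value (two's complement)."""
    return value & 0xFFFFFFFFFFFFFFFF


def as_s32(value):
    """Reinterpret the low 32 bits of an integer as a signed 32-bit value."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low


def normalize_object_time(seconds):
    """Hypothetical workaround: report the sentinel as 0, whatever its width.

    Caveat: this also matches a genuine 64-bit time whose low 32 bits happen
    to equal the sentinel, so it is an illustration rather than a safe fix.
    """
    return 0 if as_s32(seconds) == SENTINEL else seconds


if __name__ == "__main__":
    print("signed 32-bit:       ", SENTINEL)
    print("unsigned 32-bit:     ", as_u32(SENTINEL))
    print("sign-extended 64-bit:", as_u64(SENTINEL))
    print("normalized:          ", normalize_object_time(as_u64(SENTINEL)))
```

Read as unsigned 32-bit, the sentinel becomes a date shortly after the Y2038 boundary, which is one reason a small positive initial value such as (24 * 3600) is easier to reason about.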
| Comment by Andreas Dilger [ 26/Jan/12 ] |
|
Just to clarify, the reason why I singled out http://review.whamcloud.com/1084 is because the timestamp (INT_MIN + 24 * 3600) is "1901-12-14 12:45:52.000000000 -0800" (i.e. about 68 years before the epoch, just like Y2038 is about 68 years after the epoch and when signed 32-bit unix counters will overflow). |
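The date arithmetic above can be checked with a short Python sketch. It assumes only the values quoted in this ticket: INT_MIN for a signed 32-bit integer and the reporter's -0800 UTC offset. The helper names `sentinel_timestamp` and `format_epoch_seconds` are illustrative.

```python
from datetime import datetime, timedelta, timezone

INT_MIN = -2**31             # minimum value of a signed 32-bit integer
SECONDS_PER_DAY = 24 * 3600


def sentinel_timestamp():
    """Return INT_MIN + 24 * 3600, in seconds relative to the Unix epoch."""
    return INT_MIN + SECONDS_PER_DAY


def format_epoch_seconds(seconds, utc_offset_hours):
    """Format epoch seconds in a fixed-offset time zone.

    The date is built by adding a timedelta to the epoch rather than calling
    datetime.fromtimestamp(), which can reject negative values on some platforms.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    moment = (epoch + timedelta(seconds=seconds)).astimezone(tz)
    return moment.strftime("%Y-%m-%d %H:%M:%S %z")


if __name__ == "__main__":
    # Prints: 1901-12-14 12:45:52 -0800
    print(format_epoch_seconds(sentinel_timestamp(), -8))
```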
| Comment by Christopher Morrone [ 26/Jan/12 ] |
|
As expected, after reverting A more 1.8-compatible fix will be needed for |
| Comment by Andreas Dilger [ 27/Jan/12 ] |
|
Chris, thanks for the update. I'm leaving this as a blocker, to ensure that the fix gets landed on the 2.1.1 and 2.2 release branches. |
| Comment by Andreas Dilger [ 27/Jan/12 ] |
|
Submitted http://review.whamcloud.com/2030 to revert this patch. I'm working on a new patch to fix it properly. |
| Comment by Andreas Dilger [ 27/Jan/12 ] |
|
Submitted http://review.whamcloud.com/2036, which is a separate (parallel) patch to hopefully fix this properly. |
| Comment by Andreas Dilger [ 13/Feb/12 ] |
|
The 2.1.1 release has reverted the original change 1084, since this introduces the least risk for that release. The full fix is still targeted for 2.2. |
| Comment by Peter Jones [ 15/Feb/12 ] |
|
Fix landed for 2.2 |
| Comment by Build Master (Inactive) [ 15/Feb/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = FAILURE
|
| Comment by Build Master (Inactive) [ 17/Feb/12 ] |
|
Integrated in Result = ABORTED
|
| Comment by Build Master (Inactive) [ 08/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|