[LU-988] sanity.sh subtest test_39j failed with "mtime is lost on close" Created: 12/Jan/12 Updated: 23/Mar/23 Resolved: 05/Jul/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | Lustre 2.3.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4568 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com> This issue relates to the following test suite run: The sub-test test_39j failed with the following error:
Info required for matching: sanity 39j |
| Comments |
| Comment by nasf (Inactive) [ 05/Feb/12 ] |
|
Another failure instance: https://maloo.whamcloud.com/test_sets/c1550c2c-4ff6-11e1-9ebc-5254004bbbd3 |
| Comment by nasf (Inactive) [ 09/Feb/12 ] |
|
Another failure instance: |
| Comment by Andreas Dilger [ 10/Feb/12 ] |
|
https://maloo.whamcloud.com/test_sets/78e215b0-5259-11e1-aa4b-5254004bbbd3 |
| Comment by Li Wei (Inactive) [ 16/Feb/12 ] |
|
https://maloo.whamcloud.com/test_sets/7a0e2f22-5840-11e1-9df1-5254004bbbd3 |
| Comment by Li Wei (Inactive) [ 26/Feb/12 ] |
|
https://maloo.whamcloud.com/test_sets/2b8d2222-5f16-11e1-ab6b-5254004bbbd3 |
| Comment by Zhenyu Xu [ 08/Mar/12 ] |
|
In a failure instance https://maloo.whamcloud.com/test_sessions/43b80858-68ee-11e1-91b4-5254004bbbd3 158313:00020000:00000002:0.0:1331173465.105834:0:25793:0:(lov_merge.c:135:lov_merge_lvb()) merged: 2097152 1331173465 1331173463 1331173465 0 167853:00020000:00000002:0.0:1331173465.114431:0:25794:0:(lov_merge.c:135:lov_merge_lvb()) merged: 2097152 1331173465 1331173463 1331173464 2048 170695:00020000:00000002:0.0:1331173465.120790:0:25795:0:(lov_merge.c:135:lov_merge_lvb()) merged: 2097152 1331173465 1331173463 1331173465 2048 240587:00020000:00000002:0.0:1331173465.185512:0:25811:0:(lov_merge.c:135:lov_merge_lvb()) merged: 2097152 1331173464 1331173463 1331173465 4096 |
| Comment by Zhenyu Xu [ 08/Mar/12 ] |
|
patch tracking at http://review.whamcloud.com/2274 |
| Comment by Andreas Dilger [ 13/Mar/12 ] |
|
Hit this again: Bobijam, do you know why the time packed into the LVB is going backward? Is it because the clock in the VM itself is going backward, or because the timestamps on the inode are being lost? |
| Comment by Zhenyu Xu [ 13/Mar/12 ] |
|
I don't have evidence that the inode timestamps on server are lost, it is most likely due to the clock in the VM drifted a little bit backward, I don't have strong evidence either, except for the "lov_merge.c" log I posted in my previous comment. |
| Comment by Zhenyu Xu [ 13/Mar/12 ] |
|
I'll push a debugging patch to print out inode timestamp on OSTs in lov_merge_lvb. |
| Comment by Jian Yu [ 29/Mar/12 ] |
|
Again: |
| Comment by Jinshan Xiong (Inactive) [ 27/Apr/12 ] |
|
another occurrence: https://maloo.whamcloud.com/test_sets/d60a69fc-9040-11e1-98a1-525400d2bfa6 |
| Comment by Xuezhao Liu [ 25/May/12 ] |
|
Another failure case: |
| Comment by nasf (Inactive) [ 28/May/12 ] |
|
another failure: https://maloo.whamcloud.com/test_sets/4193f5ec-a8ad-11e1-9ad2-52540035b04c |
| Comment by Ian Colle (Inactive) [ 01/Jun/12 ] |
|
This seems to be happening more and more often: May 28th - https://maloo.whamcloud.com/sub_tests/08d3c54e-a898-11e1-9ad2-52540035b04c |
| Comment by Zhenyu Xu [ 03/Jun/12 ] |
|
Got an log from another hit (https://maloo.whamcloud.com/test_sets/b4fc6bd0-acc3-11e1-a542-52540035b04c) client2 log sanity.test_39j.debug_log.client-26vm2.1338628522.log [ cancel osc lock ] 240484:00020000:00000002:0.0:1338628522.238526:0:27748:0:(lov_merge.c:138:lov_merge_lvb()) merged for FID [0x200000400:0x287:0x0] s=2097152 m=1338628521 a=1338628520 c=1338628522 b=4096 |
| Comment by Zhenyu Xu [ 09/Jun/12 ] |
|
m=1338628521 a=1338628520 c=1338628522 shows that the mtime and ctime are not consistent on OST object, will push another debug patch to show what m/ctime used on OSS side. |
| Comment by Ian Colle (Inactive) [ 12/Jun/12 ] |
|
Hit it again: https://maloo.whamcloud.com/test_sets/e9396c4a-b493-11e1-bdae-52540035b04c |
| Comment by Ian Colle (Inactive) [ 15/Jun/12 ] |
|
And again: |
| Comment by Keith Mannthey (Inactive) [ 18/Jun/12 ] |
|
And again: |
| Comment by Peter Jones [ 25/Jun/12 ] |
|
Diagnostic patch landed on June 19th so need a hit after then |
| Comment by Zhenyu Xu [ 26/Jun/12 ] |
|
from https://maloo.whamcloud.com/sub_tests/c70a1fde-bf65-11e1-acb7-52540035b04c OST write time extracted from OST debug log $ grep "write: s=" sanity.test_39j.debug_log.client-27vm4.1340673637.log let's check client log *extent lock cl_wait()=>ccc_lock_state()=>cl_object_attr_get()=>lov_attr_get_raid0() *glimpse lock part 1: cl_glimpse_lock()->cl_lock_request()=>cl_wait() *glimpse lock part 2: cl_glimpse_lock() *glimpse lock part 1 *glimpse lock part 2 *glimpse lock part 2, glimpse lock state unchanged *local locks cancelled *glimpse lock part 1: request glimpse lock *glimpse lock part 2 |
| Comment by Zhenyu Xu [ 27/Jun/12 ] |
|
patch tracking at http://review.whamcloud.com/3200
When building OST write RPC, inode's c/mtime is out-dated until |
| Comment by nasf (Inactive) [ 30/Jun/12 ] |
|
another failure instance: |
| Comment by Gerrit Updater [ 17/Oct/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48887 |
| Comment by Gerrit Updater [ 23/Mar/23 ] |
|
"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50387 |