[LU-12363] sanity-dom: test_fsx failed with 120 Created: 31/May/19 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5, Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/86c090e6-830a-11e9-8c65-52540065bddc
== sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 16:12:36 (1559232756)
Chance of close/open is 1 in 50
Seed set to 2757
fd 0: /mnt/lustre/ffsx.sanity-dom
fd 1: /mnt/lustre2/ffsx.sanity-dom
truncating to largest ever: 0xd3ab8
000001[1] 1559232757.193757 trunc from 00000000 to 0x0d3ab7 (0xd3ab8 bytes)
1559232757.196511 trunc done
000002[0] 1559232757.197183 trunc from 0x0d3ab8 to 0x1a6bda (0xd3123 bytes)
1559232757.200406 trunc done
000003[0] 1559232757.203269 write 0x03f41a thru 0x048669 (0x9250 bytes)
1559232757.205390 write done
000004[0] 1559232757.206069 mapread 0x0634ca thru 0x06bc8d (0x87c4 bytes)
1559232757.206138 mmap done
1559232757.230198 memcpy done
1559232757.230234 munmap done
000005[1] 1559232757.232410 write 0x027f54 thru 0x03489f (0xc94c bytes)
1559232757.238426 write done
Size error: expected 0xd3123 stat 0xd3ab8 seek 0xd3ab8
LOG DUMP (5 total operations):
1[1]: 1559232757.193757 TRUNCATE UP from 0x0 to 0xd3ab8
2[0]: 1559232757.197183 TRUNCATE DOWN from 0xd3ab8 to 0xd3123
3[0]: 1559232757.203269 WRITE 0x3f41a thru 0x48669 (0x9250 bytes)
4[0]: 1559232757.206069 MAPREAD 0x634ca thru 0x6bc8d (0x87c4 bytes)
5[1]: 1559232757.232410 WRITE 0x27f54 thru 0x3489f (0xc94c bytes)
Correct content saved for comparison
(maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 120
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5877:error()
= /usr/lib64/lustre/tests/test-framework.sh:6164:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6203:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6050:run_test()
= /usr/lib64/lustre/tests/sanity-dom.sh:130:main()
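The expected size in the "Size error" line can be cross-checked by replaying the five logged operations by hand. The sketch below is only an illustration, not part of fsx: the `expected_size` helper and the tuple layout are assumptions made here, and the offsets are copied from the LOG DUMP above. It assumes a truncate sets the size directly, a write extends the file only when its inclusive end offset lies past the current size, and a mapread never changes the size.

```python
def expected_size(ops):
    """Replay (kind, start, end) tuples and return the resulting file size."""
    size = 0
    for kind, start, end in ops:
        if kind == "truncate":
            # For truncates the tuple is (old size, new size).
            size = end
        elif kind == "write":
            # fsx logs "thru" offsets, so the end offset is inclusive.
            size = max(size, end + 1)
        # Reads and mapreads leave the size unchanged.
    return size


# Operations 1 to 5 from the LOG DUMP in this ticket.
OPS = [
    ("truncate", 0x0, 0xd3ab8),
    ("truncate", 0xd3ab8, 0xd3123),
    ("write", 0x3f41a, 0x48669),
    ("mapread", 0x634ca, 0x6bc8d),
    ("write", 0x27f54, 0x3489f),
]

print(hex(expected_size(OPS)))
```

Under these assumptions the replay gives 0xd3123, matching the "expected" value, while the reported stat size 0xd3ab8 is the size set by the first truncate (operation 1, issued on mount 1).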
|
| Comments |
| Comment by Mikhail Pershin [ 13/Oct/20 ] |
|
The issue still exists in b2_12 at least, and has recently started occurring quite often. |
| Comment by Mikhail Pershin [ 13/Oct/20 ] |
|
Investigation shows that a simple use case reproduces this:
[client 1] truncate UP to some size SIZE0
[client 2] truncate DOWN to SIZE1
[client 1] write in between SIZE1 and SIZE0
check that the file size is the END of the last write
Both truncates work correctly on their own (without the last write, the file size is SIZE1 on both clients), but adding the last write on client 1 sets the file size to SIZE0 after all. |
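The sequence above could be scripted roughly as follows. This is a minimal sketch, not the test that was actually used: the function name and parameters are hypothetical, and it assumes the same file is reachable through two client mounts (for example the /mnt/lustre and /mnt/lustre2 paths seen in the fsx log).

```python
import os


def replay_truncate_race(path_client1, path_client2, size0, size1, offset, data):
    """Run truncate up, truncate down, write; return (expected, observed) size."""
    if not (size1 <= offset and offset + len(data) < size0):
        raise ValueError("the write must fall between SIZE1 and SIZE0")
    fd1 = os.open(path_client1, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fd2 = os.open(path_client2, os.O_RDWR)
        try:
            os.ftruncate(fd1, size0)      # [client 1] truncate UP to SIZE0
            os.ftruncate(fd2, size1)      # [client 2] truncate DOWN to SIZE1
            os.pwrite(fd1, data, offset)  # [client 1] write between SIZE1 and SIZE0
            expected = offset + len(data)
            observed = os.fstat(fd1).st_size
            return expected, observed
        finally:
            os.close(fd2)
    finally:
        os.close(fd1)
```

Calling it with the same file name under each mount point and comparing the two returned values should show the problem: on an affected client the observed size would be SIZE0 rather than the end of the write. On a local file system, where both paths name the same file, the two values are equal.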
| Comment by Mikhail Pershin [ 16/Oct/20 ] |
|
It looks like this is a result of the |
| Comment by James Nunez (Inactive) [ 19/Nov/20 ] |
|
I'm still seeing what looks like this issue on b2_12 even after the Most recent failure at https://testing.whamcloud.com/test_sets/33e75ba3-d6fe-4d96-9acd-8d280671f396 . From the client test_log:
== sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 03:11:21 (1605669081)
Chance of close/open is 1 in 50
Seed set to 9082
fd 0: /mnt/lustre/ffsx.sanity-dom
fd 1: /mnt/lustre2/ffsx.sanity-dom
skipping zero size read
truncating to largest ever: 0x7f60b
000002[0] 1605669082.928997 trunc from 00000000 to 0x07f60a (0x7f60b bytes)
1605669082.931277 trunc done
...
000167[1] 1605669084.691482 mapread 0x081b59 thru 0x08703b (0x54e3 bytes)
1605669084.691550 mmap done
1605669084.691577 memcpy done
1605669084.691589 munmap done
READ BAD DATA: offset = 0x81b59, size = 0x54e3
OFFSET GOOD BAD RANGE
0x81b59 0xa3a2 000000 0x2e63
operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops
LOG DUMP (170 total operations):
1[1]: 1605669082.925562 SKIPPED (no operation)
2[0]: 1605669082.928997 TRUNCATE UP from 0x0 to 0x7f60b
...
169[1]: 1605669084.689675 READ 0x9d146 thru 0xa5942 (0x87fd bytes)
170[1]: 1605669084.691482 MAPREAD 0x81b59 thru 0x8703b (0x54e3 bytes) ***RRRR***
Correct content saved for comparison
(maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 110
|
| Comment by Mikhail Pershin [ 19/Nov/20 ] |
|
James, I am checking that; probably some other DoM patch is still missing in b2_12 |
| Comment by Mikhail Pershin [ 24/Nov/20 ] |
|
at least this patch is still needed: |
| Comment by Mikhail Pershin [ 24/Nov/20 ] |
|
James, also in 2_12 we are missing two LVB fixes, which are also about correct file size: https://review.whamcloud.com/40739 https://review.whamcloud.com/40740
|
| Comment by Mikhail Pershin [ 24/Nov/20 ] |
|
And another missing patch that can be related: |