[LU-12363] sanity-dom: test_fsx failed with 120 Created: 31/May/19  Updated: 16/Jan/22  Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5, Lustre 2.12.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-10498 sanity-dom test_fsx: FAIL: test_fsx f... Closed
Related
is related to LU-11835 DOM open resend doesn't return size Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/86c090e6-830a-11e9-8c65-52540065bddc

== sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 16:12:36 (1559232756)
Chance of close/open is 1 in 50
Seed set to 2757
fd 0: /mnt/lustre/ffsx.sanity-dom
fd 1: /mnt/lustre2/ffsx.sanity-dom
truncating to largest ever: 0xd3ab8
000001[1] 1559232757.193757 trunc from 00000000  to  0x0d3ab7	(0xd3ab8 bytes)
       1559232757.196511 trunc done
000002[0] 1559232757.197183 trunc from 0x0d3ab8  to  0x1a6bda	(0xd3123 bytes)
       1559232757.200406 trunc done
000003[0] 1559232757.203269 write      0x03f41a thru 0x048669	(0x9250 bytes)
       1559232757.205390 write done
000004[0] 1559232757.206069 mapread    0x0634ca thru 0x06bc8d	(0x87c4 bytes)
       1559232757.206138 mmap done
       1559232757.230198 memcpy done
       1559232757.230234 munmap done
000005[1] 1559232757.232410 write      0x027f54 thru 0x03489f	(0xc94c bytes)
       1559232757.238426 write done
Size error: expected 0xd3123 stat 0xd3ab8 seek 0xd3ab8
LOG DUMP (5 total operations):
1[1]: 1559232757.193757 TRUNCATE UP	from 0x0 to 0xd3ab8
2[0]: 1559232757.197183 TRUNCATE DOWN	from 0xd3ab8 to 0xd3123
3[0]: 1559232757.203269 WRITE    0x3f41a thru 0x48669 (0x9250 bytes)
4[0]: 1559232757.206069 MAPREAD  0x634ca thru 0x6bc8d (0x87c4 bytes)
5[1]: 1559232757.232410 WRITE    0x27f54 thru 0x3489f (0xc94c bytes)
Correct content saved for comparison
(maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
 sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 120 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5877:error()
  = /usr/lib64/lustre/tests/test-framework.sh:6164:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6203:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6050:run_test()
  = /usr/lib64/lustre/tests/sanity-dom.sh:130:main()
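
As a reading aid for the "Size error" line above, the five logged operations can be replayed against a minimal size model to see why fsx expected 0xd3123. This is only a sketch: the offsets are copied from the LOG DUMP, while the tuple layout and the helper name `expected_size` are illustrative and not part of fsx.

```python
# Replay the logged fsx operations against a simple file-size model.
# Offsets come from the LOG DUMP; the data layout here is hypothetical.
OPS = [
    ("truncate", 0xd3ab8),          # 1[1]: TRUNCATE UP
    ("truncate", 0xd3123),          # 2[0]: TRUNCATE DOWN
    ("write", 0x3f41a, 0x48669),    # 3[0]: WRITE
    ("mapread", 0x634ca, 0x6bc8d),  # 4[0]: MAPREAD (never changes the size)
    ("write", 0x27f54, 0x3489f),    # 5[1]: WRITE
]


def expected_size(ops):
    """Return the file size a POSIX file would have after the given ops."""
    size = 0
    for op in ops:
        if op[0] == "truncate":
            size = op[1]
        elif op[0] == "write":
            # fsx "thru" offsets are inclusive, so the write ends at last + 1.
            size = max(size, op[2] + 1)
    return size


print(hex(expected_size(OPS)))  # prints 0xd3123
```

Both writes end below 0xd3123, so neither should extend the file. The reported stat size 0xd3ab8 matches the earlier TRUNCATE UP target rather than the size this model gives.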


 Comments   
Comment by Mikhail Pershin [ 13/Oct/20 ]

The issue still exists in b2_12 at least, and has started occurring quite often recently.

Comment by Mikhail Pershin [ 13/Oct/20 ]

Investigation shows that a simple use case reproduces it:

[client 1] truncate UP to some size SIZE0
[client 2] truncate DOWN to SIZE1
[client 1] write in between SIZE1 and SIZE0
check that the file size is the END of the last write

Both truncates work correctly (the file size is SIZE1 on both clients without the last write), but adding the last write on client1 sets the file size back to SIZE0.
I am checking this further to find the reason for that.
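
The sequence above could be scripted roughly as follows. This is a sketch under assumptions, not the test that was actually used: the helper name `truncate_write_size`, the write offset and the file name are hypothetical, and the two paths are assumed to refer to the same DoM file through two different client mounts.

```python
import os
import tempfile


def truncate_write_size(path_a, path_b, size0, size1, offset, data):
    """Run truncate UP, truncate DOWN and a write, then return
    (size reported by fstat, size expected from the sequence)."""
    fd_a = os.open(path_a, os.O_RDWR | os.O_CREAT, 0o644)  # "client 1"
    fd_b = os.open(path_b, os.O_RDWR)                      # "client 2"
    try:
        os.ftruncate(fd_a, size0)      # [client 1] truncate UP to SIZE0
        os.ftruncate(fd_b, size1)      # [client 2] truncate DOWN to SIZE1
        os.pwrite(fd_a, data, offset)  # [client 1] write between SIZE1 and SIZE0
        expected = max(size1, offset + len(data))
        return os.fstat(fd_a).st_size, expected
    finally:
        os.close(fd_a)
        os.close(fd_b)


if __name__ == "__main__":
    # Local sanity run: both paths name the same temporary file, so this only
    # shows the expected POSIX result, not the Lustre behaviour.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repro-file")
        got, want = truncate_write_size(
            path, path, 0xd3ab8, 0xd3123, 0xd3200, b"x" * 0x100)
        print(hex(got), hex(want))
```

On a dual-mount setup like the one in the test log, `path_a` and `path_b` would point at the same file under /mnt/lustre and /mnt/lustre2 respectively. A first value equal to SIZE0 instead of the end of the write would correspond to the failure described here.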

Comment by Mikhail Pershin [ 16/Oct/20 ]

It looks like this is a result of the LU-12296 issue. A patch has been pushed for b2_12: https://review.whamcloud.com/40277/

Comment by James Nunez (Inactive) [ 19/Nov/20 ]

I'm still seeing what looks like this issue on b2_12 even after the LU-12296 patch, https://review.whamcloud.com/#/c/40296/ (LU-12296 llite: improve ll_dom_lock_cancel), landed to b2_12.

The most recent failure is at https://testing.whamcloud.com/test_sets/33e75ba3-d6fe-4d96-9acd-8d280671f396 . From the client test_log:

== sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 03:11:21 (1605669081)
Chance of close/open is 1 in 50
Seed set to 9082
fd 0: /mnt/lustre/ffsx.sanity-dom
fd 1: /mnt/lustre2/ffsx.sanity-dom
skipping zero size read
truncating to largest ever: 0x7f60b
000002[0] 1605669082.928997 trunc from 00000000  to  0x07f60a	(0x7f60b bytes)
       1605669082.931277 trunc done
...
000167[1] 1605669084.691482 mapread    0x081b59 thru 0x08703b	(0x54e3 bytes)
       1605669084.691550 mmap done
       1605669084.691577 memcpy done
       1605669084.691589 munmap done
READ BAD DATA: offset = 0x81b59, size = 0x54e3
OFFSET	GOOD	BAD	RANGE
0x81b59	0xa3a2	000000	 0x2e63
operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops
LOG DUMP (170 total operations):
1[1]: 1605669082.925562 SKIPPED (no operation)
2[0]: 1605669082.928997 TRUNCATE UP	from 0x0 to 0x7f60b
...
169[1]: 1605669084.689675 READ     0x9d146 thru 0xa5942 (0x87fd bytes)
170[1]: 1605669084.691482 MAPREAD  0x81b59 thru 0x8703b (0x54e3 bytes)	***RRRR***
Correct content saved for comparison
(maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
 sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 110 

Comment by Mikhail Pershin [ 19/Nov/20 ]

James, I am checking that. Probably some other DoM patch is still missing in b2_12.

Comment by Mikhail Pershin [ 24/Nov/20 ]

At least this patch is still needed:

https://review.whamcloud.com/#/c/40302/

Comment by Mikhail Pershin [ 24/Nov/20 ]

James, in b2_12 we are also missing two LVB fixes, which are also about the correct file size:

https://review.whamcloud.com/40739

https://review.whamcloud.com/40740

 

Comment by Mikhail Pershin [ 24/Nov/20 ]

And another missing patch that may be related:

https://review.whamcloud.com/#/c/40223/
