Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.16.0
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/37931af2-71b5-415f-8fcd-87eb522e6a34
test_312 failed with the following error:
CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -ddddd lustre-ost3/ost3
CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -ddddd lustre-ost3/ost3
646 16837 17009 17837 17869 19718 20845 20938 20970 21005 21037 21293 21325 21357 21389 21421 21453 21485 21517 21549 21951 22048 23138 23489
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.256961 s, 15.9 kB/s
CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -dddd lustre-ost3/ost3 23489
sanity test_312: @@@@@@ FAIL: blksz error, actual 4096, expected: 2 * 1 * 4096
Test session details:
clients: https://build.whamcloud.com/job/lustre-b2_15/94 - 4.18.0-477.27.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-master/4581 - 5.14.0-427.31.1_lustre.el9.x86_64
<<Please provide additional information about the failure here>>
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_312 - blksz error, actual 4096, expected: 2 * 1 * 4096
Attachments
Issue Links
- is related to
  - LU-9054 sanity test_312: FAIL: blksz error: , expected: 4096 (Reopened)
  - LU-15963 sanityn test_56b: OSS OOM with ZFS (Reopened)
  - LU-18356 test-framework to fetch except list from server for interop testing (Resolved)
It looks like sanity test_312 was modified by patch https://review.whamcloud.com/46293 "LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000" (commit v2_15_53-67-geaae465556 on master on 2023-01-19, commit v2_15_2-37-g1a337b4a5b on b2_15 on 2023-04-11) to fix a test issue with the MDS. It was previously excluded on ZFS due to LU-9054. However, I don't see how that change is affecting interop testing, since it is present on both branches, and the test failure appears more related to the ZFS blocksize detection (4096 bytes vs 8192 bytes), which I think is unrelated to the patch (which changes how ZFS objects are identified). It also doesn't explain why these tests are not failing under normal testing (both test_312 and the zfs_get_blksz() function are the same between branches).
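To make the 4096 vs 8192 mismatch concrete, here is a minimal sketch of the arithmetic behind the failure message. It assumes the check compares the block size reported by zdb against 2 * count * PAGE_SIZE, as the error text suggests. The helper names expected_blksz and check_blksz are hypothetical and are not taken from sanity.sh.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the comparison implied by the error text
# "blksz error, actual 4096, expected: 2 * 1 * 4096". Not from sanity.sh.

# expected_blksz COUNT PAGE_SIZE: block size the check appears to expect
expected_blksz() {
	local count=$1 page_size=$2

	echo $((2 * count * page_size))
}

# check_blksz ACTUAL COUNT PAGE_SIZE: print a message in the same form as the
# failure and return non-zero when the sizes differ
check_blksz() {
	local actual=$1 count=$2 page_size=$3
	local expected

	expected=$(expected_blksz "$count" "$page_size")
	if ((actual != expected)); then
		echo "blksz error, actual $actual, expected: 2 * $count * $page_size"
		return 1
	fi
	return 0
}

# Values from the failing run: zdb reported 4096 where 8192 was expected
check_blksz 4096 1 4096 || echo "mismatch: wanted $(expected_blksz 1 4096)"
```

With the values from this run, the object stayed at a single 4096-byte block where the test expected it to have grown to 8192 bytes.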
There was another change to how ZFS is detecting blocksize for fragmented file IO patterns in patch https://review.whamcloud.com/47768 "LU-15963 osd-zfs: use contiguous chunk to grow blocksize" (commit v2_15_63-134-gdacc4b6d38 only on master) that might affect the test results. Looking at the test results, there are intermittent failures with b2_15 servers and clients but never with master servers (regardless of client version). So it seems this is just an intermittently failing test on b2_15?
I see that this subtest has been failing multiple times a day since 2024-06-10, which is (coincidentally?) the same day that the LU-15963 patch landed on master. That patch didn't modify the test case at all, though, so it isn't clear why it would only affect b2_15 server testing but not normal master review testing.
I'm tempted to just skip this subtest during interop testing between b2_15 clients and master servers (maybe via LU-18356), since I don't see how this can be a regression on master.