LU-18311: interop: sanity test_312: FAIL: blksz error, actual 4096, expected: 2 * 1 * 4096

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/37931af2-71b5-415f-8fcd-87eb522e6a34

      test_312 failed with the following error:

      CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -ddddd lustre-ost3/ost3
      CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -ddddd lustre-ost3/ost3 646 16837 17009 17837 17869 19718 20845 20938 20970 21005 21037 21293 21325 21357 21389 21421 21453 21485 21517 21549 21951 22048 23138 23489
      1+0 records in
      1+0 records out
      4096 bytes (4.1 kB, 4.0 KiB) copied, 0.256961 s, 15.9 kB/s
      CMD: trevis-69vm3 zdb -e -p /dev/vg_Role_OSS -dddd lustre-ost3/ost3 23489
       sanity test_312: @@@@@@ FAIL: blksz error, actual 4096,  expected: 2 * 1 * 4096
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/94 - 4.18.0-477.27.1.el8_8.x86_64
      servers: https://build.whamcloud.com/job/lustre-master/4581 - 5.14.0-427.31.1_lustre.el9.x86_64

      <<Please provide additional information about the failure here>>
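      As a quick reference, the blksz check that failed above can be rerun by hand
      against the OST pool. The following is a minimal sketch only: it assumes the
      usual "Object  lvl  iblk  dblk ..." column layout of zdb -dddd output, so the
      awk field index is an assumption; the pool path and object id 23489 are copied
      from the log above.

      # Hedged sketch: print the data block size (dblk) of the OST object that
      # failed the check. zdb prints dblk with a unit suffix (e.g. 4K or 8K).
      blksz=$(zdb -e -p /dev/vg_Role_OSS -dddd lustre-ost3/ost3 23489 |
              awk '$1 == 23489 { print $4; exit }')
      echo "dblk=$blksz"   # test_312 expects 8K (2 * 1 * 4096) here, not 4K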

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_312 - blksz error, actual 4096, expected: 2 * 1 * 4096

          Activity

            adilger Andreas Dilger added a comment -

            It looks like sanity test_312 was modified by patch https://review.whamcloud.com/46293 "LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000" (commit v2_15_53-67-geaae465556 on master on 2023-01-19, commit v2_15_2-37-g1a337b4a5b on b2_15 on 2023-04-11) to fix a test issue with the MDS. It was previously excluded on ZFS due to LU-9054.

            However, I don't see how that change affects interop testing, since it is present on both branches, and the test failure appears more related to ZFS blocksize detection (4096 bytes vs. 8192 bytes), which I think is unrelated to the patch (it only changes how ZFS objects are identified). It also doesn't explain why these tests are not failing under normal testing (both the test_312 and zfs_get_blksz() functions are the same between branches).

            There was another change to how ZFS is detecting blocksize for fragmented file IO patterns in patch https://review.whamcloud.com/47768 "LU-15963 osd-zfs: use contiguous chunk to grow blocksize" (commit v2_15_63-134-gdacc4b6d38 only on master) that might affect the test results. Looking at the test results, there are intermittent failures with b2_15 servers and clients but never with master servers (regardless of client version). So it seems this is just an intermittently failing test on b2_15?

            I see that this subtest started failing multiple times a day since 2024-06-10, which is (coincidentally?) the same day that the LU-15963 patch landed on master. However, that patch didn't modify the test case at all, so it isn't clear why it would only affect b2_15 server testing but not normal master review testing.

            I'm tempted to just skip this subtest during interop from b2_15 clients and master servers (maybe via LU-18356), since I don't see how this can be a regression on master.
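            If we go the skip route, a rough sketch of what that could look like in
            sanity.sh test_312 is below. It assumes the standard test-framework
            helpers (version_code, skip) and the $CLIENT_VERSION / $OST1_VERSION
            version codes that sanity.sh already uses; the 2.15.99 cutoff is purely
            illustrative, not a concrete proposal.

            # Hedged sketch: skip only in the interop combination seen here
            # (older b2_15 client against a newer server). The version cutoff
            # is a placeholder; excluding the subtest for interop runs via a
            # test exclusion list would be the alternative.
            if (( $CLIENT_VERSION < $(version_code 2.15.99) )) &&
               (( $OST1_VERSION >= $(version_code 2.15.99) )); then
                    skip "LU-18311: blksz check unreliable in b2_15 interop"
            fi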

            yujian Jian Yu added a comment -

            Lustre 2.16.0 RC5 rolling-downgrade-client1-zfs test session:
            https://testing.whamcloud.com/test_sets/c937e793-aabc-4686-a2fa-112b4b4ad70f

            yujian Jian Yu added a comment -

            The failure occurred consistently in the following test sessions:
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-mds-zfs
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client1-zfs
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-mds-zfs
            lustre-master-el8.10-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client1-zfs
            lustre-master-el8.8-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el8.8-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el8.8-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-mds-zfs
            lustre-master-el8.8-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client1-zfs
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-mds-zfs
            lustre-master-el9.4-x86_64_lustre-b2_15-el8.8-x86_64-rolling-upgrade-oss-zfs

            yujian Jian Yu added a comment -

            The test started failing in the b2_15<->master rolling-downgrade-client1-zfs test session on 2024-01-21:
            https://testing.whamcloud.com/sub_tests/add4fef0-cf96-4335-8914-a1d24c9cd629


            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo