[LU-1436] Test failure on test suite sanity, subtest test_24w

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.3.0
    • Lustre 2.3.0
    • None
    • 3
    • 4518

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/78395658-a4b3-11e1-adce-52540035b04c.

      The sub-test test_24w failed with the following error:

      == sanity test 24w: Reading a file larger than 4Gb =================================================== 14:31:53 (1337635913)
      1+0 records in
      1+0 records out
      1048576 bytes (1.0 MB) copied, 0.0721251 seconds, 14.5 MB/s
      1+0 records in
      1+0 records out
      234852 bytes (235 kB) copied, 0.0710902 seconds, 3.3 MB/s
      0+0 records in
      0+0 records out
      0 bytes (0 B) copied, 0.00325134 seconds, 0.0 kB/s
      sanity test_24w: @@@@@@ FAIL: Error reading at the end of the file f24w

      Info required for matching: sanity 24w

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.3

            hongchao.zhang Hongchao Zhang added a comment -

            Status update: the patch has passed its tests and is still under inspection (currently one +).

            bzzz Alex Zhuravlev added a comment -

            Could someone please check that Lustre builds with --disable-liblustre?
            hongchao.zhang Hongchao Zhang added a comment -

            The patch is tracked at http://review.whamcloud.com/#change,3072

            adilger Andreas Dilger added a comment -

            Marking as a blocker for 2.1.2 as well.

            pjones Peter Jones added a comment -

            Hongchao

            Could you please look into this one?

            Thanks

            Peter

            adilger Andreas Dilger added a comment -

            Unfortunately, it was my proposal that caused this problem. I'm surprised that it caused so few issues during testing.

            The patch is fairly straightforward: change libcfs_log_return() to return long long. Even with sign extension, the original value should be OK after it is cast back down to the original type.

            That must mean the same problem also exists for 32-bit clients on master. Are we no longer doing any 32-bit testing on master? I thought we would drop i686 server and RHEL5 server build/test, but keep RHEL5/i686 client testing going to ensure we have at least some 32-bit test coverage.

            yujian Jian Yu added a comment -

            The problem here was introduced by LU-969 http://review.whamcloud.com/2668

            Yes. After it was committed on 11 May, the sanity tests 24w and 34{c,d,g} started failing on the i686 client. Here are the search results of the historical Maloo reports:

            test 24w: http://tinyurl.com/cf9uvj2
            test 34c: http://tinyurl.com/clzdw2k

            green Oleg Drokin added a comment -

            The problem here was introduced by LU-969 http://review.whamcloud.com/2668

            +long libcfs_log_return(struct libcfs_debug_msg_data *, long rc);

            This truncates all return values passed through RETURN() macro to long, which is only 32 bit on i686.
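
The truncation described here, and the long long fix proposed elsewhere in this ticket, can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the Lustre code: the helper names (log_return_narrow, log_return_wide, via_narrow, via_wide, demo_checks) and the BIG_OFFSET value are hypothetical, and int32_t stands in for the 32-bit long of i686 so that the effect is also visible on a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a pass-through helper whose parameter and
 * return type are a 32-bit long, as on i686. int32_t is used so the
 * narrowing also shows up on hosts where long is 64 bits. */
static int32_t log_return_narrow(int32_t rc)
{
    return rc;
}

/* Hypothetical stand-in for the widened helper: long long is at least
 * 64 bits on every conforming platform. */
static long long log_return_wide(long long rc)
{
    return rc;
}

/* Illustrative value just past 4 GiB (an assumption, not taken from the test). */
#define BIG_OFFSET ((((int64_t)1) << 32) + 42)

/* Route a 64-bit value through the narrow helper. The uint32_t cast keeps
 * only the low 32 bits (well defined, modulo 2^32); the int32_t step is
 * safe here only because the demo value's low word fits in int32_t. */
static int64_t via_narrow(int64_t value)
{
    return log_return_narrow((int32_t)(uint32_t)value);
}

/* Route a 64-bit value through the wide helper; nothing is lost. */
static int64_t via_wide(int64_t value)
{
    return (int64_t)log_return_wide(value);
}

/* Self-checks covering the three cases discussed in the ticket. */
static void demo_checks(void)
{
    /* High 32 bits are dropped by the narrow helper. */
    assert(via_narrow(BIG_OFFSET) == 42);
    /* The wide helper preserves the full 64-bit value. */
    assert(via_wide(BIG_OFFSET) == BIG_OFFSET);
    /* A negative int survives widening and casting back down. */
    assert((int)log_return_wide(-5) == -5);
}
```

Calling demo_checks() from a main() exercises all three cases: the narrow helper drops the high word of a value above 4 GiB, while the wide helper preserves both large offsets and small negative return codes.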

            adilger Andreas Dilger added a comment -

            I would say that this is a blocker for the 2.1.2 release, and is likely caused by the same problem as LU-1437.

            Can you check some of the earlier builds to see when it was introduced? Does it fail repeatedly, or is it intermittent?

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 7
