[LU-2620] Failure on test suite replay-ost-single test_6: test_6 failed with 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0, Lustre 2.1.5
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.4, Lustre 1.8.9
    • Severity: 3
    • Rank (Obsolete): 6130

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/478c299a-5ef8-11e2-b507-52540035b04c.

      The sub-test test_6 failed with the following error:

      test_6 failed with 1

      == replay-ost-single test 6: Fail OST before obd_destroy == 17:27:58 (1358126878)
      Waiting for orphan cleanup...
      CMD: client-32vm3 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
      Waiting for local destroys to complete
      1280+0 records in
      1280+0 records out
      5242880 bytes (5.2 MB) copied, 0.970226 s, 5.4 MB/s
      /mnt/lustre/d0.replay-ost-single/f.replay-ost-single.6
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_layout_gen:     0
      lmm_stripe_offset:  0
      	obdidx		 objid		objid		 group
      	     0	           193	         0xc1	             0
      
      CMD: client-32vm3 lctl set_param fail_loc=0x80000119
      fail_loc=0x80000119
      before: 12650184 after_dd: 13693644
       replay-ost-single test_6: @@@@@@ FAIL: test_6 failed with 1 
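      For context, the "before: ... after_dd: ..." line above comes from a free-space comparison made around the dd write. The sketch below shows that kind of check in outline; it is not the actual replay-ost-single.sh test body, and the kbytesfree parameter path, helper name, and comparison are assumptions. Note that in this run after_dd is larger than before, i.e. free space appears to have grown rather than shrunk while the file was written (for instance because destroys left over from an earlier test completed in the same window), which is the sort of condition such a check reports as a failure.

          #!/bin/bash
          # Hypothetical sketch only; not the real replay-ost-single.sh test_6 body.
          MOUNT=${MOUNT:-/mnt/lustre}
          TESTFILE=$MOUNT/d0.replay-ost-single/f.replay-ost-single.6

          # Sum the free kilobytes reported by every OSC device on the client.
          # The parameter path is an assumption.
          kbytesfree() {
              lctl get_param -n "osc.*.kbytesfree" 2>/dev/null |
                  awk '{ sum += $1 } END { print sum }'
          }

          mkdir -p "$(dirname "$TESTFILE")"
          before=$(kbytesfree)

          # Write ~5 MB, matching the dd output in the log (1280 records of 4 KB).
          dd if=/dev/urandom of="$TESTFILE" bs=4096 count=1280 || exit 1
          sync

          after_dd=$(kbytesfree)
          echo "before: $before after_dd: $after_dd"

          # Writing data is expected to reduce free space. If destroys pending
          # from an earlier test are reclaimed in the same window, free space can
          # grow instead and the check fails, as in the log above.
          if [ "$after_dd" -ge "$before" ]; then
              echo "FAIL: free space did not decrease after dd"
              exit 1
          fi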
      

          Activity

            yujian Jian Yu added a comment -

            Lustre client: http://build.whamcloud.com/job/lustre-b1_8/258/ (1.8.9-wc1)
            Lustre server: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)

            replay-ost-single test 6 hit the same failure:
            https://maloo.whamcloud.com/test_sets/6c0c8652-15c3-11e3-87cb-52540035b04c
            bogl Bob Glossman (Inactive) added a comment - edited

            I think this failure is expected. The patch of http://review.whamcloud.com/5042 was cherry-picked to b2_1, which fixed the problem for 2.1/2.4 interop. This was never done for b1_8 as far as I can see, so the problem was never fixed for 1.8.9/2.4 interop.

            sarah Sarah Liu added a comment -

            Hit this bug in interop testing between a 1.8.9 client and a 2.4 server; the server build is #1338, which should include the fix for LU-2903.

            https://maloo.whamcloud.com/test_sets/b683542a-948d-11e2-93c6-52540035b04c

            pjones Peter Jones added a comment -

            Closing again, as the new issue is being tracked under LU-2903.

            yujian Jian Yu added a comment -

            Hello Oleg,

            Could you please cherry-pick the patch of http://review.whamcloud.com/5042 to the Lustre b2_1 branch, since the failure occurs in 2.1.4<->2.4.0 interop testing? Thanks.

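            For reference, pulling a Gerrit change onto a maintenance branch generally follows the pattern sketched below. The project path, local branch name, and patchset number here are illustrative assumptions, not details taken from this ticket; Gerrit change refs end in a patchset number, so the latest patchset of change 5042 would be used in practice.

                # Hypothetical sketch: cherry-pick Gerrit change 5042 onto b2_1.
                # Project path, local branch name, and patchset number are assumptions.
                git checkout -b b2_1-lu2620-backport origin/b2_1
                git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/42/5042/1
                git cherry-pick FETCH_HEAD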

            bogl Bob Glossman (Inactive) added a comment -

            Andreas, looks like we are on the same page in suspecting a different cause.

            adilger Andreas Dilger added a comment -

            Let's use LU-2903 to track this new failure, since it does appear that this one was only hit when test_5 was being run.

            bogl Bob Glossman (Inactive) added a comment -

            Also, the original failure was seen regardless of fstype. The new ones appear to be only on zfs, if I'm not mistaken.

            adilger Andreas Dilger added a comment -

            Looks like all of the new failures are on review-zfs test runs, so they may be due to a different cause?

            bogl Bob Glossman (Inactive) added a comment -

            I think the failures being seen now look different. The originally reported bug showed only test_6 failing, and only after running test_5 due to SLOW=yes. The new failures show test_6 and all following tests failing, possibly pointing to an entirely new underlying cause.

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 10
