
[LU-1966] Test failure on test suite replay-ost-single, subtest test_6: Destroys weren't done in 5 sec

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.3.0
    • Severity: 3
    • Rank: 3989

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0d43f964-fff5-11e1-9f3c-52540035b04c.

      The sub-test test_6 failed with the following error:

      == replay-ost-single test 6: Fail OST before obd_destroy == 19:17:36 (1347761856)
      Waiting for orphan cleanup...
      CMD: client-28vm4 /usr/sbin/lctl get_param -n obdfilter.*.mds_sync
      Waiting for destroy to be done...
      Waiting 0 secs for destroys to be done.
      Waiting 1 secs for destroys to be done.
      Waiting 2 secs for destroys to be done.
      Waiting 3 secs for destroys to be done.
      Waiting 4 secs for destroys to be done.
      Destroys weren't done in 5 sec.
       replay-ost-single test_6: @@@@@@ FAIL: test_6 failed with 5
      

      Info required for matching: replay-ost-single 6
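
      The messages above come from the test framework polling for outstanding object destroys and giving up after a fixed timeout. A minimal sketch of that kind of wait loop, assuming the client-side osc.*.destroys_in_flight parameter is what is being polled (an approximation, not the exact test-framework.sh code):

      # Sketch of a destroy-wait loop (assumed parameter name; illustrative only).
      wait_destroys() {
          local max=${1:-5}
          local elapsed=0
          echo "Waiting for destroy to be done..."
          while [ $elapsed -lt $max ]; do
              # Sum the outstanding OST_DESTROY RPCs across all OSC devices.
              local inflight=$(lctl get_param -n "osc.*.destroys_in_flight" 2>/dev/null |
                               awk '{sum += $1} END {print sum + 0}')
              [ "$inflight" -eq 0 ] && return 0
              echo "Waiting $elapsed secs for destroys to be done."
              sleep 1
              elapsed=$((elapsed + 1))
          done
          echo "Destroys weren't done in $max sec."
          return 1
      }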


          Activity

            yujian Jian Yu added a comment -

            Lustre Tag: v2_3_0_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
            Distro/Arch: RHEL6.3/x86_64
            Test Group: failover

            The same issue occurred: https://maloo.whamcloud.com/test_sets/c1790860-1509-11e2-9adb-52540035b04c

            pjones Peter Jones added a comment -

            I think that there is enough evidence to suggest that this is purely a testing issue. We should still continue to work on resolving it and include a fix in an RC2 if one is ready and we need one, but it would not warrant holding the release on its own merits.

            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/23
            FAILURE_MODE=HARD

            autotest run failed: https://maloo.whamcloud.com/test_sets/b5c9fc00-0694-11e2-9b17-52540035b04c
            manual run passed: https://maloo.whamcloud.com/test_sets/52192992-0716-11e2-ac99-52540035b04c
            bobijam Zhenyu Xu added a comment -

            I checked the patches landed between 09/07 and 09/13; I don't think any OSS change during that period could cause this issue, so I tend to think it's a test issue.
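
            As a hedged illustration (the branch name is taken from the build URL above; the paths are assumptions, not from the original comment), that window of landed patches could be listed with:

            # Commits landed between 2012-09-07 and 2012-09-13 touching OSS-side code.
            git log --oneline --since=2012-09-07 --until=2012-09-13 origin/b2_3 -- \
                lustre/ost lustre/obdfilter lustre/ptlrpc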

            yujian Jian Yu added a comment -

            Here is a link to the historical Maloo reports for replay-ost-single test 6 in failover test group:
            http://tinyurl.com/8s62nfj

            As we can see, the last successful run of this test was on 2012-09-07, and the failure has been occurring since 2012-09-13.

            bobijam Zhenyu Xu added a comment -

            http://review.whamcloud.com/4067 just for issue reproduction.

            yujian Jian Yu added a comment -

            Hi Bobi,
            If you want to reproduce or debug this issue, you can upload a patch to Gerrit with the following test parameters:

            Test-Parameters: fortestonly envdefinitions=SLOW=yes \
            clientcount=4 osscount=2 mdscount=2 austeroptions=-R \
            failover=true useiscsi=true testlist=replay-ost-single
            
            bobijam Zhenyu Xu added a comment -

            Status update:

            Booked one toro node (only one was available), ran "bash /usr/lib64/lustre/tests/auster -rsv replay-ost-single", and it finished without error.

            bobijam Zhenyu Xu added a comment -

            Looks like interference from residue of previous tests.

            OSS debug log

            line:9295 00000100:00100000:0.0:1347761823.542774:0:2427:0:(service.c:1786:ptlrpc_server_handle_req_in()) got req x1413227639926489
            line:9296 00000100:00080000:0.0:1347761823.542777:0:2427:0:(service.c:1000:ptlrpc_update_export_timer()) updating export 3c3577cd-0751-66ec-7591-46dd3a204f77 at 1347761823 exp ffff88007c77a800
            line:9297 00000100:00100000:0.0:1347761823.542788:0:2427:0:(service.c:1961:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_io00_009:3c3577cd-0751-66ec-7591-46dd3a204f77+996:3262:x1413227639926489:12345-10.10.4.172@tcp:6
            ...
            line:10263 00000001:02000400:0.0:1347761856.433208:0:4282:0:(debug.c:445:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl mark == replay-ost-single test 6: Fail OST before obd_destroy == 19:17:36 (1347761856)

            ... pid 2427 leaves no footprint in the OSS log thereafter; even after test 6 and test 7 fail, the OSS log does not show that it processed the OST_DESTROY request.
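
            For reference, a hedged sketch of how that check can be made against a dumped debug log (the file name is an assumption; the pid is the 6th colon-separated field in Lustre debug log lines):

            lctl dk > /tmp/oss-debug.log   # dump the kernel debug buffer (assumed path)

            # All lines logged by the ll_ost_io thread with pid 2427; in the failed
            # run nothing shows up after it picked up request x1413227639926489.
            awk -F: '$6 == 2427' /tmp/oss-debug.log

            # "Handling RPC ... opc" lines end with the opcode; opcode 6 is OST_DESTROY.
            grep 'ptlrpc_server_handle_request' /tmp/oss-debug.log | grep ':6$'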

            pjones Peter Jones added a comment -

            Bobijam

            Could you please look into this one?

            Peter


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 5
