ost-pools test_7a: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: client and server: lustre-master build # 3167 RHEL7.1
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/30ff2db8-5288-11e5-920d-5254006e85c2.

      The sub-test test_7a failed with the following error:

      test failed to respond and timed out
      

      The test hung, but I could not find any useful information.

          Activity

            [LU-7103] ost-pools test_7a: test failed to respond and timed out

            adilger Andreas Dilger added a comment -

            I would suspect that cat /dev/urandom is getting blocked for some reason in the VM. Please add a patch that removes the use of this and instead uses something like echo $$$RANDOM$RANDOM, which should generate a string between 3 and 15 characters long that can be cut shorter if needed.
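
A minimal sketch of the suggestion above, assuming bash, where `$$` expands to the shell's PID and `$RANDOM` to an integer from 0 to 32767. The function name `random_name` and its length argument are hypothetical and are not taken from the ost-pools script.

```shell
#!/bin/bash
# Hypothetical helper (not from ost-pools.sh): build a pseudo-random
# string without reading /dev/urandom, then cut it to the wanted length.
random_name() {
	local len=$1
	local str=""

	# $$ is the shell PID and each $RANDOM is 0..32767; keep appending
	# until there are at least $len characters to cut from.
	while [ "${#str}" -lt "$len" ]; do
		str="${str}$$${RANDOM}${RANDOM}"
	done

	# Keep only the first $len characters.
	echo "${str:0:$len}"
}

for i in 1 9 15; do
	random_name "$i"
done
```

Unlike the `tr -dc a-zA-Z0-9` pipeline, this produces digits only, so it would fit only if the test does not depend on letters appearing in the name.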
            bobijam Zhenyu Xu added a comment - edited

            The debug patch hit two instances of the failure (https://testing.hpdd.intel.com/test_sessions/3eeb4454-6bb8-11e5-b85e-5254006e85c2), but only the client2 test log shows the following info:

            == ost-pools test 7a: create various pool name ======================================================= 22:18:26 (1444083506)
            '[' 8 -lt 2 ']'
            mkdir -p /mnt/lustre/d7a.ost-pools
            for i in 1 9 15
            cat /dev/urandom
            tr -dc a-zA-Z0-9
            fold -w 1
            head -n 1

            No other logs show any trace of the shell command echo, and I don't understand why.
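
Read as a single pipeline, the trace above is consistent with something like the sketch below. This is a reconstruction from the xtrace output, not the actual ost-pools code: the function name `urandom_name`, the use of the loop value as the `fold` width (the trace shows `fold -w 1` for the first loop value), and the `LC_ALL=C` prefix are all assumptions.

```shell
#!/bin/bash
# Assumed reconstruction of the traced pipeline, NOT the real test code.
# Prints $1 random alphanumeric characters read from /dev/urandom.
urandom_name() {
	local width=$1

	# tr drops everything except a-zA-Z0-9, fold cuts the stream into
	# lines of $width characters, and head keeps the first such line.
	# LC_ALL=C is added here so tr treats the input as plain bytes.
	cat /dev/urandom | LC_ALL=C tr -dc a-zA-Z0-9 |
		fold -w "$width" | head -n 1
}

for i in 1 9 15; do
	name=$(urandom_name "$i")
	echo "$name"
done
```

If the test has this shape, the `echo` only runs once `head` has exited, so a stall in any stage of the pipeline would leave a trace that ends at `head -n 1` with no `echo`.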
            bobijam Zhenyu Xu added a comment -

            Bob,

            I meant other TIMEOUT failures of ost-pools tests. Since test_7a was added in the LU-6234 patch, you would not find a test_7a failure before that time.
            bogl Bob Glossman (Inactive) added a comment -

            bobijam, you say in an earlier comment that there are failures from before the LU-6234 patch landed. I can't find any. Searching https://testing.hpdd.intel.com/sub_tests/query?commit=Update+results&page=2&sub_test%5Bquery_bugs%5D=&sub_test%5Bstatus%5D=TIMEOUT&sub_test%5Bsub_test_script_id%5D=e1c3310c-da20-11e4-b0f3-5254006e85c2&test_node%5Barchitecture_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=&test_node%5Bos_type_id%5D=&test_node_network%5Bnetwork_type_id%5D=&test_session%5Bquery_date%5D=&test_session%5Bquery_recent_period%5D=&test_session%5Btest_group%5D=&test_session%5Btest_host%5D=&test_session%5Buser_id%5D=&test_set%5Btest_set_script_id%5D=6bea3250-3db2-11e0-80c0-52540025f9af&utf8=✓ reports the earliest instance as 8/20. That is after the LU-6234 patch landed.
            bobijam Zhenyu Xu added a comment -

            I've pushed the verbose echo shell debug patch at http://review.whamcloud.com/16676


            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/16676
            Subject: LU-7103 debug: add verbose shell echo
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bb1f19082b2a227daa7eee3ba2840b84e479e3b0

            adilger Andreas Dilger added a comment -

            Bobijam, I'm not so sure I agree that this timeout wasn't added by test_7a itself. There are a bunch of test_11 timeouts from 2015-07-25 to 2015-08-05, but those were all related to a single patch (version 3 of http://review.whamcloud.com/15767), and the timeout was no longer hit with any of the later versions of that patch.

            adilger Andreas Dilger added a comment -

            Bob, can you please push a patch to add set -vx at the start of test_7a and set +vx at the end, and add Test-Parameters: testlist=ost-pools,ost-pools,... to run this test maybe 10 times (if you put them on two lines they would run in parallel). Then we might see what the test is doing when it is failing. The current logs are not helpful.
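
The tracing requested above could look roughly like the following sketch. The function body is a placeholder rather than the real test_7a, and the name `test_7a_sketch` is hypothetical; only the placement of `set -vx` and `set +vx` reflects the comment.

```shell
#!/bin/bash
# Placeholder test body, NOT the real ost-pools test_7a.
test_7a_sketch() {
	# -x prints each command to stderr after expansion, just before
	# it runs. -v echoes shell input lines as they are read, so inside
	# an already-parsed function most of the output comes from -x.
	set -vx

	local i
	for i in 1 9 15; do
		echo "would create a pool name of length $i"
	done

	# Turn tracing off again so later tests are not traced.
	set +vx
}

test_7a_sketch
```

With tracing on, the last command printed before a hang shows where the test stopped, which is what the current logs do not reveal. The repeat count is requested separately through the Test-Parameters line quoted in the comment.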

            adilger Andreas Dilger added a comment -

            This is now the top failing test in autotest.
            bobijam Zhenyu Xu added a comment -

            No. From https://testing.hpdd.intel.com/test_sets/query?utf8=%E2%9C%93&test_set%5Btest_set_script_id%5D=6bea3250-3db2-11e0-80c0-52540025f9af&test_set%5Bstatus%5D=TIMEOUT&test_set%5Bquery_bugs%5D=&test_session%5Btest_host%5D=&test_session%5Btest_group%5D=&test_session%5Buser_id%5D=&test_session%5Bquery_date%5D=&test_session%5Bquery_recent_period%5D=&test_node%5Bos_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Barchitecture_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=&test_node_network%5Bnetwork_type_id%5D=&commit=Update+results there are ost-pools test TIMEOUT failures from before the LU-6234 patch, and there is also a test_7a pass (https://testing.hpdd.intel.com/test_sets/02cddbde-4e9f-11e5-b0b8-5254006e85c2). I tend to think that there is a hidden issue other than LU-6234 behind these TIMEOUT failures.
            pjones Peter Jones added a comment -

            Bobijam

            Do you think that this issue could be related to the LU-6234 patch?

            Thanks

            Peter


            People

              bogl Bob Glossman (Inactive)
              maloo Maloo