
ost-pools test_7a: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: client and server: lustre-master build # 3167 RHEL7.1
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/30ff2db8-5288-11e5-920d-5254006e85c2.

      The sub-test test_7a failed with the following error:

      test failed to respond and timed out
      

      The test hung, but no useful information could be found.

          Activity

            [LU-7103] ost-pools test_7a: test failed to respond and timed out

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8.0

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17350/
            Subject: LU-7103 test: avoid cat of /dev/urandom
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cddbef5fae44d09756639c4cd9de62f28f5b34cf

            gerrit Gerrit Updater added a comment -

            Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/17350
            Subject: LU-7103 test: avoid cat of /dev/urandom
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 72bf3691eb56e3b187b73ec1bb4a173feab81dd1

            adilger Andreas Dilger added a comment -

            I suspect that cat /dev/urandom is getting blocked for some reason in the VM. Please add a patch that removes it and instead uses something like echo $$$RANDOM$RANDOM, which should generate a string between 3 and 15 characters long that can be cut shorter if needed.
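
The suggestion above can be sketched as a small bash helper. This is only an illustration under stated assumptions, not the patch that landed: the function name random_digits is hypothetical, the output is digits only (the traced pipeline produced a-zA-Z0-9), and the loop that appends further $RANDOM values is an addition so that the longest requested length is always reachable.

```shell
#!/bin/bash
# Hypothetical helper: print a pseudo-random digit string of exactly $1
# characters using shell built-ins only, so nothing reads /dev/urandom.
random_digits() {
    local len=$1
    local str="$$$RANDOM$RANDOM"   # shell PID followed by two values in 0..32767

    # The seed string may be shorter than requested, so keep appending.
    while [ "${#str}" -lt "$len" ]; do
        str="$str$RANDOM"
    done

    # Cut from the tail: the head of the string is the PID, which is the
    # same on every call within one shell.
    printf '%s\n' "${str:$(( ${#str} - len ))}"
}

# Same lengths as the loop in the test trace.
for i in 1 9 15; do
    random_digits "$i"
done
```

Taking the tail rather than the head is a design choice for this sketch: with a length of 1, cutting from the front would always return the first digit of the PID.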
            bobijam Zhenyu Xu added a comment - - edited

            The debug patch hit two instances of the failure (https://testing.hpdd.intel.com/test_sessions/3eeb4454-6bb8-11e5-b85e-5254006e85c2), but only the client2 test log shows the following information:

            == ost-pools test 7a: create various pool name ======================================================= 22:18:26 (1444083506)
            '[' 8 -lt 2 ']'
            mkdir -p /mnt/lustre/d7a.ost-pools
            for i in 1 9 15
            cat /dev/urandom
            tr -dc a-zA-Z0-9
            fold -w 1
            head -n 1
            

            No other logs show any trace of the verbose shell echo, and I don't understand why.

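
Read as a single pipeline, the four traced commands in the log above would look roughly like the first function below. This is a reconstruction from the trace, not the actual test code, and both function names are hypothetical. The second function is a sketch of a bounded variant that reads a fixed number of bytes, assuming a head that supports -c (GNU and BSD do).

```shell
#!/bin/bash
# Reconstruction of the traced pipeline (an assumption based on the trace).
# cat never sees end-of-file on /dev/urandom, so it only stops once head
# exits and the closed pipe propagates back to it.
random_name_traced() {
    local len=$1
    cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w "$len" | head -n 1
}

# Bounded variant: read a fixed number of bytes so the pipeline ends on its
# own instead of relying on a broken pipe. 4096 random bytes yield far more
# than 15 alphanumeric characters in practice.
random_name_bounded() {
    local len=$1
    head -c 4096 /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c "$len"
    echo
}

random_name_bounded 9
```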
            bobijam Zhenyu Xu added a comment -

            Bob,

            I meant other TIMEOUT failures of ost-pools tests. Since test_7a was added in the LU-6234 patch, you would not find a test_7a failure before that time.

            bogl Bob Glossman (Inactive) added a comment -

            bobijam, you say in an earlier comment that there are failures from before the LU-6234 patch landed. I can't find any. Searching https://testing.hpdd.intel.com/sub_tests/query?commit=Update+results&page=2&sub_test%5Bquery_bugs%5D=&sub_test%5Bstatus%5D=TIMEOUT&sub_test%5Bsub_test_script_id%5D=e1c3310c-da20-11e4-b0f3-5254006e85c2&test_node%5Barchitecture_type_id%5D=&test_node%5Bdistribution_type_id%5D=&test_node%5Bfile_system_type_id%5D=&test_node%5Blustre_branch_id%5D=&test_node%5Bos_type_id%5D=&test_node_network%5Bnetwork_type_id%5D=&test_session%5Bquery_date%5D=&test_session%5Bquery_recent_period%5D=&test_session%5Btest_group%5D=&test_session%5Btest_host%5D=&test_session%5Buser_id%5D=&test_set%5Btest_set_script_id%5D=6bea3250-3db2-11e0-80c0-52540025f9af&utf8=✓ reports the earliest instance as 8/20. That is after the LU-6234 patch landed.

            bobijam Zhenyu Xu added a comment -

            I've pushed the verbose echo shell debug patch at http://review.whamcloud.com/16676


            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/16676
            Subject: LU-7103 debug: add verbose shell echo
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bb1f19082b2a227daa7eee3ba2840b84e479e3b0

            adilger Andreas Dilger added a comment -

            Bobijam, I'm not so sure I agree that this timeout wasn't added by test_7a itself. There are a bunch of test_11 timeouts from 2015-07-25 to 2015-08-05, but those were all related to a single patch (version 3 of http://review.whamcloud.com/15767), and the timeout was no longer hit with any of the later versions of that patch.

            adilger Andreas Dilger added a comment -

            Bob, can you please push a patch to add set -vx at the start of test_7a and set +vx at the end, and add Test-Parameters: testlist=ost-pools,ost-pools,... to run this test maybe 10 times (if you put them on two lines they would run in parallel). Then we might see what the test is doing when it is failing. The current logs are not helpful.
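
The tracing requested above might look like the following minimal sketch. The function name test_7a_sketch and its body are hypothetical stand-ins, not the real test_7a. Only the placement of set -vx and set +vx follows the comment.

```shell
#!/bin/bash
# Hypothetical stand-in for the test body, showing where the tracing
# switches would go.
test_7a_sketch() {
    set -vx    # -x traces each command as it runs; -v echoes input lines as they are read
    local name="pool_$RANDOM"
    echo "created $name"
    set +vx    # switch tracing off again before returning to the framework
}

# The trace goes to stderr, the function's normal output to stdout.
test_7a_sketch
```

With tracing on, a hang shows up as the last command printed in the log, which is how the later debug run narrowed the stall to the /dev/urandom pipeline.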

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 8
