[LU-7103] ost-pools test_7a: test failed to respond and timed out Created: 04/Sep/15 Updated: 02/Dec/15 Resolved: 02/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: lustre-master build # 3167 RHEL7.1 |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/30ff2db8-5288-11e5-920d-5254006e85c2.
The sub-test test_7a failed with the following error:
test failed to respond and timed out
The test hung, but I could not find any useful information in the logs. |
| Comments |
| Comment by Peter Jones [ 04/Sep/15 ] |
|
Bob, could you please try to identify which commit introduced this recent regression? Thanks, Peter |
| Comment by Andreas Dilger [ 04/Sep/15 ] |
|
This first started failing on 2015-08-20 13:07:43, so it is likely related to a patch that landed just before that time. |
| Comment by Bob Glossman (Inactive) [ 10/Sep/15 ] |
|
I'm quite suspicious of the fix for " commit 7c25eb1ba2b1db3009a0e88b3ecf229134f8ac92 |
| Comment by Peter Jones [ 10/Sep/15 ] |
|
Bobijam Do you think that this issue could be related to the Thanks Peter |
| Comment by Zhenyu Xu [ 11/Sep/15 ] |
|
There were ost-pools test TIMEOUT failures before |
| Comment by Andreas Dilger [ 29/Sep/15 ] |
|
This is now the top failing test in autotest. |
| Comment by Andreas Dilger [ 29/Sep/15 ] |
|
Bob, can you please push a patch to add set -vx at the start of test_7a and set +vx at the end, and add Test-Parameters: testlist=ost-pools,ost-pools,... to run this test maybe 10 times (if you put them on two lines they would run in parallel). Then we might see what the test is doing when it is failing. The current logs are not helpful. |
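The tracing requested above can be sketched roughly as follows. This is a minimal illustration, not the actual debug patch: test_7a_traced and its body are hypothetical stand-ins for the real test_7a in ost-pools.sh, and the scratch directory name is an assumption.

```shell
#!/bin/bash
# Hypothetical stand-in for test_7a; the real test body lives in ost-pools.sh.
test_7a_traced() {
	# -v echoes shell input lines as they are read,
	# -x echoes each command (prefixed by PS4, "+ " by default) as it runs.
	set -vx
	local dir="${TMPDIR:-/tmp}/d7a.demo.$$"   # assumed scratch path, for illustration only
	mkdir -p "$dir"
	rmdir "$dir"
	# Turn tracing off again so later tests are not flooded with output.
	set +vx
}

test_7a_traced
```

The trace is written to stderr, so it only helps if the test framework captures stderr in the per-client logs.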
| Comment by Andreas Dilger [ 29/Sep/15 ] |
|
Bobijam, I'm not so sure I agree that this timeout wasn't added by test_7a itself. There are a bunch of test_11 timeouts from 2015-07-25 to 2015-08-05, but those were all related to a single patch (version 3 of http://review.whamcloud.com/15767), and the failure was no longer hit by any of the later versions of that patch. |
| Comment by Gerrit Updater [ 30/Sep/15 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/16676 |
| Comment by Zhenyu Xu [ 30/Sep/15 ] |
|
I've pushed the debug patch that enables verbose shell echo at http://review.whamcloud.com/16676 |
| Comment by Bob Glossman (Inactive) [ 05/Oct/15 ] |
|
bobijam, you say in an earlier comment that there are failures from before the reports the earliest instance as 8/20. That is after the |
| Comment by Zhenyu Xu [ 08/Oct/15 ] |
|
Bob, I meant other TIMEOUT failures of ost-pools tests, since test_7a was added in |
| Comment by Zhenyu Xu [ 08/Oct/15 ] |
|
The debug patch hit two instances of the failure (https://testing.hpdd.intel.com/test_sessions/3eeb4454-6bb8-11e5-b85e-5254006e85c2), but only the client2 test log shows the following info:
== ost-pools test 7a: create various pool name ======================================================= 22:18:26 (1444083506)
'[' 8 -lt 2 ']'
mkdir -p /mnt/lustre/d7a.ost-pools
for i in 1 9 15
cat /dev/urandom
tr -dc a-zA-Z0-9
fold -w 1
head -n 1
No other logs show any trace of the shell command echo, and I don't understand why. |
| Comment by Andreas Dilger [ 24/Nov/15 ] |
|
I would suspect that cat /dev/urandom is getting blocked for some reason in the VM. Please push a patch that removes the use of this and instead uses something like echo $$$RANDOM$RANDOM, which should generate a string between 3 and 15 characters long that can be cut shorter if needed. |
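A minimal sketch of the suggestion above, assuming bash (for $RANDOM). The helper name rand_name is hypothetical, and this is not necessarily what the landed patch does. Note that the result contains digits only, unlike the a-zA-Z0-9 output of the /dev/urandom pipeline, and that it can come out shorter than the requested length when the PID and the two random values are short.

```shell
#!/bin/bash
# rand_name is a hypothetical helper, not taken from ost-pools.sh.
# It concatenates the shell PID ($$) with two $RANDOM values (0..32767 each)
# and trims the result to at most the requested number of characters.
rand_name() {
	local len=$1
	echo "$$$RANDOM$RANDOM" | cut -c "1-$len"
}

# Same lengths as the loop seen in the trace: 1, 9 and 15 characters.
for i in 1 9 15; do
	echo "requested $i: $(rand_name "$i")"
done
```

Because nothing here reads from a device, there is no I/O that could block in the VM; the trade-off is a smaller character set and a length that is only an upper bound.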
| Comment by Gerrit Updater [ 24/Nov/15 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/17350 |
| Comment by Gerrit Updater [ 02/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17350/ |
| Comment by Joseph Gmitter (Inactive) [ 02/Dec/15 ] |
|
Landed for 2.8.0 |