[LU-480] Test failure on test suite sanityn: LBUG/LASSERT detected Created: 04/Jul/11 Updated: 07/Jul/11 Resolved: 07/Jul/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 6590 |
| Description |
|
This issue was created by maloo for yujian <yujian@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/8771913e-a451-11e0-b786-52540025f9af. The sub-test test_36 failed with the following error:
|
| Comments |
| Comment by Jian Yu [ 04/Jul/11 ] |
|
The issue occurred on many sanityn sub-tests: |
| Comment by Zhenyu Xu [ 04/Jul/11 ] |
|
Chris, what is the full pdsh command used in autotest? |
| Comment by Chris Gearing (Inactive) [ 04/Jul/11 ] |
|
You'll have to give me a bit more info to enable me to understand that. What other output could help you? |
| Comment by Zhenyu Xu [ 04/Jul/11 ] |
|
Those errors were due to the pdsh command timing out, and IMO the pdsh command is set when we call auster.sh. Yu Jian points out that there is an autotest_config.sh or similar script, which she saw in Provisioning-1.Provision_1.autotest.client-21vm1.log: I suggest we add the "-t 120" option for pdsh in it, like "pdsh -t 120 -w xxxxxxxxxxxxxxxx". By default pdsh uses a 10-second timeout; -t 120 extends it to 120 seconds. |
| Comment by Jian Yu [ 04/Jul/11 ] |
|
Per this provisioning log: https://maloo.whamcloud.com/test_logs/bcb00110-a450-11e0-b786-52540025f9af/show_text
12:00:25:(/home/autotest/lab/tools/run_test.sh:55):
12:00:25:cat /home/autotest/lab/tools/mecturk.sh
12:00:25:((/home/autotest/lab/tools/run_test.sh:57):
12:00:25:hostname -s
12:00:25:(/home/autotest/lab/tools/run_test.sh:57):
12:00:25:CPYLINE='scp client-21vm1:/root/autotest_config.sh /usr/lib64/lustre/tests/cfg/'
12:00:27:(/home/autotest/lab/tools/run_test.sh:58):
12:00:27:pdsh -w client-21vm1,client-21vm2 scp client-21vm1:/root/autotest_config.sh /usr/lib64/lustre/tests/cfg/
The PDSH variable was set in /home/autotest/lab/tools/mecturk.sh:
$ grep PDSH /home/autotest/lab/tools/mecturk.sh
PDSH="pdsh -S -Rrsh -w"
Chris, could you please change it to "pdsh -t 120 -S -Rrsh -w" to see whether the upcoming autotest results hit those timeout issues or not? |
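For reference, the requested change could be sketched as below. This is only an illustration, not the actual contents of mecturk.sh: the helper name add_pdsh_timeout is hypothetical, and the sketch assumes the PDSH value starts with the word "pdsh". The -t option is the one pdsh documents as its connect timeout in seconds.

```shell
#!/bin/sh
# Hypothetical helper (not part of mecturk.sh): insert "-t <seconds>" right
# after the leading "pdsh" unless the command string already carries a -t option.
add_pdsh_timeout() {
    cmd=$1
    timeout=${2:-120}
    case " $cmd " in
        *" -t "*)
            # A timeout is already present; leave the command unchanged.
            printf '%s\n' "$cmd"
            ;;
        *)
            # Assumes $cmd begins with "pdsh"; strip that word and re-add it
            # together with the timeout option.
            printf 'pdsh -t %s%s\n' "$timeout" "${cmd#pdsh}"
            ;;
    esac
}

# Value reported by grep in the comment above.
PDSH="pdsh -S -Rrsh -w"
PDSH=$(add_pdsh_timeout "$PDSH" 120)
echo "$PDSH"
```

Editing the assignment by hand to PDSH="pdsh -t 120 -S -Rrsh -w" has the same effect; the helper only guards against adding -t twice if the script is sourced more than once.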
| Comment by Chris Gearing (Inactive) [ 04/Jul/11 ] |
|
OK, I've changed the pdsh setting in mecturk; let's see if it stops failing. |
| Comment by Peter Jones [ 07/Jul/11 ] |
|
Closing for now. Reopen if this reoccurs with the change in place.