[LU-7500] lnet-selftest: test failed to respond and timed out Created: 01/Dec/15 Updated: 26/Jun/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
EL7.1 Server/SLES11 SP3 Client |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a417b6d0-945f-11e5-a5ac-5254006e85c2.
The sub-test lnet-selftest failed with the following error: test failed to respond and timed out
lnet-selftest timed out and no other subtest ran. No useful information could be found because the log files were absent. |
| Comments |
| Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ] |
|
master, build# 3264, 2.7.64 tag |
| Comment by Peter Jones [ 16/Dec/15 ] |
|
Doug, could you please advise on this one? Thanks, Peter |
| Comment by Andreas Dilger [ 05/Jan/16 ] |
|
The one suite_stdout log from https://testing.hpdd.intel.com/test_sets/6c6a9940-9f0a-11e5-ba94-5254006e85c2 shows this is a hang at unmount:
16:37:57:CMD: shadow-9vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
16:37:59:CMD: shadow-9vm3 grep -c /mnt/ost5' ' /proc/mounts
16:37:59:Stopping /mnt/ost5 (opts:-f) on shadow-9vm3
16:37:59:CMD: shadow-9vm3 umount -d -f /mnt/ost5
16:37:59:CMD: shadow-9vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
16:37:59:CMD: shadow-9vm3 grep -c /mnt/ost6' ' /proc/mounts
16:37:59:Stopping /mnt/ost6 (opts:-f) on shadow-9vm3
16:37:59:CMD: shadow-9vm3 umount -d -f /mnt/ost6
16:38:00:CMD: shadow-9vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
16:38:00:CMD: shadow-9vm3 grep -c /mnt/ost7' ' /proc/mounts
16:38:00:Stopping /mnt/ost7 (opts:-f) on shadow-9vm3
16:38:00:CMD: shadow-9vm3 umount -d -f /mnt/ost7
17:37:42:********** Timeout by autotest system **********
Unfortunately, there are no console logs from the OST that might indicate what the problem is. |
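For triage of similar suite_stdout logs, the stall point can be located by looking for the largest jump between consecutive timestamps. The following is a minimal sketch, not part of the Lustre test framework: `largest_gap` is a hypothetical helper name, and it assumes every relevant line starts with an `HH:MM:SS:` prefix and that the log does not cross midnight.

```shell
#!/bin/bash
# Hypothetical helper (not from the Lustre test suite): read a suite_stdout
# log on stdin and report the largest gap between consecutive HH:MM:SS:
# timestamps, together with the line that was logged just before the gap.
# Assumption: all timestamps fall within a single day (no midnight rollover).
largest_gap() {
    awk '
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:/ {
        # Convert the HH:MM:SS prefix to seconds since midnight.
        t = substr($0, 1, 2) * 3600 + substr($0, 4, 2) * 60 + substr($0, 7, 2)
        if (seen && t - prev_t > gap) {
            gap = t - prev_t
            before = prev_line
        }
        prev_t = t
        prev_line = $0
        seen = 1
    }
    END {
        if (seen) printf "%d seconds after: %s\n", gap, before
    }'
}
```

Run as `largest_gap < suite_stdout.log` (the file name is illustrative). On the excerpt above, the largest gap is the 3582 seconds between the `umount -d -f /mnt/ost7` command at 16:38:00 and the autotest timeout at 17:37:42. The timestamps show only when each line was logged, so this narrows down where the run stalled but does not by itself identify the cause.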
| Comment by James Nunez (Inactive) [ 06/Jan/16 ] |
|
After speaking with Saurabh, the failure at https://testing.hpdd.intel.com/test_sets/6c6a9940-9f0a-11e5-ba94-5254006e85c2, for which Andreas posted a portion of the suite_stdout, is probably
The two remaining failures listed in this ticket so far are for tests between SLES clients and CentOS servers, and the only information we have about the failures is from the suite_stdout log:
19:44:42:-----============= acceptance-small: lnet-selftest ============----- Wed Dec 9 18:44:38 PST 2015
19:44:42:Running: bash /usr/lib64/lustre/tests/lnet-selftest.sh
19:44:42:CMD: shadow-14vm12,shadow-14vm7 /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
19:44:42:CMD: shadow-14vm5,shadow-14vm6 /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
19:44:42:Stopping clients: shadow-14vm5,shadow-14vm6 /mnt/lustre (opts:)
19:44:42:CMD: shadow-14vm5,shadow-14vm6 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
19:44:42:if [ \$running -ne 0 ] ; then
19:44:42:echo Stopping client \$(hostname) /mnt/lustre opts:;
19:44:42:lsof /mnt/lustre || need_kill=no;
19:44:42:if [ x != x -a x\$need_kill != xno ]; then
19:44:42: pids=\$(lsof -t /mnt/lustre | sort -u);
19:44:42: if [ -n \"\$pids\" ]; then
19:44:42: kill -9 \$pids;
19:44:42: fi
19:44:42:fi;
19:44:42:while umount /mnt/lustre 2>&1 | grep -q busy; do
19:44:42: echo /mnt/lustre is still busy, wait one second && sleep 1;
19:44:42:done;
19:44:42:fi
20:45:17:********** Timeout by autotest system ********** |
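The client unmount loop in the log above retries `umount` for as long as its output contains "busy", with no upper bound on the number of attempts. The log does not show whether this loop is where the run stalled. Purely as an illustration of how such a loop could be bounded, here is a sketch built around a hypothetical helper, `retry_while_busy`, which is not part of the Lustre test framework. The attempt limit and delay are assumed parameters.

```shell
#!/bin/bash
# Hypothetical helper (not from the Lustre test suite).
# Usage: retry_while_busy MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD while its combined output contains "busy", and gives up with a
# non-zero status after MAX_TRIES busy attempts instead of looping forever.
retry_while_busy() {
    local max_tries="$1"
    local delay="$2"
    local tries=0
    shift 2
    while "$@" 2>&1 | grep -q busy; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "still busy after $tries attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
    done
    return 0
}
```

For example, `retry_while_busy 60 1 umount /mnt/lustre` would stop after roughly a minute of "busy" responses. This only bounds the retry loop. If a single `umount` call blocks in the kernel, as the OST-side excerpt from Andreas suggests can happen, the helper would block with it.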
| Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ] |
|
Another instance found for interop tag 2.7.66 - EL6.7 Server/2.7.1 Client, build# 3316 |
| Comment by nasf (Inactive) [ 07/Jun/16 ] |
|
I hit similar trouble in conf-sanity |