[LU-5810] sanity: rm: cannot remove `/mnt/lustre/d0.tar-shadow-23vm5/etc/init.d/rc3.d': Directory not empty
Created: 27/Oct/14 Updated: 11/Sep/20 Resolved: 11/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 16296 | ||||||||
| Description |
|
This issue was created by maloo for John Hammond <john.hammond@intel.com>
-----============= acceptance-small: sanity ============----- Sat Oct 25 17:04:33 UTC 2014
Running: bash /usr/lib64/lustre/tests/sanity.sh
== sanity test complete, duration -o sec == 17:04:34 (1414256674)
CMD: shadow-23vm10.shadow.whamcloud.com,shadow-23vm9 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_config_client /mnt/lustre
shadow-23vm9: Checking config lustre mounted on /mnt/lustre
shadow-23vm10: Checking config lustre mounted on /mnt/lustre
Checking servers environments
CMD: shadow-23vm11 running=\$(grep -c /mnt/ost1' ' /proc/mounts);
mpts=\$(mount | grep -c /mnt/ost1' ');
if [ \$running -ne \$mpts ]; then
echo \$(hostname) env are INSANE!;
exit 1;
fi
...
CMD: shadow-23vm12 lctl get_param -n timeout
Using TIMEOUT=20
CMD: shadow-23vm12 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: shadow-23vm10.shadow.whamcloud.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
disable quota as required
CMD: shadow-23vm11,shadow-23vm12,shadow-23vm8,shadow-23vm9 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4
CMD: shadow-23vm11,shadow-23vm12,shadow-23vm8 /usr/sbin/lctl set_param osd-ldiskfs.track_declares_assert=1 || true
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
osd-ldiskfs.track_declares_assert=1
rm: cannot remove `/mnt/lustre/d0.tar-shadow-23vm5/etc/init.d/rc3.d': Directory not empty
status script Total(sec) E(xcluded) S(low)
------------------------------------------------------------------------------------
test-framework exiting on error
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/6ad49578-5c8e-11e4-b08a-5254006e85c2. |
| Comments |
| Comment by Andreas Dilger [ 28/Oct/14 ] |
|
It is strange that there is a shadow-23vm5 directory that is not empty, yet according to the config for the test session, the nodes listed are shadow-23vm[8-12]. This implies that the shadow-23vm5 node mounted the wrong filesystem for some reason and proceeded to write there. |
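The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the leftover directory name embeds the writing host after the first "-" (as in d0.tar-shadow-23vm5) and that the session's nodes are given as a bracketed range such as shadow-23vm[8-12]. The helper names expand_nodes and dir_owner_in_session are hypothetical and are not part of the Lustre test framework.

```shell
#!/bin/bash
# Hypothetical helper: expand a node range such as "shadow-23vm[8-12]"
# into one hostname per line. A spec without a range is echoed unchanged.
expand_nodes() {
    local spec=$1
    local re='^(.*)\[([0-9]+)-([0-9]+)\]$'
    if [[ $spec =~ $re ]]; then
        local prefix=${BASH_REMATCH[1]}
        local first=${BASH_REMATCH[2]}
        local last=${BASH_REMATCH[3]}
        local i
        for ((i = first; i <= last; i++)); do
            echo "${prefix}${i}"
        done
    else
        echo "$spec"
    fi
}

# Hypothetical helper: succeed only if the host name embedded in a
# directory name like "d0.tar-shadow-23vm5" is one of the session nodes.
# Assumes the host name is everything after the first "-".
dir_owner_in_session() {
    local dir=$1 spec=$2
    local host=${dir#*-}
    expand_nodes "$spec" | grep -qxF -- "$host"
}
```

For the directory in this ticket, `dir_owner_in_session d0.tar-shadow-23vm5 'shadow-23vm[8-12]'` returns non-zero, which matches the observation that shadow-23vm5 is not among the session's nodes.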
| Comment by Andreas Dilger [ 28/Oct/14 ] |
|
A similar configuration problem appeared in They have been marked duplicates of TEI-1993. |
| Comment by Minh Diep [ 28/Oct/14 ] |
|
After researching, I doubt this is a problem where we cross mount. |
| Comment by Andreas Dilger [ 28/Oct/14 ] |
|
Can you please check if shadow-23vm5 is reserved for some user job, or if it is some stuck or forgotten process that is still running there? That also happened with |
| Comment by Minh Diep [ 29/Oct/14 ] |
|
shadow-23vm5 has always been in autotest. Around the time of the failure, shadow-23vm5 wasn't running recover-mds-scale. |
| Comment by James Nunez (Inactive) [ 19/Dec/14 ] |
|
I've experienced a similar problem on the OpenSFS cluster; the test framework can't remove a directory from a previous test, not another node/VM. If you think this is a different problem, I can open a new ticket for this.
Results are at https://testing.hpdd.intel.com/test_sessions/f13ba544-8618-11e4-ac52-5254006e85c2
replay-dual had several tests fail including 22a and 22c. When replay-vbr starts up, no tests run due to the remove at the top of the script. This remove fails with
rm: cannot remove `/lustre/scratch/d22a.replay-dual': Directory not empty
rm: cannot remove `/lustre/scratch/d22c.replay-dual': Directory not empty
status script Total(sec) E(xcluded) S(low)
------------------------------------------------------------------------------------
test-framework exiting on error
replay-vbr is marked as FAIL with 0/0 subtests passed.
Then insanity starts running and runs 16 tests. The test suite is marked as FAIL with no subtest actually failing, but the remove during the test cleanup must have triggered the failure. In the test logs, we see:
== insanity test complete, duration 2157 sec == 14:50:03 (1418770203)
rm: cannot remove `/lustre/scratch/d22a.replay-dual': Directory not empty
rm: cannot remove `/lustre/scratch/d22c.replay-dual': Directory not empty
insanity : @@@@@@ FAIL: remove sub-test dirs failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4665:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4696:error()
= /usr/lib64/lustre/tests/test-framework.sh:4210:check_and_cleanup_lustre()
= /usr/lib64/lustre/tests/insanity.sh:781:main()
Dumping lctl log to /tmp/test_logs/2014-12-15/220919/insanity..*.1418770204.log |
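When the cleanup reports "Directory not empty", it can help to see exactly which entries are still present. The following is a minimal diagnostic sketch, assuming sub-test directories sit directly under the mount point and have names starting with "d" followed by a digit (as d22a.replay-dual does). The function name list_leftovers is hypothetical and is not part of test-framework.sh.

```shell
#!/bin/bash
# Hypothetical diagnostic: print every entry still present below the
# sub-test directories (names like d22a.replay-dual) under a mount point.
# Assumes sub-test directories match the glob d[0-9]*.
list_leftovers() {
    local mnt=$1
    local d
    for d in "$mnt"/d[0-9]*; do
        # Skip non-directories and the unexpanded glob when nothing matches.
        [ -d "$d" ] || continue
        find "$d" -mindepth 1 -print
    done
}
```

Running `list_leftovers /lustre/scratch` just before the remove would show whether the leftovers are ordinary files or entries that a still-running process keeps recreating.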
| Comment by Andreas Dilger [ 19/May/16 ] |
|
Debug patch for this:
LU-5810 tests: add client hostname to lctl mark
Improve debug messages to include the originating hostname.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Change-Id: I441bf8294c38135276a5a0f0853dbebf4358c563 |
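The idea of tagging a debug marker with its originating host can be illustrated as follows. This is only a sketch of the general approach, not the content of the patch. The function name mark_message is hypothetical, and the exact message format is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: build a debug-log marker prefixed with the host
# that issued it, so markers from different clients can be told apart.
# The "host: text" layout is an assumption, not the patch's format.
mark_message() {
    local host=$1
    shift
    echo "${host}: $*"
}

# On a Lustre node the result could then be written to the debug log, e.g.:
#   lctl mark "$(mark_message "$(hostname)" "== sanity test complete")"
```

With such a prefix, a marker written by an unexpected node (such as shadow-23vm5 in this ticket) would be identifiable directly in the server-side logs.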
| Comment by Gerrit Updater [ 27/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13113/ |
| Comment by Gerrit Updater [ 21/Jun/16 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/20894 |
| Comment by Gerrit Updater [ 22/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20894/ |