Details
Bug
Resolution: Unresolved
Major
None
Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.10.6
None
3
9223372036854775807
Description
The sanityn test suite fails even though no subtests fail. The end of the suite_stdout log shows that a file under one of the directories created by test_80b cannot be removed:
03:30:00:== sanityn test complete, duration 3688 sec ========================================================== 03:29:56 (1501730996)
03:30:11:rm: cannot remove '/mnt/lustre/d80b.sanityn/migrate_dir/link_file': Stale file handle
03:30:11: sanityn : @@@@@@ FAIL: remove sub-test dirs failed
03:30:11: Trace dump:
03:30:11: = /usr/lib64/lustre/tests/test-framework.sh:4980:error()
03:30:11: = /usr/lib64/lustre/tests/test-framework.sh:4499:check_and_cleanup_lustre()
03:30:11: = /usr/lib64/lustre/tests/sanityn.sh:4012:main()
Test 80b itself does not fail, but its test_log shows the following:
16:02:42:== sanityn test 80b: Accessing directory during migration ============================================ 16:02:39 (1501516959)
16:02:42:start migration thread 11920
16:02:42:accessing the migrating directory for 5 minutes...
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:...10 seconds
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
…
16:03:41:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:41:...60 seconds
16:03:41:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:45:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:45:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:diff: /mnt/lustre2/d80b.sanityn/migrate_dir/file1: No such file or directory
16:03:46:access file1 fails
16:03:46:Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 3146: 11920 Killed ( while true; do
16:03:46: mdt_idx=$((RANDOM % MDSCOUNT)); $LFS migrate -m $mdt_idx $migrate_dir1 &>/dev/null || rc=$?; [ $rc -ne 0 -o $rc -ne 16 ] || break;
16:03:46:done )
16:03:46:CMD: trevis-3vm1.trevis.hpdd.intel.com,trevis-3vm2,trevis-3vm3,trevis-3vm4,trevis-3vm8 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
16:03:46:done.
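A side note on the loop echoed in the "Killed" message of the test_log: its exit test, [ $rc -ne 0 -o $rc -ne 16 ], is true for every value of rc, because rc cannot equal both 0 and 16 at once. So break is never reached, which is consistent with the migration thread still running when the framework kills it. The sketch below reproduces only that test in isolation. loop_action is a hypothetical helper written for illustration, not part of sanityn.sh, and this observation alone does not explain the stale file handle.

```shell
#!/bin/sh
# Hypothetical helper: evaluates the same exit test that appears in the
# logged migration loop and reports what the loop would do next.
loop_action() {
    rc=$1
    if [ "$rc" -ne 0 -o "$rc" -ne 16 ]; then
        echo continue    # test is true, so "|| break" is skipped
    else
        echo break       # only reachable if rc were both 0 and 16
    fi
}

# Try a success code, a generic error, and 16 (the value the test names).
for code in 0 1 16; do
    echo "rc=$code -> $(loop_action "$code")"
done
# prints:
# rc=0 -> continue
# rc=1 -> continue
# rc=16 -> continue
```

If the intent were to stop on any error other than 16, the test would need -a rather than -o, but whether that is the intended behaviour of test_80b is an assumption, not something the log confirms.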
We have seen this test suite failure on Lustre setups with DNE:
https://testing.hpdd.intel.com/test_sets/e98fd2a4-7826-11e7-9a7b-5254006e85c2
https://testing.hpdd.intel.com/test_sets/e5ffaca8-763d-11e7-bbe0-5254006e85c2
https://testing.hpdd.intel.com/test_sets/42bd9ada-7289-11e7-bb95-5254006e85c2
Issue Links
- is related to
  - LU-10553 d23b.replay-dual: Directory not empty, FAIL: remove sub-test dirs failed (Open)
  - LU-10789 lustre-rsync-test: @@@@@@ FAIL: remove sub-test dirs failed (Open)
  - LU-10690 sanity-hsm: remove sub-test dirs failed (Resolved)
  - LU-9927 sanityn fails on clean up with 'FAIL: remove sub-test dirs failed' (Closed)