[LU-9827] sanityn fail “remove sub-test dirs failed d80b.sanityn/migrate_dir” Created: 03/Aug/17 Updated: 24/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.10.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | James Nunez (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The sanityn test suite fails, but no subtests fail. At the end of the suite_stdout log, we see that one of the directories that test_80b created cannot be deleted:

03:30:00:== sanityn test complete, duration 3688 sec ========================================================== 03:29:56 (1501730996)
03:30:11:rm: cannot remove '/mnt/lustre/d80b.sanityn/migrate_dir/link_file': Stale file handle
03:30:11: sanityn : @@@@@@ FAIL: remove sub-test dirs failed
03:30:11: Trace dump:
03:30:11: = /usr/lib64/lustre/tests/test-framework.sh:4980:error()
03:30:11: = /usr/lib64/lustre/tests/test-framework.sh:4499:check_and_cleanup_lustre()
03:30:11: = /usr/lib64/lustre/tests/sanityn.sh:4012:main()

Test 80b does not fail, but we see the following in the test_log:

16:02:42:== sanityn test 80b: Accessing directory during migration ============================================ 16:02:39 (1501516959)
16:02:42:start migration thread 11920
16:02:42:accessing the migrating directory for 5 minutes...
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:02:53:...10 seconds
16:02:53:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
…
16:03:41:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:41:...60 seconds
16:03:41:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:45:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:45:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:rm: cannot remove '/mnt/lustre2/d80b.sanityn/migrate_dir/link_file': Stale file handle
16:03:46:diff: /mnt/lustre2/d80b.sanityn/migrate_dir/file1: No such file or directory
16:03:46:access file1 fails
16:03:46:Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 3146: 11920 Killed ( while true; do
16:03:46: mdt_idx=$((RANDOM % MDSCOUNT)); $LFS migrate -m $mdt_idx $migrate_dir1 &>/dev/null || rc=$?; [ $rc -ne 0 -o $rc -ne 16 ] || break;
16:03:46:done )
16:03:46:CMD: trevis-3vm1.trevis.hpdd.intel.com,trevis-3vm2,trevis-3vm3,trevis-3vm4,trevis-3vm8 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
16:03:46:done.

We’ve seen this test suite failure with a Lustre setup with DNE: |
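A side note on the migration loop quoted in the test_log: as logged, its exit test `[ $rc -ne 0 -o $rc -ne 16 ]` is true for every exit code, because no value equals both 0 and 16. The `break` is therefore never reached and the loop runs until the framework kills it, which is consistent with the `Killed` message. The sketch below only demonstrates that shell behavior. `would_break` is a hypothetical helper, not part of sanityn.sh, and this is not offered as the cause of the stale file handle.

```shell
# Hypothetical helper (not part of sanityn.sh): reports whether the logged
# condition would let the loop reach `break` for a given exit code.
would_break() {
    rc=$1
    # Same test as in the logged loop: true whenever rc != 0 OR rc != 16,
    # which holds for every integer, so the `||` branch never runs.
    [ "$rc" -ne 0 -o "$rc" -ne 16 ] || { echo yes; return; }
    echo no
}

# Sample exit codes: success, a generic failure, and the 16 that the
# logged test singles out.
for rc in 0 1 16; do
    echo "rc=$rc break=$(would_break "$rc")"
done
```

All three sample codes report `break=no`. What the condition was meant to express is not stated in the log, so no corrected form is suggested here.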
| Comments |