[LU-15977] sanityn test_80b: migration stopped 2 Created: 28/Jun/22 Updated: 19/Dec/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Li Xi <pkuelelixi@gmail.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/06a33583-a064-42b7-8610-d0ee29e60d1e

test_80b failed with the following error:

    == sanityn test 80b: Accessing directory during migration ========================================================== 07:06:08 (1656313568)
    start migration thread 777660
    accessing the migrating directory for 5 minutes...
    touch file failed with 0
    /usr/lib64/lustre/tests/sanityn.sh: line 4770: kill: (777660) - No such process
    sanityn test_80b: @@@@@@ FAIL: migration stopped 2
    Trace dump:
    = /usr/lib64/lustre/tests/test-framework.sh:6522:error()
    = /usr/lib64/lustre/tests/sanityn.sh:4770:test_80b()
    = /usr/lib64/lustre/tests/test-framework.sh:6857:run_one()
    = /usr/lib64/lustre/tests/test-framework.sh:6904:run_one_logged()
    = /usr/lib64/lustre/tests/test-framework.sh:6745:run_test()
    = /usr/lib64/lustre/tests/sanityn.sh:4773:main()
    Dumping lctl log to /autotest/autotest-2/2022-06-27/lustre-reviews_review-dne-part-5_88105_1_7_01a12c8c-47bf-4fb1-9cfa-e97b369d8874//sanityn.test_80b.*.1656313569.log
    CMD: trevis-103vm1.trevis.whamcloud.com,trevis-103vm2,trevis-103vm3,trevis-103vm4,trevis-103vm5 /usr/sbin/lctl dk > /autotest/autotest-2/2022-06-27/lustre-reviews_review-dne-part-5_88105_1_7_01a12c8c-47bf-4fb1-9cfa-e97b369d8874//sanityn.test_80b.debug_log.\$(hostname -s).1656313569.log; dmesg > /autotest/autotest-2/2022-06-27/lustre-reviews_review-dne-part-5_88105_1_7_01a12c8c-47bf-4fb1-9cfa-e97b369d8874//sanityn.test_80b.dmesg.\$(hostname -s).1656313569.log
    /usr/lib64/lustre/tests/sanityn.sh: line 4656: kill: (777660) - No such process

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Alex Zhuravlev [ 21/Apr/23 ] |
|
The test itself is not quite correct AFAICS: it is supposed to run for 5 minutes, but all runs in Maloo stop after 3-4 seconds while still reporting success. |
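One generic way a timed shell loop can end within seconds and still pass is a `break` on failure that is never turned into an error. The following is only a minimal sketch of that pattern, not the sanityn.sh code; `try_access` and `access_loop` are hypothetical names, and the failure on the third call is simulated.

```shell
#!/bin/bash
# Hypothetical stand-in for one access to the directory under test;
# it fails on the third call to simulate an early failure.
calls=0
try_access() {
	calls=$((calls + 1))
	[ "$calls" -lt 3 ]
}

# Loop meant to run for $1 seconds; on failure it only breaks.
access_loop() {
	local duration=$1
	local start=$SECONDS

	while [ $((SECONDS - start)) -lt "$duration" ]; do
		try_access || {
			echo "access failed"
			break	# leaves the loop, but nothing records the failure
		}
	done
	return 0	# the caller sees success either way
}

access_loop 300
echo "rc=$? after $calls calls"
```

In this sketch the loop was asked to run for 300 seconds, yet it returns immediately with status 0 once the simulated access fails.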
| Comment by Etienne Aujames [ 01/Dec/23 ] |
|
The test is not working on b2_15:

    #migrate the directories among MDTs
    (
        while true; do
            mdt_idx=$((RANDOM % MDSCOUNT))
            $LFS migrate -m $mdt_idx $migrate_dir1 &>/dev/null || rc=$?
            (( $rc != 0 && $rc != 16 )) || break
        done
    ) &
    migrate_pid=$!

    echo "start migration thread $migrate_pid"
    #Access the files at the same time
    start_time=$SECONDS
    echo "accessing the migrating directory for 5 minutes..."

The migration process always exits on the first iteration, although the test is supposed to run for 5 minutes. It should be:

    #migrate the directories among MDTs
    (
        while true; do
            mdt_idx=$((RANDOM % MDSCOUNT))
            $LFS migrate -m $mdt_idx $migrate_dir1 &>/dev/null || rc=$?
            (( $rc == 0 || $rc == 16 )) || break
        done
    ) &

Also, the test always returns with success:

    echo "aaaaa" > $migrate_dir2/file4 > /dev/null || {
        echo "access file4 fails"
        break
    }

This should be:

    echo "aaaaa" > $migrate_dir2/file4 > /dev/null ||
        error "access file4 fails"

The test completely changed on master with the: https://review.whamcloud.com/40891 (" |
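The difference between the two loop conditions quoted in this comment can be checked in isolation. This is a minimal sketch, not part of sanityn.sh: `old_continues` and `new_continues` are hypothetical helper names wrapping the two arithmetic tests, and the return codes 0, 16 and 2 are illustrative inputs only.

```shell
#!/bin/bash
# Condition from the b2_15 loop; the loop keeps going only while it is true.
old_continues() {
	local rc=$1
	(( rc != 0 && rc != 16 ))
}

# Condition from the change proposed in the comment.
new_continues() {
	local rc=$1
	(( rc == 0 || rc == 16 ))
}

# Evaluate both conditions for a few illustrative return codes.
for rc in 0 16 2; do
	old_continues "$rc" && old=loops || old=exits
	new_continues "$rc" && new=loops || new=exits
	echo "rc=$rc old=$old new=$new"
done
```

With the b2_15 condition, a return code of 0 or 16 ends the loop, which is consistent with the migration thread exiting on its first iteration. With the proposed condition the loop keeps going for 0 and 16 and stops on any other value.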