[LU-6764] Test directory access during migration for DNE2 Created: 24/Jun/15  Updated: 22/Dec/15  Resolved: 24/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Task Priority: Blocker
Reporter: Richard Henwood (Inactive) Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-6858 Demonstrate DNE2 functionality Open
Rank (Obsolete): 9223372036854775807

 Description   

Access the directory during migration

Set up Lustre with 4 MDTs, 4 OSTs, and 2 clients.
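
For a test-framework deployment, something like the following should work (a minimal sketch, assuming the standard lustre/tests scripts; MDSCOUNT and OSTCOUNT are the usual test-framework variables, and the install path may differ):

    # Format and mount a 4-MDT/4-OST test filesystem via the test framework
    cd /usr/lib64/lustre/tests
    MDSCOUNT=4 OSTCOUNT=4 sh llmount.sh

Both clients mount the filesystem at /mnt/lustre.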

Create a directory and some files under it:

    mkdir /mnt/lustre/migrate_dir
    for F in {1..5}; do
        echo "$F$F$F$F$F" > /mnt/lustre/migrate_dir/file$F
    done
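
As a sanity check before starting the migration, lfs getdirstripe reports which MDT currently holds the directory (output format varies by Lustre release):

    # Show the MDT index / stripe layout of the directory to be migrated
    lfs getdirstripe /mnt/lustre/migrate_dir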

On one client, repeatedly migrate the directory among the 4 MDTs:

    MDTCOUNT=4
    while true; do
        mdt_idx=$((RANDOM % MDTCOUNT))
        lfs migrate -m $mdt_idx /mnt/lustre/migrate_dir || break
    done

    echo "migrate directory failed"
    return 1
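
Note that the landed test (see the log in the comments below) uses $LFS mv -M and tolerates rc=16 (-EBUSY), since a migration attempt can race with the accessing client. A sketch of that variant, assuming lfs propagates -EBUSY as exit status 16, as the test's rc=16 check suggests:

    # Treat -EBUSY (16) as a transient race with the accessing client;
    # any other non-zero status aborts the loop as a real failure.
    while true; do
        mdt_idx=$((RANDOM % MDTCOUNT))
        lfs migrate -m $mdt_idx /mnt/lustre/migrate_dir
        rc=$?
        [ $rc -eq 0 -o $rc -eq 16 ] || break
    done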

Simultaneously, on another client, access the files under the migrating directory:

    while true; do
        stat /mnt/lustre/migrate_dir/file1 > /dev/null || break
        cat /mnt/lustre/migrate_dir/file2 > /dev/null || break
        : > /mnt/lustre/migrate_dir/file3 || break
        echo "aaaaa" > /mnt/lustre/migrate_dir/file4 || break
        stat /mnt/lustre/migrate_dir/file5 > /dev/null || break
    done

    echo "access migrating files failed"
    return 1

The migration and access loops (steps 3 and 4 above) should keep running for at least 5 minutes without returning an error.
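
A driver matching the test log below backgrounds the migration loop, runs the access loop for 300 seconds, then kills the migration thread (a sketch; migrate_loop and access_files are hypothetical wrappers around the two loops above):

    migrate_loop &                 # hypothetical wrapper for the migration loop
    migrate_pid=$!
    echo "start migration thread $migrate_pid"

    end=$((SECONDS + 300))         # run the access loop for 5 minutes
    while [ $SECONDS -lt $end ]; do
        access_files || exit 1     # hypothetical wrapper for one pass of step 4
    done

    kill -9 $migrate_pid           # the test framework kills the thread the same way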



 Comments   
Comment by Richard Henwood (Inactive) [ 24/Jun/15 ]

See the patch for the test:
http://review.whamcloud.com/#/c/14497/8/lustre/tests/sanityn.sh test_80b
See test_80b in the test log:
https://testing.hpdd.intel.com/test_logs/7b3125e8-17f6-11e5-89cc-5254006e85c2/show_text

== sanityn test 80b: Accessing directory during migration == 04:32:18 (1434861138)
start migration thread 2958
accessing the migrating directory for 5 minutes...
...10 seconds
...20 seconds
...30 seconds
...40 seconds
...50 seconds
...60 seconds
...70 seconds
...80 seconds
...90 seconds
...100 seconds
...110 seconds
...120 seconds
...130 seconds
...140 seconds
...150 seconds
...160 seconds
...170 seconds
...180 seconds
...190 seconds
...200 seconds
...210 seconds
...220 seconds
...230 seconds
...240 seconds
...250 seconds
...260 seconds
...270 seconds
...280 seconds
...290 seconds
...300 seconds
Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 2969:  2958 Killed                  ( while true; do
    mdt_idx=$((RANDOM % MDSCOUNT)); $LFS mv -M $mdt_idx $migrate_dir1 2&>/dev/null || rc=$?; [ $rc -ne 0 -o $rc -ne 16 ] || break;
done )
CMD: shadow-17vm10.shadow.whamcloud.com,shadow-17vm11,shadow-17vm12,shadow-17vm8,shadow-17vm9 lctl set_param -n fail_loc=0          fail_val=0 2>/dev/null || true
done.
CMD: shadow-17vm10.shadow.whamcloud.com,shadow-17vm11,shadow-17vm12,shadow-17vm8,shadow-17vm9 rc=0;
val=\$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
if [[ \$? -eq 0 && \$val -ne 0 ]]; then
        echo \$(hostname -s): \$val;
        rc=\$val;
fi;
exit \$rc
PASS 80b (300s)