
Test directory access during migration for DNE2

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • Labels: None

    Description

      Access the directory during migration

      1. Set up Lustre with 4 MDTs, 4 OSTs, and 2 clients.

      2. Create 1 directory and some files under it:

           
          mkdir /mnt/lustre/migrate_dir
          for F in {1,2,3,4,5}; do
              echo "$F$F$F$F$F" > /mnt/lustre/migrate_dir/file$F
          done
      
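      Step 2 can be sanity-checked outside Lustre; a minimal sketch, using an ordinary temporary directory as a stand-in for /mnt/lustre (an assumption for illustration only; the mount point itself is not needed to exercise the file-creation loop):

      ```shell
      #!/bin/bash
      # Stand-in mount point; the real test uses /mnt/lustre.
      MOUNT=$(mktemp -d)

      mkdir "$MOUNT/migrate_dir"
      for F in {1..5}; do
          echo "$F$F$F$F$F" > "$MOUNT/migrate_dir/file$F"
      done

      # Each file holds its index repeated five times, e.g. file3 contains 33333.
      ls "$MOUNT/migrate_dir"
      ```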

      3. On one client, migrate the directory among the 4 MDTs:

           
      MDTCOUNT=4
      while true; do
          mdt_idx=$((RANDOM % MDTCOUNT))
          lfs migrate -m $mdt_idx /mnt/lustre/migrate_dir || break
      done
      echo "migrate directory failed"
      return 1
      
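      The committed test (see the log in the comments) additionally tolerates EBUSY (16) from the migration command, since a concurrent access from the other client can leave the directory temporarily busy. A minimal sketch of that retry logic, with a stub `lfs` function standing in for the real client tool so the control flow can be exercised without a Lustre mount (the stub and its failure schedule are assumptions for illustration):

      ```shell
      #!/bin/bash
      MDTCOUNT=4
      migrate_dir=/mnt/lustre/migrate_dir

      # Stub for the real `lfs migrate -m`: returns EBUSY twice, then a hard
      # error, so the retry loop below terminates deterministically.
      attempts=0
      lfs() {
          attempts=$((attempts + 1))
          [ "$attempts" -le 2 ] && return 16   # EBUSY: directory busy, retry
          return 1                             # hard failure: give up
      }

      rc=0
      while true; do
          mdt_idx=$((RANDOM % MDTCOUNT))
          lfs migrate -m "$mdt_idx" "$migrate_dir"
          rc=$?
          # Keep going on success or EBUSY; stop on any other error.
          [ "$rc" -eq 0 ] || [ "$rc" -eq 16 ] || break
      done
      echo "migration loop stopped: rc=$rc after $attempts attempts"
      ```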

      4. Simultaneously, on another client, access the files under the migrating directory:

           
      while true; do
          stat /mnt/lustre/migrate_dir/file1 > /dev/null || break
          cat /mnt/lustre/migrate_dir/file2 > /dev/null || break
          : > /mnt/lustre/migrate_dir/file3 || break
          echo "aaaaa" > /mnt/lustre/migrate_dir/file4 || break
          stat /mnt/lustre/migrate_dir/file5 > /dev/null || break
      done
      echo "access migrating files failed"
      return 1
      

      Steps 3 and 4 should keep running for at least 5 minutes without returning an error.
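      The 5-minute requirement can be driven with bash's SECONDS counter, which the shell increments once per second. A sketch of such a driver, with the duration shortened and an ordinary temporary directory standing in for the Lustre mount (both assumptions for illustration; the real test runs the two loops on separate clients for 300 seconds):

      ```shell
      #!/bin/bash
      DURATION=2          # 300 in the real test; shortened for the sketch
      dir=$(mktemp -d)    # stand-in for /mnt/lustre/migrate_dir
      for F in {1..5}; do echo "$F$F$F$F$F" > "$dir/file$F"; done

      SECONDS=0           # bash updates this automatically
      failed=0
      while [ "$SECONDS" -lt "$DURATION" ]; do
          stat "$dir/file1" > /dev/null  || { failed=1; break; }
          cat  "$dir/file2" > /dev/null  || { failed=1; break; }
          : >  "$dir/file3"              || { failed=1; break; }   # truncate
          echo "aaaaa" > "$dir/file4"    || { failed=1; break; }
          stat "$dir/file5" > /dev/null  || { failed=1; break; }
      done
      echo "access loop ran for ${SECONDS}s, failed=$failed"
      ```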

      Attachments

        Issue Links

          Activity

            [LU-6764] Test directory access during migration for DNE2

            See patch of the test:
            http://review.whamcloud.com/#/c/14497/8/lustre/tests/sanityn.sh test_80b
            See test_80b in:
            https://testing.hpdd.intel.com/test_logs/7b3125e8-17f6-11e5-89cc-5254006e85c2/show_text

            == sanityn test 80b: Accessing directory during migration == 04:32:18
            (1434861138)
            start migration thread 2958
            accessing the migrating directory for 5 minutes...
            ...10 seconds
            ...20 seconds
            ...30 seconds
            ...40 seconds
            ...50 seconds
            ...60 seconds
            ...70 seconds
            ...80 seconds
            ...90 seconds
            ...100 seconds
            ...110 seconds
            ...120 seconds
            ...130 seconds
            ...140 seconds
            ...150 seconds
            ...160 seconds
            ...170 seconds
            ...180 seconds
            ...190 seconds
            ...200 seconds
            ...210 seconds
            ...220 seconds
            ...230 seconds
            ...240 seconds
            ...250 seconds
            ...260 seconds
            ...270 seconds
            ...280 seconds
            ...290 seconds
            ...300 seconds
            Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 2969:  2958 Killed
                ( while true; do
                mdt_idx=$((RANDOM % MDSCOUNT)); $LFS mv -M $mdt_idx $migrate_dir1 2&>/dev/null || rc=$?; [ $rc -ne 0 -o $rc -ne 16 ] || break;
                done )
            CMD: shadow-17vm10.shadow.whamcloud.com,shadow-17vm11,shadow-17vm12,shadow-17vm8,shadow-17vm9 lctl set_param -n fail_loc=0          fail_val=0 2>/dev/null || true
            done.
            CMD: shadow-17vm10.shadow.whamcloud.com,shadow-17vm11,shadow-17vm12,shadow-17vm8,shadow-17vm9 rc=0;
            val=\$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
            if [[ \$? -eq 0 && \$val -ne 0 ]]; then
                    echo \$(hostname -s): \$val;
                    rc=\$val;
            fi;
            exit \$rc
            PASS 80b (300s)
            
            rhenwood Richard Henwood (Inactive) added a comment

            People

              Assignee: di.wang Di Wang
              Reporter: rhenwood Richard Henwood (Inactive)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: