Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.5.3
    • Environment: CentOS
    • Severity: 3
    • Rank (Obsolete): 17260

    Description

      When testing lustre_rsync in combination with racer, some strange behavior appears.
      The following is the test case I used:

      # Test 13 - lustre_rsync, use racer test suite
      test_13() {
          init_src
          init_changelog

          local rrc=0
          local rc=0
          local clients=$CLIENTS
          local RDIRS
          local i

          # 1. init racer directories
          for d in ${RACERDIRS}; do
              is_mounted $d || continue

              RDIRS="$RDIRS $d/racer"
              mkdir -p $d/racer

              # lfs setstripe $d/racer -c -1
              if [ $MDSCOUNT -ge 2 ]; then
                  for i in $(seq $((MDSCOUNT - 1))); do
                      RDIRS="$RDIRS $d/racer$i"
                      if [ ! -e $d/racer$i ]; then
                          $LFS mkdir -i $i $d/racer$i ||
                              error "lfs mkdir $i failed"
                      fi
                  done
              fi
          done

          # 2. racer start
          local rpids=""
          for rdir in $RDIRS; do
              do_nodes $clients "DURATION=$DURATION MDSCOUNT=$MDSCOUNT \
                  $racer $rdir $NUM_RACER_THREADS" &
              pid=$!
              rpids="$rpids $pid"
          done

          for pid in $rpids; do
              wait $pid
              rc=$?
              echo "pid=$pid rc=$rc"
              if [ $rc != 0 ]; then
                  rrc=$((rrc + 1))
              fi
          done

          # 8. Replicate the changes to $TGT and $TGT2
          $LRSYNC -s $DIR -t $TGT -t $TGT2 -m $MDT0 -u $CL_USER -l $LREPL_LOG \
              -D $LRSYNC_LOG $EXTRA_FLAGS

          # 9. check difference
          check_diff $DIR $TGT
          check_diff $DIR $TGT2
          echo "check difference on target dir"
          sleep 120
          fini_changelog
          cleanup_src_tgt
          return 0
      }
      run_test 13 "lustre_rsync, use racer test suite"
      This causes lustre_rsync to run in an endless loop that never exits.
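Because the hang leaves the test blocked indefinitely at the replication step, it may help to bound that step so the hang is reported as a failure instead. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the test node; `run_with_limit` is a hypothetical helper, not part of the Lustre test framework, and the 600-second limit in the usage line is an arbitrary example value.

```shell
#!/bin/bash
# run_with_limit LIMIT CMD [ARGS...]
# Hypothetical helper (not part of the Lustre test framework): runs CMD with
# a wall-clock limit of LIMIT seconds using GNU coreutils `timeout`.
# `timeout` exits with status 124 when the limit is reached; otherwise the
# command's own exit status is passed through unchanged.
run_with_limit() {
    local limit=$1
    shift

    timeout "$limit" "$@"
    local rc=$?    # $? is expanded before `local` runs, so this is timeout's status

    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

In test_13, the replication step could then be written as

    run_with_limit 600 $LRSYNC -s $DIR -t $TGT -t $TGT2 -m $MDT0 -u $CL_USER \
        -l $LREPL_LOG -D $LRSYNC_LOG $EXTRA_FLAGS ||
        error "lustre_rsync failed or did not finish within 600s"

so that an endless loop shows up as a test failure rather than a stuck run.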

      Attachments

        Activity

          [LU-6167] endless loop in lustre_rsync
          pjones Peter Jones added a comment -

          Landed for 2.7

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13545/
          Subject: LU-6167 utils: fix bugs in lustre_sync
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: dcf2a82d148797b4ac204a65ec795cde141e1d3b
          pjones Peter Jones added a comment -

          Yang Sheng

          Could you please take care of this patch?

          Thanks

          Peter

          gnlwlb wu libin (Inactive) added a comment -

          I think this patch can solve this problem, but I am not really sure of the root cause.
          http://review.whamcloud.com/#/c/13545/

          gerrit Gerrit Updater added a comment -

          Wu Libin (gnlwlb@gmail.com) uploaded a new patch: http://review.whamcloud.com/13545
          Subject: LU-6167 utils: fix bugs in lustre_sync
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: bb7871a24da289df3b0755508c3f307b21fdbdb0

          gnlwlb wu libin (Inactive) added a comment -

          lustre_rsync also dumps core; the backtrace is:
          bt
          #0 0x0000003566232925 in raise () from /lib64/libc.so.6
          #1 0x0000003566234105 in abort () from /lib64/libc.so.6
          #2 0x0000003566270837 in __libc_message () from /lib64/libc.so.6
          #3 0x0000003566276166 in malloc_printerr () from /lib64/libc.so.6
          #4 0x0000003566278f81 in _int_free () from /lib64/libc.so.6
          #5 0x0000000000404375 in lr_cascade_move (fid=0x250a630 "[0x200000401:0x37a:0x0]", dest=0x2511d20 "/home/target/racer/11/3/11", info=0x24e0340) at lustre_rsync.c:682
          #6 0x000000000040435a in lr_cascade_move (fid=0x250d960 "[0x200000400:0x366:0x0]", dest=0x2512e30 "/home/target/racer/11/3", info=0x24e0340) at lustre_rsync.c:677
          #7 0x000000000040435a in lr_cascade_move (fid=0x24e0454 "[0x200000400:0x37a:0x0]", dest=0x24e1755 "/home/target/racer/11", info=0x24e0340) at lustre_rsync.c:677
          #8 0x0000000000405369 in lr_move (info=0x24e0340) at lustre_rsync.c:964
          #9 0x0000000000406eb8 in lr_replicate () at lustre_rsync.c:1552
          #10 0x000000000040751b in main (argc=18, argv=<value optimized out>) at lustre_rsync.c:1776

          The attached file is the test script I used.

          gnlwlb wu libin (Inactive) added a comment -

          The script I used to test.

          People

            ys Yang Sheng
            gnlwlb wu libin (Inactive)
            Votes:
            0
            Watchers:
            4

            Dates

              Created:
              Updated:
              Resolved: