sanity test_56a: @@@@@@ FAIL: /usr/bin/lfs getstripe --obd wrong: found 6, expected 3

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.5.0
    • None
    • 3
    • 10111

    Description

      This problem is similar to LU-3846 and LU-3858. The test suite should wait a few seconds after it clears the default stripe of the directory. Otherwise, newly created entries under the directory will have a stripe count of 2 rather than 1.

          Activity

            [LU-3886] sanity test_56a: @@@@@@ FAIL: /usr/bin/lfs getstripe --obd wrong: found 6, expected 3

            lixi Li Xi (Inactive) added a comment -

            Oh, sorry, please run it on the Lustre mount point '/mnt/lustre' rather than on its directory '/mnt/lustre/dir', i.e. sh run.sh /mnt/lustre/

            I got the following output:

            No error after 640 iters
            -1 != 1
            Does not become correct after 0 seconds
            Does not become correct after 1 seconds
            Does not become correct after 2 seconds
            Does not become correct after 3 seconds
            Does not become correct after 4 seconds
            Does not become correct after 5 seconds
            Does not become correct after 6 seconds
            Become correct after 7 seconds

            emoly.liu Emoly Liu added a comment -

            I just ran the script at "https://jira.hpdd.intel.com/secure/attachment/13414/run.sh" on my local VM. It printed "No errors after xxx iters" 10000 times.

            My steps were:
            1. mount lustre
            2. mkdir /mnt/lustre/d
            3. sh run.sh /mnt/lustre/d

            I tried several times, and no error occurred.


            lixi Li Xi (Inactive) added a comment -

            Interestingly, when I add a sleep to the test suite, the problem goes away. That makes me believe the problem is similar to LU-3858. Emoly, would you please check the attachment of LU-3858 first? I.e. https://jira.hpdd.intel.com/secure/attachment/13414/run.sh. It shows that the effect of the default stripe is delayed. Thanks!

            test_56a() { # was test_56
                rm -rf $DIR/$tdir
                $SETSTRIPE -d $DIR
                test_mkdir -p $DIR/$tdir/dir
                NUMFILES=3
                NUMFILESx2=$(($NUMFILES * 2))
                sleep 10 # This will fix the problem.
                for i in `seq 1 $NUMFILES` ; do
                    touch $DIR/$tdir/file$i
                    touch $DIR/$tdir/dir/file$i
                done
                ......

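            A fixed `sleep 10` hides the delay but costs ten seconds on every run and may still be too short on a slow system. A possible alternative is to poll until the expected value shows up. The sketch below is only an illustration, not part of the Lustre test framework: `wait_until_equal` is a hypothetical helper name, and what to probe (for instance the stripe count that `lfs getstripe -c` reports for a freshly created file) and which value to expect are assumptions.

```shell
#!/bin/sh
# wait_until_equal EXPECTED TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until its
# stdout equals EXPECTED. Returns 0 on a match, 1 once TIMEOUT_SECONDS
# have elapsed without one.
wait_until_equal() {
    local expected=$1 timeout=$2
    shift 2
    local elapsed=0 actual
    while :; do
        # Errors from the probe are ignored; only its stdout is compared.
        actual=$("$@" 2>/dev/null)
        if [ "$actual" = "$expected" ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "still '$actual' (expected '$expected') after ${elapsed}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Assumed usage in place of the fixed sleep (needs a mounted Lustre client;
# the probe file name is made up for this example):
#   $SETSTRIPE -d $DIR
#   touch $DIR/$tdir/probe
#   wait_until_equal 1 30 lfs getstripe -c $DIR/$tdir/probe
```

            With this approach the test proceeds as soon as the probe matches, and a timeout produces an explicit message instead of a confusing count mismatch later in the test.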
            lixi Li Xi (Inactive) added a comment - - edited

            I've posted the script (run-sanity.sh) that hits this problem (and LU-3858, LU-3846). It hits the problem every time it runs.

            emoly.liu Emoly Liu added a comment -

            "lfs setstripe -d" should be enough to clear directory striping information. LiXi, could you tell me how you hit this problem?

            IMO, we can add "lfs getstripe -v" after "setstripe -d" to print that striping information, and see if this problem will happen again.

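            The suggestion to print the striping information right after `setstripe -d` could be wired in with a small logging wrapper, so the diagnostic output is easy to find in the test log. This is a minimal sketch under assumptions: `log_cmd` and the `after-clear` label are hypothetical names, not existing test-framework functions.

```shell
#!/bin/sh
# log_cmd LABEL COMMAND [ARGS...]
# Hypothetical helper: prints the command line, then every line of its
# combined stdout/stderr prefixed with [LABEL]. Returns COMMAND's exit status.
log_cmd() {
    local label=$1
    shift
    local out rc line
    printf '[%s] $ %s\n' "$label" "$*"
    out=$("$@" 2>&1)
    rc=$?
    printf '%s\n' "$out" | while IFS= read -r line; do
        printf '[%s] %s\n' "$label" "$line"
    done
    return "$rc"
}

# Assumed usage inside the test (needs a mounted Lustre client):
#   $SETSTRIPE -d $DIR
#   log_cmd after-clear lfs getstripe -v $DIR
```

            Tagging each line with a label makes it possible to grep the log for the striping state at the moment the default was cleared.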
            pjones Peter Jones added a comment -

            Emoly

            Could you please comment on this one?

            Thanks

            Peter


            People

              emoly.liu Emoly Liu
              lixi Li Xi (Inactive)