
[LU-3886] sanity test_56a: @@@@@@ FAIL: /usr/bin/lfs getstripe --obd wrong: found 6, expected 3

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • Rank: 10111

    Description

      This problem is similar to LU-3846 and LU-3858. The test suite should wait for a few seconds after it clears the stripe of the directory. Otherwise, the newly created entries under the directory will have a stripe count of 2 rather than 1.
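
      A minimal sketch of the suggested wait, assuming a bash helper in the style of sanity.sh (wait_for_default_stripe is hypothetical, not an existing test-framework function): instead of a fixed sleep, the test could poll the directory's own default stripe count with lfs getstripe -c -d until it reaches the expected value, right after changing the default and before creating the entries it is going to check.

          # Hypothetical helper, not part of sanity.sh: poll the directory's
          # default stripe count until it matches the expected value, for up
          # to max_wait seconds, instead of sleeping a fixed time.
          wait_for_default_stripe() {
                  local dir=$1 expected=$2 max_wait=${3:-10}
                  local sec count
                  for ((sec = 0; sec < max_wait; sec++)); do
                          count=$(lfs getstripe -c -d "$dir")
                          [ "$count" = "$expected" ] && return 0
                          sleep 1
                  done
                  echo "default stripe count of $dir is $count, expected $expected" >&2
                  return 1
          }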

          Activity

            adilger Andreas Dilger added a comment -

            Haven't seen this in a long time.

            jamesanunez James Nunez (Inactive) added a comment -

            I've hit this problem with lustre-master tag 2.6.92. Results at https://testing.hpdd.intel.com/test_sets/37e63f92-9f0d-11e4-91b3-5254006e85c2

            lixi Li Xi (Inactive) added a comment -

            Yeah, I hit the 'found 6, expected 3' problem every time I run sanity.sh.
            emoly.liu Emoly Liu added a comment -

            The problem you found with run.sh is probably related to the following code.

            When we set the stripe for the root (mount point), set_default is enabled in ll_dir_ioctl():

                    case LL_IOC_LOV_SETSTRIPE: {
            ...
                            int set_default = 0;
            ...
                            if (inode->i_sb->s_root == file->f_dentry)
                                    set_default = 1;
            
                            /* in v1 and v3 cases lumv1 points to data */
                            rc = ll_dir_setstripe(inode, lumv1, set_default);
            

            Then, in ll_dir_setstripe(), if set_default is 1, ll_send_mgc_param() is called to set the information asynchronously.

                    if (set_default && mgc->u.cli.cl_mgc_mgsexp) {
                            /* Set root stripesize */
                            /* Set root stripecount */
                            /* Set root stripeoffset */
                    }
            

            Since run.sh runs setstripe very frequently and many times, the config log queue might grow very long (a bottleneck), and the MGS will take more time to process it.
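
            To make the race concrete, here is a minimal illustration (the paths and the stale value of 2 are examples, assuming the default stripe change really is applied asynchronously through the MGS as described above):

                    # Setting the default stripe on the mount point goes through
                    # the MGS config log and is applied asynchronously, so a file
                    # created immediately afterwards may still inherit the old
                    # default:
                    lfs setstripe -c 1 /mnt/lustre      # request queued to the MGS
                    touch /mnt/lustre/f0                # may still see the old default
                    lfs getstripe -c /mnt/lustre/f0     # can briefly print 2 instead of 1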

            BTW, can you hit this problem if you don't use run.sh and just run sanity.sh normally?

            emoly.liu Emoly Liu added a comment -

            Yes, this time I hit that. I will investigate it.


            lixi Li Xi (Inactive) added a comment -

            Oh, sorry, please run on the Lustre mount point '/mnt/lustre' rather than its subdirectory '/mnt/lustre/dir', i.e. sh run.sh /mnt/lustre/

            I got the following output:

            No error after 640 iters
            -1 != 1
            Does not become correct after 0 seconds
            Does not become correct after 1 seconds
            Does not become correct after 2 seconds
            Does not become correct after 3 seconds
            Does not become correct after 4 seconds
            Does not become correct after 5 seconds
            Does not become correct after 6 seconds
            Become correct after 7 seconds
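
            run.sh itself is not attached to the ticket; a hypothetical reconstruction consistent with the output above (the file names, flags, and the alternating stripe counts are all assumptions) would alternate the default stripe count on the given directory, read it back, and on a mismatch report how many seconds the new default takes to become visible:

                    #!/bin/bash
                    # Hypothetical reconstruction of run.sh, consistent with the
                    # output above: alternate the default stripe count, read it
                    # back, and time how long a wrong value takes to correct
                    # itself. lfs getstripe -c -d prints -1 when the directory
                    # still uses the filesystem-wide default.
                    dir=${1:-/mnt/lustre}
                    for ((iter = 0; ; iter++)); do
                        want=$((iter % 2 + 1))      # alternate between 1 and 2
                        lfs setstripe -c "$want" "$dir"
                        got=$(lfs getstripe -c -d "$dir")
                        [ "$got" = "$want" ] && continue
                        echo "No error after $iter iters"
                        echo "$got != $want"
                        for ((sec = 0; ; sec++)); do
                            got=$(lfs getstripe -c -d "$dir")
                            if [ "$got" = "$want" ]; then
                                echo "Become correct after $sec seconds"
                                exit 0
                            fi
                            echo "Does not become correct after $sec seconds"
                            sleep 1
                        done
                    done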


            People

              Assignee: emoly.liu Emoly Liu
              Reporter: lixi Li Xi (Inactive)
              Votes: 0
              Watchers: 5
