
sanity-quota test_38: skipped id entries

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2
    • Labels: None
    • Environment: RHEL 7.3 Server/Client, master, build# 3486
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d6ff8e38-d464-11e6-ace4-5254006e85c2.

      The sub-test test_38 failed with the following error:

      skipped id entries
      

      test_logs:

      == sanity-quota test 38: Quota accounting iterator doesn't skip id entries =========================== 05:03:51 (1483707831)
      Waiting for local destroys to complete
      Creating test directory
      CMD: onyx-35vm3,onyx-35vm4 lctl set_param fail_val=0 fail_loc=0
      fail_val=0
      fail_loc=0
      fail_val=0
      fail_loc=0
      Create 10000 files...
      CMD: onyx-35vm3 lctl set_param -n osd*.*MDT*.force_sync=1
      CMD: onyx-35vm4 lctl set_param -n osd*.*OS*.force_sync=1
      CMD: onyx-35vm3 /usr/sbin/lctl get_param osd-ldiskfs.lustre-MDT0000.quota_slave.acct_user
      Found 10010 id entries
       sanity-quota test_38: @@@@@@ FAIL: skipped id entries
      


          Activity

            [LU-8999] sanity-quota test_38: skipped id entries

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32567/
            Subject: LU-8999 quota: fix issue of multiple call of seq start
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: cf796725fd39f0c807f60a86d2ddaa22e88c3a8c

            pjones Peter Jones added a comment -

            Should not have been reopened for ticket affecting LTS branch


            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/32568
            Subject: LU-8999 quota: fix issue of multiple call of seq start
            Project: fs/lustre-release
            Branch: b2_11
            Current Patch Set: 1
            Commit: 4822e044ca9296d4083b86c3667517fb0b2744c7


            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/32567
            Subject: LU-8999 quota: fix issue of multiple call of seq start
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: a9f70b640a443eb286848736f0ad1ed183c3a702

            cliffw Cliff White (Inactive) added a comment - We are seeing this again on 2.10.4 https://testing.hpdd.intel.com/test_sets/e1084166-599e-11e8-b9d3-52540065bddc
            pjones Peter Jones added a comment -

            Landed for 2.12


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31721/
            Subject: LU-8999 quota: fix issue of multiple call of seq start
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 07fdba2244e748e1721232abc52617209a544b87


            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31721
            Subject: LU-8999 quota: fix issue of multiple call of seq start
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 24c196304b1e92279ee11b3eb9e0b4f49c11991c


            hongchao.zhang Hongchao Zhang added a comment -

            I have reproduced this issue in my local VMs. Two problems cause it.

            1. Multiple calls to "lprocfs_quota_seq_start" of the seq_operations.
            While the sequence file is being read, "seq_read" is called multiple times to read the content,
            and the start/stop operations run in each call of "seq_read". The start operation causes
            the following problem.
            Assume the last stop operation saved the current read position at 2000 (qid), and the saved block and index are
            [1, 2, 3, 100, 200] and [0, 0, 0, 0, 20, 50] (the level of the quota file is 4). In the next start operation,
            "osd_it_acct_load" is called to resume the iteration, but it changes the block and index to
            [1, 2, 3, 150, 200] and [0, 0, 0, 30, 50]. This happens because the last block storing the quota information is used
            sequentially (the first empty slot is taken when a new quota needs to be inserted).

            2. There is a problem in "walk_tree_dqentry":

            int walk_tree_dqentry(const struct lu_env *env, struct osd_object *obj,
                                  int type, uint blk, int depth, uint index,
                                  struct osd_it_quota *it)
            {       
                    dqbuf_t  buf = getdqbuf();
                    loff_t   ret;
                    u32     *ref = (u32 *) buf;
                    
                    ENTRY;
            
                    if (!buf)
                            RETURN(-ENOMEM);
                    ret = quota_read_blk(env, obj, type, blk, buf);
                    if (ret < 0) {
                            CERROR("Can't read quota tree block %u.\n", blk);
                            goto out_buf;
                    }
                    ret = 1;
                            
                    for (; index <= 0xff && ret > 0; index++) {
                            blk = le32_to_cpu(ref[index]);
                            if (!blk)       /* No reference */
                                    continue;
            
                            if (depth < LUSTRE_DQTREEDEPTH - 1)
                                    ret = walk_tree_dqentry(env, obj, type, blk,
                                                            depth + 1, 0, it);
                            else
                                    ret = walk_block_dqentry(env, obj, type, blk, 0, it);
                                    /* If ret == 0 here, the "index++" in the "for" clause
                                     * still runs and one entry is skipped. */
                    }
                    it->oiq_blk[depth + 1] = blk;
                    it->oiq_index[depth] = index;
            
            out_buf:
                    freedqbuf(buf);
                    RETURN(ret);
            }
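
The loop behaviour described in point 2 can be reproduced outside the kernel. The following is a minimal user-space sketch, not Lustre code and not the patch that landed: `NSLOTS`, `visit`, `walk_buggy`, and `walk_fixed` are hypothetical names, and `visit` only assumes that the per-slot call returns 0 when the walk should stop on that slot. It shows that a `for` loop whose condition tests `ret` still executes `index++` once after the body sets `ret` to 0, so the index that would be saved points one slot past the stopping point.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8	/* hypothetical; stands in for the slots of one tree block */

/* Hypothetical stand-in for the per-slot call: returns 0 when the walk
 * should stop on this slot, 1 to keep walking. */
static int visit(const int *stop_at, unsigned int index)
{
	return stop_at[index] ? 0 : 1;
}

/* Same loop shape as the quoted function: the increment in the "for"
 * clause runs before the "ret > 0" test is evaluated again. */
unsigned int walk_buggy(const int *stop_at, unsigned int index)
{
	int ret = 1;

	assert(stop_at != NULL);
	for (; index < NSLOTS && ret > 0; index++)
		ret = visit(stop_at, index);
	return index;	/* one past the slot where the walk stopped */
}

/* One possible adjustment (an assumption, not the landed fix):
 * leave the loop before the increment can run. */
unsigned int walk_fixed(const int *stop_at, unsigned int index)
{
	assert(stop_at != NULL);
	for (; index < NSLOTS; index++) {
		if (visit(stop_at, index) <= 0)
			break;	/* index stays on the stopping slot */
	}
	return index;
}
```

With a stop marked at slot 3, `walk_buggy` reports 4 and `walk_fixed` reports 3. Both report `NSLOTS` when no slot stops the walk.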
            

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30896/
            Subject: LU-8999 test: ignore unrelated quota id
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: b5073eb0234e02892d22c5dead4a3a8a3da065bd


            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo