[LU-8999] sanity-quota test_38: skipped id entries Created: 10/Jan/17  Updated: 21/Jan/19  Resolved: 08/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2
Fix Version/s: Lustre 2.12.0, Lustre 2.10.7

Type: Bug Priority: Major
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL 7.3 Server/Client
master, build# 3486


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d6ff8e38-d464-11e6-ace4-5254006e85c2.

The sub-test test_38 failed with the following error:

skipped id entries

test_logs:

== sanity-quota test 38: Quota accounting iterator doesn't skip id entries =========================== 05:03:51 (1483707831)
Waiting for local destroys to complete
Creating test directory
CMD: onyx-35vm3,onyx-35vm4 lctl set_param fail_val=0 fail_loc=0
fail_val=0
fail_loc=0
fail_val=0
fail_loc=0
Create 10000 files...
CMD: onyx-35vm3 lctl set_param -n osd*.*MDT*.force_sync=1
CMD: onyx-35vm4 lctl set_param -n osd*.*OS*.force_sync=1
CMD: onyx-35vm3 /usr/sbin/lctl get_param osd-ldiskfs.lustre-MDT0000.quota_slave.acct_user
Found 10010 id entries
 sanity-quota test_38: @@@@@@ FAIL: skipped id entries


 Comments   
Comment by Nathaniel Clark [ 02/Feb/17 ]

Another failure on master:

https://testing.hpdd.intel.com/test_sets/0fa1f43e-e8c8-11e6-935d-5254006e85c2

Comment by James Casper [ 24/May/17 ]

2.9.57, b3575:
https://testing.hpdd.intel.com/test_sessions/2209839b-42d4-4fe6-91f1-96f9ce3c5a69

Comment by Gerrit Updater [ 04/Aug/17 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/28345
Subject: LU-8999 quota: fix quota iteration interface
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fa22284aed4ffdfc9d56360a2aa11f8bcb70b109

Comment by Gerrit Updater [ 14/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28530
Subject: LU-8999 quota: fix quota iteration interface
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 7b65a74570a9c9c047c4165667a55c7ae9c85a54

Comment by Gerrit Updater [ 17/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28345/
Subject: LU-8999 quota: fix quota iteration interface
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e431666d6636ce5ff3637592188fc96bcf695a31

Comment by Peter Jones [ 17/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28530/
Subject: LU-8999 quota: fix quota iteration interface
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: c9df32207c47f71f9840ed2e68cd23612b205ffb

Comment by Sarah Liu [ 19/Sep/17 ]

Still hitting this problem on b2_10 during 2.10.1 RC1 testing (RHEL 7.4, ZFS):
https://testing.hpdd.intel.com/test_sets/fff7634e-9c9e-11e7-ba27-5254006e85c2

Another one on 2.10.1 after the patch landed:
https://testing.hpdd.intel.com/test_sets/c81ce8c8-9c66-11e7-ba27-5254006e85c2

Comment by Jian Yu [ 22/Sep/17 ]

The original failure occurred in an ldiskfs test session, while the patch fixed ZFS-related code.
The failure still exists on the master branch; here is one more failure instance in an ldiskfs test session:
https://testing.hpdd.intel.com/test_sets/3eff0f0a-9ef8-11e7-b778-5254006e85c2

Comment by James Nunez (Inactive) [ 25/Oct/17 ]

Hongchao -

Would you please comment on this issue?

Thank you.

Comment by Hongchao Zhang [ 16/Nov/17 ]

Status update:
I have investigated the related logs and code, but have made no progress yet. I will update this ticket once there is any progress.

Comment by Gerrit Updater [ 17/Nov/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/30145
Subject: LU-8999 test: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b122751e61f678b807fa87e686b90b27969cd8b2

Comment by Gerrit Updater [ 24/Nov/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/30243
Subject: LU-8999 test: check chown result
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e6da5d9b7c257dd24314feae85f86692ce8d9203

Comment by Gerrit Updater [ 17/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30243/
Subject: LU-8999 test: check chown result
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 817b367707d0d4f5c406b4f18cffceea2aae8406

Comment by Peter Jones [ 17/Dec/17 ]

It looks to me like a correction for the test has landed and I expect the debug patch will now be abandoned.

Comment by Gerrit Updater [ 21/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30631
Subject: LU-8999 test: check chown result
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 34983c6f01c974f10ba5b213a60b4a6ddf8ced88

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30631/
Subject: LU-8999 test: check chown result
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 3a0fc1a8d65d4796a3da895006dcc5012e9e1b3f

Comment by Minh Diep [ 04/Jan/18 ]

It doesn't look like this is fixed:
https://testing.hpdd.intel.com/test_sets/8ac15c2e-f1a3-11e7-8c23-52540065bddc

Comment by Hongchao Zhang [ 05/Jan/18 ]

I have checked the log. There are 13 extra quota IDs, which could have been left by other tests, but the quota IDs 0 to 9999 required by
this test were all present, so there is no problem with them.

[root@zhanghc ~]#grep "id:" quota_ids |awk '{print $3}'|sort -n
0
1
...
9998
9999
10273
14421
18394
19745
20974
23221
23332
25594
27020
29911
30457
30979
31362
- id:      10273
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      14421
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      18394
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      19745
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      20974
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      23221
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      23332
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      25594
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      27020
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      29911
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      30457
  usage:   { inodes:                    1, kbytes:                    0 }
- id:      30979
  usage:   { inodes:                    1, kbytes:                    4 }
- id:      31362
  usage:   { inodes:                    1, kbytes:                    0 }

Comment by Gerrit Updater [ 05/Jan/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/30730
Subject: LU-8999 test: ignore unrelated quota id
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e0d615954ba8eb1303e20296c164c30fca5e3ebe

Comment by Gerrit Updater [ 14/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30730/
Subject: LU-8999 test: ignore unrelated quota id
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8a0a9190cd0a68a88e42210e9c3dac324319fa6d

Comment by Peter Jones [ 14/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 17/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30896
Subject: LU-8999 test: ignore unrelated quota id
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 01a8b2e0d8afa058e62cac1d24c8b520b81f9671

Comment by Saurabh Tandan (Inactive) [ 31/Jan/18 ]

Reopening the issue as it still occurs for 2.10.57 "SLES 12 SP3 Server/DNE/ldiskfs SLES 12 SP3 Client" config.
https://testing.hpdd.intel.com/test_sets/acaba252-fd4e-11e7-a7cd-52540065bddc
It looks like it did not find enough id entries.

Comment by Gerrit Updater [ 31/Jan/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31098
Subject: LU-8999 test: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 30f1ac0972d068350cb1cd10146f596bf14814df

Comment by Gerrit Updater [ 09/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30896/
Subject: LU-8999 test: ignore unrelated quota id
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: b5073eb0234e02892d22c5dead4a3a8a3da065bd

Comment by Hongchao Zhang [ 22/Mar/18 ]

I have managed to reproduce this issue in my local VMs, and there are two problems that cause it.

1. A problem caused by multiple calls to "lprocfs_quota_seq_start" of the seq_operations.
While the sequence file is being read, "seq_read" is called multiple times to read the content,
and the start/stop operations are called in each call of "seq_read". The start operation causes
the following problem.
Assume the last stop operation saved the current read position at 2000 (qid), and the saved block and index are
[1, 2, 3, 100, 200] and [0, 0, 0, 0, 20, 50] (the level of the quota file is 4). In the next start operation,
"osd_it_acct_load" is called to resume the iteration, but it changes the block and index to
[1, 2, 3, 150, 200] and [0, 0, 0, 30, 50]. This is because the last block storing the quota information is used
sequentially (the first empty slot is looked up when a new quota entry needs to be inserted).

2. There is a problem in "walk_tree_dqentry":

int walk_tree_dqentry(const struct lu_env *env, struct osd_object *obj,
                      int type, uint blk, int depth, uint index,
                      struct osd_it_quota *it)
{       
        dqbuf_t  buf = getdqbuf();
        loff_t   ret;
        u32     *ref = (u32 *) buf;
        
        ENTRY;

        if (!buf)
                RETURN(-ENOMEM);
        ret = quota_read_blk(env, obj, type, blk, buf);
        if (ret < 0) {
                CERROR("Can't read quota tree block %u.\n", blk);
                goto out_buf;
        }
        ret = 1;
                
        for (; index <= 0xff && ret > 0; index++) {
                blk = le32_to_cpu(ref[index]);
                if (!blk)       /* No reference */
                        continue;

                if (depth < LUSTRE_DQTREEDEPTH - 1)
                        ret = walk_tree_dqentry(env, obj, type, blk,
                                                depth + 1, 0, it);
                else
                        /* Here, if ret == 0, the "index++" in the "for" clause
                         * is still executed and causes one entry to be skipped. */
                        ret = walk_block_dqentry(env, obj, type, blk, 0, it);
        }
        it->oiq_blk[depth + 1] = blk;
        it->oiq_index[depth] = index;

out_buf:
        freedqbuf(buf);
        RETURN(ret);
}

Comment by Gerrit Updater [ 22/Mar/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31721
Subject: LU-8999 quota: fix issue of multiple call of seq start
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 24c196304b1e92279ee11b3eb9e0b4f49c11991c

Comment by Gerrit Updater [ 09/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31721/
Subject: LU-8999 quota: fix issue of multiple call of seq start
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 07fdba2244e748e1721232abc52617209a544b87

Comment by Peter Jones [ 09/Apr/18 ]

Landed for 2.12

Comment by Cliff White (Inactive) [ 17/May/18 ]

We are seeing this again on 2.10.4
https://testing.hpdd.intel.com/test_sets/e1084166-599e-11e8-b9d3-52540065bddc

Comment by Gerrit Updater [ 28/May/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/32567
Subject: LU-8999 quota: fix issue of multiple call of seq start
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: a9f70b640a443eb286848736f0ad1ed183c3a702

Comment by Gerrit Updater [ 28/May/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/32568
Subject: LU-8999 quota: fix issue of multiple call of seq start
Project: fs/lustre-release
Branch: b2_11
Current Patch Set: 1
Commit: 4822e044ca9296d4083b86c3667517fb0b2744c7

Comment by Peter Jones [ 08/Aug/18 ]

Should not have been reopened for a ticket affecting an LTS branch.

Comment by Gerrit Updater [ 19/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32567/
Subject: LU-8999 quota: fix issue of multiple call of seq start
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: cf796725fd39f0c807f60a86d2ddaa22e88c3a8c
