[LU-13636] local agent on ldiskfs can introduce quota inconsistency on MDT Created: 05/Jun/20  Updated: 30/Jan/22  Resolved: 11/Mar/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.9, Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-14032 The agent inode under directory with ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A discrepancy has been spotted between the group inode quota usage and actual consumption.
For example, for group smith, "lfs quota status" shows:

Disk quotas for grp smith (gid 6789):
       Filesystem   kbytes     quota     limit   grace   files   quota   limit   grace
         /scratch 54380276  75497472 150994944       -  224737  264000  328000       -
scr1-MDT0000_UUID    82304         -     93548       -   34053       -   34618       -
scr1-MDT0001_UUID        0         -    244364       -       0       -       0       -
scr1-MDT0002_UUID        0         -    241368       -       0       -   33369       -
scr1-MDT0003_UUID        0         -    242804       -       0       -       0       -
scr1-MDT0004_UUID        0         -    262144       -       0       -       0       -
scr1-MDT0005_UUID        0         -     49232       -       0       -    3092       -
scr1-MDT0006_UUID        0         -     65552       -       0       -    3077       -
scr1-MDT0007_UUID        0         -     65560       -       0       -    3079       -
scr1-MDT0008_UUID        0         -     65548       -       0       -    1028       -
scr1-MDT0009_UUID        0         -        60       -       0       -    1031       -
scr1-MDT000a_UUID        0         -        40       -       0       -    1028       -
scr1-MDT000b_UUID        0         -        20       -       0       -    1031       -
scr1-MDT000c_UUID        0         -        20       -       0       -       0       -
scr1-MDT000d_UUID   414392         -    424912       -  121483       -  122271       -
scr1-MDT000e_UUID    73636         -     87852       -   34085       -   34826       -
scr1-MDT000f_UUID    68160         -     82304       -   35116       -   36051       -

whereas a filesystem scan finds only 73757 files belonging to that group, against the 224737 reported.

Similarly, for another group, jones:

Disk quotas for grp jones (gid 5678):
       Filesystem     kbytes       quota      limit   grace   files   quota   limit   grace
         /scratch 1425241616  3670016000 7340032000       -  466507  684000 1368000       -
scr1-MDT0000_UUID          0           -          0       -       5       -    1029       -
scr1-MDT0001_UUID          0           -      65536       -       0       -    4096       -
scr1-MDT0002_UUID          0           -      52216       -       0       -   18733       -
scr1-MDT0003_UUID          0           -         88       -       0       -      22       -
scr1-MDT0004_UUID          0           -         28       -       0       -       7       -
scr1-MDT0005_UUID          0           -         28       -       0       -       7       -
scr1-MDT0006_UUID          0           -         20       -       0       -       5       -
scr1-MDT0007_UUID          0           -      65536       -       0       -    4096       -
scr1-MDT0008_UUID          0           -      65536       -       0       -       0       -
scr1-MDT0009_UUID          0           -      65536       -       0       -       0       -
scr1-MDT000a_UUID    1055284           -    1908948       -  296133       -  298943       -
scr1-MDT000b_UUID      88672           -    1068664       -   51046       -   53892       -
scr1-MDT000c_UUID      88688           -    1001728       -   48391       -   51229       -
scr1-MDT000d_UUID     163564           -    1202556       -   70924       -   73763       -
scr1-MDT000e_UUID          0           -          0       -       7       -    4102       -
scr1-MDT000f_UUID          0           -          0       -       1       -    1025       -

whereas actual consumption is 121687 files across the filesystem, a difference of 344820 from the 466507 reported.
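The figure of 344820 can be checked against the table: the per-MDT "files" column for group jones sums to the filesystem-wide total, and subtracting the scanned count gives the difference. The sketch below only redoes that arithmetic with the numbers reported above; the function and array names are illustrative and not part of Lustre.

```c
#include <assert.h>
#include <stddef.h>

/* Per-MDT "files" column for group jones, MDT0000 through MDT000f,
 * copied from the quota report above. */
static const long jones_files_per_mdt[] = {
        5, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        296133, 51046, 48391, 70924, 7, 1
};

/* Sum an array of per-MDT inode counts. */
long sum_counts(const long *counts, size_t n)
{
        long total = 0;

        assert(counts != NULL || n == 0);
        for (size_t i = 0; i < n; i++)
                total += counts[i];
        return total;
}

/* Reported usage minus the number of files found by the scan. */
long jones_discrepancy(void)
{
        const long scanned = 121687; /* files found by the filesystem scan */
        size_t n = sizeof(jones_files_per_mdt) /
                   sizeof(jones_files_per_mdt[0]);

        return sum_counts(jones_files_per_mdt, n) - scanned;
}
```

Here sum_counts() over the sixteen entries gives 466507, matching the /scratch line, and jones_discrepancy() gives 344820.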

Group directories are striped across 4 MDTs.

The discrepancy has been observed for more than a week.

For now, the group quota has been raised so that user jobs do not run into "disk quota exceeded" errors.

Also, the first MDT in the allocated set always has more entries than the rest. A quick test with a group creating 100 4-way striped directories and no files showed the same behaviour:

  Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /scratch    3520       0       0       -     880       0       0       -
scr1-MDT0004    2008       -   65536       -     502       -    4096       -
scr1-MDT0005     504       -   65536       -     126       -    4096       -
scr1-MDT0006     504       -   65536       -     126       -    4096       -
scr1-MDT0007     504       -   65536       -     126       -    4096       -

It looks like this is caused by the "agent" directory inode, which is created to track the remote striped directory inode on other MDTs.
The "agent" inode is created with its parent directory's gid:

static struct inode *osd_create_local_agent_inode(const struct lu_env *env,
                                                  struct osd_device *osd,
                                                  struct osd_object *pobj,
                                                  const struct lu_fid *fid,
                                                  __u32 type,
                                                  struct thandle *th)
{
        ...
        local = ldiskfs_create_inode(oh->ot_handle, pobj->oo_inode, type,
                                     NULL);                              <----- passes the "owner" argument as NULL
        if (IS_ERR(local)) {
                CERROR("%s: create local error %d\n", osd_name(osd),
                       (int)PTR_ERR(local));
                RETURN(local);
        }

        /*
         * restore i_gid in case S_ISGID is set, we will inherit S_ISGID and set
         * correct gid on remote file, not agent here
         */
        local->i_gid = current_fsgid();                                  <----- resets the gid to 0 (root)
        ldiskfs_set_inode_state(local, LDISKFS_STATE_LUSTRE_NOSCRUB);
        ...
}

struct inode *__ldiskfs_new_inode(handle_t *handle, struct inode *dir,
                               umode_t mode, const struct qstr *qstr,
                               __u32 goal, uid_t *owner, int handle_type,
                               unsigned int line_no, int nblocks)
{
        ...
        /*
         * Initalize owners and quota early so that we don't have to account
         * for quota initialization worst case in standard inode creating
         * transaction
         */
        if (owner) {
                inode->i_mode = mode;
                i_uid_write(inode, owner[0]);
                i_gid_write(inode, owner[1]);
        } else if (test_opt(sb, GRPID)) {
                inode->i_mode = mode;
                inode->i_uid = current_fsuid();
                inode->i_gid = dir->i_gid;
        } else
                inode_init_owner(inode, dir, mode);
        dquot_initialize(inode);
        ...
}

void inode_init_owner(struct inode *inode, const struct inode *dir,
                        umode_t mode)
{       
        inode->i_uid = current_fsuid();
        if (dir && dir->i_mode & S_ISGID) {           <---- created with its parent directory's GID
                inode->i_gid = dir->i_gid;
                if (S_ISDIR(mode))
                        mode |= S_ISGID;
        } else
                inode->i_gid = current_fsgid();
        inode->i_mode = mode;
}
The sequence is:
  1. OSD creates the agent inode with the "owner" argument set to NULL.
  2. LDiskFS (ext4) creates the inode and sets its GID to the parent's GID, because the parent directory has S_ISGID set.
  3. OSD then changes the GID of the newly created agent inode to GID 0 (root), a change that the quota accounting does not track.

As a result, the quota usage recorded in LDiskFS (ext4) becomes inconsistent.
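The three steps can be illustrated with a toy model. This is a minimal sketch, not Lustre or ext4 code: every name in it (toy_fs, toy_create, toy_scan, toy_quota_drift, the parent gid of 5) is hypothetical. It assumes that the quota charge is made once at creation for whichever gid is chosen there, and that passing an explicit owner means passing root's gid at creation time. That second assumption is only inferred from the patch subject "create agent inode with explicit owner"; the actual patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_GID     8
#define TOY_ROOT_GID    0u
#define TOY_MAX_INODES  256

/* Toy filesystem: per-group charged inode counts plus the gid of each inode. */
struct toy_fs {
        long     charged[TOY_MAX_GID];
        unsigned inode_gid[TOY_MAX_INODES];
        size_t   ninodes;
};

/* Create an inode: pick the gid (explicit owner, else the setgid parent's
 * gid, else root) and charge that group's quota once. Returns the index. */
static size_t toy_create(struct toy_fs *fs, unsigned parent_gid,
                         bool parent_setgid, const unsigned *owner_gid)
{
        unsigned gid;

        assert(fs->ninodes < TOY_MAX_INODES);
        if (owner_gid)
                gid = *owner_gid;
        else if (parent_setgid)
                gid = parent_gid;
        else
                gid = TOY_ROOT_GID;
        assert(gid < TOY_MAX_GID);

        fs->inode_gid[fs->ninodes] = gid;
        fs->charged[gid]++;
        return fs->ninodes++;
}

/* Count the inodes that currently carry the given gid, like a scan would. */
static long toy_scan(const struct toy_fs *fs, unsigned gid)
{
        long n = 0;

        for (size_t i = 0; i < fs->ninodes; i++)
                if (fs->inode_gid[i] == gid)
                        n++;
        return n;
}

/* Create `count` agent inodes under a setgid parent and return the
 * charged-minus-scanned inode count for the parent's group. */
long toy_quota_drift(bool explicit_owner, int count)
{
        struct toy_fs fs = {0};
        const unsigned parent_gid = 5;  /* hypothetical group */
        const unsigned root_gid = TOY_ROOT_GID;

        for (int i = 0; i < count; i++) {
                if (explicit_owner) {
                        /* Assumed fix: owner is decided before the charge. */
                        toy_create(&fs, parent_gid, true, &root_gid);
                } else {
                        /* Steps 1-2: owner NULL, gid inherited from parent. */
                        size_t idx = toy_create(&fs, parent_gid, true, NULL);
                        /* Step 3: gid overwritten, charge left behind. */
                        fs.inode_gid[idx] = TOY_ROOT_GID;
                }
        }
        return fs.charged[parent_gid] - toy_scan(&fs, parent_gid);
}
```

In this model, toy_quota_drift(false, 100) reports a drift of 100 inodes charged to the parent's group that no scan by gid will find, while toy_quota_drift(true, 100) reports 0.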



 Comments   
Comment by Gerrit Updater [ 05/Jun/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38842
Subject: LU-13636 osd: create agent inode with explicit owner
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 624c8dd0538c9740073c3c3bd8790aae9edfd86a

Comment by Gerrit Updater [ 05/Jun/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38844
Subject: LU-13636 obdclass: drop nlink if directory is removed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0454c361ee965b23596e72a7f5f89775370cacad

Comment by Gerrit Updater [ 22/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38842/
Subject: LU-13636 osd: create agent inode with explicit owner
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7805b45f1182ed21198c0cd2000ffe93b7de5340

Comment by Gerrit Updater [ 26/Oct/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40403
Subject: LU-13636 osd: create agent inode with explicit owner
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 913bd0278adeaeaef32a183303aa21b00bda3c36

Comment by Gerrit Updater [ 03/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40403/
Subject: LU-13636 osd: create agent inode with explicit owner
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 492428ec2fdf3cdcc88d39821f6751ca77fbf8c6

Comment by Gerrit Updater [ 10/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38844/
Subject: LU-13636 obdclass: drop nlink if directory is removed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c6d5c6606a38e2b550a81591935b0091faba4a2e

Comment by Peter Jones [ 11/Mar/21 ]

Landed for 2.15

Comment by Etienne Aujames [ 02/Aug/21 ]

Backport to b2_12: https://review.whamcloud.com/44466 ("LU-13636 obdclass: drop nlink if directory is removed")

Comment by Gerrit Updater [ 30/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44466/
Subject: LU-13636 obdclass: drop nlink if directory is removed
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 0594393491fe6b3550873a34dd5e55ea70142624
