LU-13636

local agent on ldiskfs can introduce quota inconsistency on MDT

    Description

      There is a discrepancy between the group inode quota usage and the actual inode consumption.
      For example, for group smith, "lfs quota status" shows:

      Disk quotas for grp smith (gid 6789):
             Filesystem   kbytes     quota     limit   grace   files   quota   limit   grace
               /scratch 54380276  75497472 150994944       -  224737  264000  328000       -
      scr1-MDT0000_UUID    82304         -     93548       -   34053       -   34618       -
      scr1-MDT0001_UUID        0         -    244364       -       0       -       0       -
      scr1-MDT0002_UUID        0         -    241368       -       0       -   33369       -
      scr1-MDT0003_UUID        0         -    242804       -       0       -       0       -
      scr1-MDT0004_UUID        0         -    262144       -       0       -       0       -
      scr1-MDT0005_UUID        0         -     49232       -       0       -    3092       -
      scr1-MDT0006_UUID        0         -     65552       -       0       -    3077       -
      scr1-MDT0007_UUID        0         -     65560       -       0       -    3079       -
      scr1-MDT0008_UUID        0         -     65548       -       0       -    1028       -
      scr1-MDT0009_UUID        0         -        60       -       0       -    1031       -
      scr1-MDT000a_UUID        0         -        40       -       0       -    1028       -
      scr1-MDT000b_UUID        0         -        20       -       0       -    1031       -
      scr1-MDT000c_UUID        0         -        20       -       0       -       0       -
      scr1-MDT000d_UUID   414392         -    424912       -  121483       -  122271       -
      scr1-MDT000e_UUID    73636         -     87852       -   34085       -   34826       -
      scr1-MDT000f_UUID    68160         -     82304       -   35116       -   36051       -
      

      However, a filesystem scan finds only 73757 files belonging to that group, against the 224737 reported (a difference of 150980).

      Similarly, for another group, jones:

      Disk quotas for grp jones (gid 5678):
             Filesystem     kbytes       quota      limit   grace   files   quota   limit   grace
               /scratch 1425241616  3670016000 7340032000       -  466507  684000 1368000       -
      scr1-MDT0000_UUID          0           -          0       -       5       -    1029       -
      scr1-MDT0001_UUID          0           -      65536       -       0       -    4096       -
      scr1-MDT0002_UUID          0           -      52216       -       0       -   18733       -
      scr1-MDT0003_UUID          0           -         88       -       0       -      22       -
      scr1-MDT0004_UUID          0           -         28       -       0       -       7       -
      scr1-MDT0005_UUID          0           -         28       -       0       -       7       -
      scr1-MDT0006_UUID          0           -         20       -       0       -       5       -
      scr1-MDT0007_UUID          0           -      65536       -       0       -    4096       -
      scr1-MDT0008_UUID          0           -      65536       -       0       -       0       -
      scr1-MDT0009_UUID          0           -      65536       -       0       -       0       -
      scr1-MDT000a_UUID    1055284           -    1908948       -  296133       -  298943       -
      scr1-MDT000b_UUID      88672           -    1068664       -   51046       -   53892       -
      scr1-MDT000c_UUID      88688           -    1001728       -   48391       -   51229       -
      scr1-MDT000d_UUID     163564           -    1202556       -   70924       -   73763       -
      scr1-MDT000e_UUID          0           -          0       -       7       -    4102       -
      scr1-MDT000f_UUID          0           -          0       -       1       -    1025       -
      

      However, the actual consumption is 121687 files across the filesystem, a difference of 344820.
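The figures above can be cross-checked with a few lines of C. This is only a sketch for verifying the arithmetic; the function names `sum_mdt_files` and `quota_excess` are hypothetical and are not part of Lustre or of the `lfs` tool.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the per-MDT "files" column of a quota report (hypothetical helper). */
long sum_mdt_files(const long *per_mdt, size_t n)
{
        long total = 0;

        for (size_t i = 0; i < n; i++)
                total += per_mdt[i];
        return total;
}

/*
 * Number of inodes the quota report charges to a group beyond what a
 * filesystem scan actually finds for it (hypothetical helper).
 */
long quota_excess(long reported, long scanned)
{
        assert(reported >= 0 && scanned >= 0);
        return reported - scanned;
}
```

Feeding in the per-MDT "files" values from the two reports gives sums that match the /scratch totals (224737 for smith, 466507 for jones), so the excess is already present in the per-MDT accounting rather than introduced when the totals are aggregated.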

      Group directories are striped across 4 MDTs.

      The discrepancy has been observed for more than a week.

      For now, the group quotas have been raised so that user jobs do not run into "disk quota exceeded" errors.

      Also, the first MDT in the allocated set always has more entries than the rest. A quick test in which a group created 100 4-way striped directories and no files showed the same behaviour:

        Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /scratch    3520       0       0       -     880       0       0       -
      scr1-MDT0004    2008       -   65536       -     502       -    4096       -
      scr1-MDT0005     504       -   65536       -     126       -    4096       -
      scr1-MDT0006     504       -   65536       -     126       -    4096       -
      scr1-MDT0007     504       -   65536       -     126       -    4096       -
      

      It looks like this is caused by the "agent" directory inode, which is created to track the remote striped directory inode on other MDTs.
      The "agent" inode is created with its parent directory's GID:

      static struct inode *osd_create_local_agent_inode(const struct lu_env *env,
                                                        struct osd_device *osd,
                                                        struct osd_object *pobj,
                                                        const struct lu_fid *fid,
                                                        __u32 type,
                                                        struct thandle *th)
      {
              ...
              local = ldiskfs_create_inode(oh->ot_handle, pobj->oo_inode, type,
                                           NULL);                              <----- passes the "owner" argument as NULL
              if (IS_ERR(local)) {
                      CERROR("%s: create local error %d\n", osd_name(osd),
                             (int)PTR_ERR(local));
                      RETURN(local);
              }
      
              /*
               * restore i_gid in case S_ISGID is set, we will inherit S_ISGID and set
               * correct gid on remote file, not agent here
               */
              local->i_gid = current_fsgid();                                  <----- resets the GID to 0 (root)
              ldiskfs_set_inode_state(local, LDISKFS_STATE_LUSTRE_NOSCRUB);
              ...
      }
      
      struct inode *__ldiskfs_new_inode(handle_t *handle, struct inode *dir,
                                     umode_t mode, const struct qstr *qstr,
                                     __u32 goal, uid_t *owner, int handle_type,
                                     unsigned int line_no, int nblocks)
      {
              ...
              /*
               * Initalize owners and quota early so that we don't have to account
               * for quota initialization worst case in standard inode creating
               * transaction
               */
              if (owner) {
                      inode->i_mode = mode;
                      i_uid_write(inode, owner[0]);
                      i_gid_write(inode, owner[1]);
              } else if (test_opt(sb, GRPID)) {
                      inode->i_mode = mode;
                      inode->i_uid = current_fsuid();
                      inode->i_gid = dir->i_gid;
              } else
                      inode_init_owner(inode, dir, mode);
              dquot_initialize(inode);
              ...
      }
      
      void inode_init_owner(struct inode *inode, const struct inode *dir,
                              umode_t mode)
      {       
              inode->i_uid = current_fsuid();
              if (dir && dir->i_mode & S_ISGID) {           <---- created with its parent directory's GID
                      inode->i_gid = dir->i_gid;
                      if (S_ISDIR(mode))
                              mode |= S_ISGID;
              } else
                      inode->i_gid = current_fsgid();
              inode->i_mode = mode;
      }
      
      1. OSD creates the agent inode with the "owner" argument set to NULL.
      2. LDiskFS (ext4) creates the inode and sets its GID to its parent's GID, because the parent has "S_ISGID" set.
      3. OSD changes the GID of the newly created agent inode to GID 0 (root), which is not tracked by the quota.

      As a result, the quota usage in LDiskFS (ext4) becomes inconsistent.
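The three steps above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel or Lustre code: every `toy_*` name is hypothetical, the quota "ledger" is a plain array, and the model assumes that the charge made at creation time stays with the parent's GID when `i_gid` is overwritten afterwards. The second creation function shows one way the model stays consistent (ownership decided before charging); it is not a claim about how the issue was actually fixed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_GID    16   /* size of the toy quota table (hypothetical) */
#define TOY_MAX_INODES 64   /* capacity of the toy inode table (hypothetical) */

struct toy_inode {
        unsigned int gid;   /* stands in for inode->i_gid */
};

static struct toy_inode toy_inodes[TOY_MAX_INODES];
static size_t toy_inode_count;
static long toy_quota_inodes[TOY_MAX_GID];   /* per-GID inode usage ledger */

/* Clear the toy inode table and the toy quota ledger. */
void toy_reset(void)
{
        toy_inode_count = 0;
        for (size_t g = 0; g < TOY_MAX_GID; g++)
                toy_quota_inodes[g] = 0;
}

/* Model of inode creation: set the owner, then charge the quota to it. */
struct toy_inode *toy_new_inode(unsigned int gid)
{
        assert(gid < TOY_MAX_GID && toy_inode_count < TOY_MAX_INODES);
        struct toy_inode *ino = &toy_inodes[toy_inode_count++];

        ino->gid = gid;
        toy_quota_inodes[gid]++;
        return ino;
}

/*
 * Steps 1-3: the inode is created and charged under the parent's GID
 * (S_ISGID case), then its GID is overwritten without moving the charge.
 */
struct toy_inode *toy_create_agent_buggy(unsigned int parent_gid,
                                         unsigned int fsgid)
{
        struct toy_inode *ino = toy_new_inode(parent_gid);

        assert(fsgid < TOY_MAX_GID);
        ino->gid = fsgid;   /* ledger still charges parent_gid */
        return ino;
}

/* Consistent variant: the final owner is known before the quota charge. */
struct toy_inode *toy_create_agent_with_owner(unsigned int fsgid)
{
        return toy_new_inode(fsgid);
}

/* What a scan sees: inodes whose current GID matches. */
long toy_scan(unsigned int gid)
{
        long n = 0;

        for (size_t i = 0; i < toy_inode_count; i++)
                if (toy_inodes[i].gid == gid)
                        n++;
        return n;
}

/* What the quota report shows for a GID. */
long toy_quota_usage(unsigned int gid)
{
        assert(gid < TOY_MAX_GID);
        return toy_quota_inodes[gid];
}
```

In this model, creating three agent inodes through `toy_create_agent_buggy(5, 0)` leaves `toy_quota_usage(5)` at 3 while `toy_scan(5)` returns 0, which mirrors the report's pattern of quota usage exceeding what a scan finds for the group.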

            People

              bzzz Alex Zhuravlev