local agent on ldiskfs can introduce quota inconsistency on MDT

    Description

      A discrepancy has been observed between the group inode quota usage and actual consumption.
      For example, for group smith, the "lfs quota" output shows:

      Disk quotas for grp smith (gid 6789):
             Filesystem   kbytes     quota     limit   grace   files   quota   limit   grace
               /scratch 54380276  75497472 150994944       -  224737  264000  328000       -
      scr1-MDT0000_UUID    82304         -     93548       -   34053       -   34618       -
      scr1-MDT0001_UUID        0         -    244364       -       0       -       0       -
      scr1-MDT0002_UUID        0         -    241368       -       0       -   33369       -
      scr1-MDT0003_UUID        0         -    242804       -       0       -       0       -
      scr1-MDT0004_UUID        0         -    262144       -       0       -       0       -
      scr1-MDT0005_UUID        0         -     49232       -       0       -    3092       -
      scr1-MDT0006_UUID        0         -     65552       -       0       -    3077       -
      scr1-MDT0007_UUID        0         -     65560       -       0       -    3079       -
      scr1-MDT0008_UUID        0         -     65548       -       0       -    1028       -
      scr1-MDT0009_UUID        0         -        60       -       0       -    1031       -
      scr1-MDT000a_UUID        0         -        40       -       0       -    1028       -
      scr1-MDT000b_UUID        0         -        20       -       0       -    1031       -
      scr1-MDT000c_UUID        0         -        20       -       0       -       0       -
      scr1-MDT000d_UUID   414392         -    424912       -  121483       -  122271       -
      scr1-MDT000e_UUID    73636         -     87852       -   34085       -   34826       -
      scr1-MDT000f_UUID    68160         -     82304       -   35116       -   36051       -
      

      whereas a filesystem scan finds only 73757 files belonging to that group, a difference of 150980.

      Similarly, for another group, jones:

      Disk quotas for grp jones (gid 5678):
             Filesystem     kbytes       quota      limit   grace   files   quota   limit   grace
               /scratch 1425241616  3670016000 7340032000       -  466507  684000 1368000       -
      scr1-MDT0000_UUID          0           -          0       -       5       -    1029       -
      scr1-MDT0001_UUID          0           -      65536       -       0       -    4096       -
      scr1-MDT0002_UUID          0           -      52216       -       0       -   18733       -
      scr1-MDT0003_UUID          0           -         88       -       0       -      22       -
      scr1-MDT0004_UUID          0           -         28       -       0       -       7       -
      scr1-MDT0005_UUID          0           -         28       -       0       -       7       -
      scr1-MDT0006_UUID          0           -         20       -       0       -       5       -
      scr1-MDT0007_UUID          0           -      65536       -       0       -    4096       -
      scr1-MDT0008_UUID          0           -      65536       -       0       -       0       -
      scr1-MDT0009_UUID          0           -      65536       -       0       -       0       -
      scr1-MDT000a_UUID    1055284           -    1908948       -  296133       -  298943       -
      scr1-MDT000b_UUID      88672           -    1068664       -   51046       -   53892       -
      scr1-MDT000c_UUID      88688           -    1001728       -   48391       -   51229       -
      scr1-MDT000d_UUID     163564           -    1202556       -   70924       -   73763       -
      scr1-MDT000e_UUID          0           -          0       -       7       -    4102       -
      scr1-MDT000f_UUID          0           -          0       -       1       -    1025       -
      

      whereas actual consumption is 121687 files across the filesystem, a difference of 344820.
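
      The gap for both groups can be recomputed from the figures above. The following is only a small arithmetic sketch: the helper names (sum_counts, quota_excess) and the arrays are hypothetical, and the numbers are copied from the two quota listings and the scan results quoted in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Non-zero per-MDT "files" values for group smith, copied from the report. */
static const long smith_mdt_files[] = { 34053, 121483, 34085, 35116 };

/* Non-zero per-MDT "files" values for group jones, copied from the report. */
static const long jones_mdt_files[] = { 5, 296133, 51046, 48391, 70924, 7, 1 };

/* Sum an array of per-MDT inode counts. */
static long sum_counts(const long *counts, size_t n)
{
        long total = 0;

        for (size_t i = 0; i < n; i++)
                total += counts[i];
        return total;
}

/* Inodes charged by quota but not found by the filesystem scan. */
static long quota_excess(long reported_by_quota, long found_by_scan)
{
        return reported_by_quota - found_by_scan;
}
```

      The per-MDT values add up to the filesystem-wide totals (224737 for smith, 466507 for jones), so the excess is already present in the per-MDT accounting rather than being introduced when the totals are aggregated.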

      Group directories are striped across 4 MDTs.

      These observations have persisted for more than a week.

      For now, the group quota has been raised so that user jobs do not run into "disk quota exceeded" errors.

      Also, the first MDT in the allocated set always has more entries than the rest. A quick test in which a group created 100 4-way striped directories and no files showed the same behaviour:

        Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /scratch    3520       0       0       -     880       0       0       -
      scr1-MDT0004    2008       -   65536       -     502       -    4096       -
      scr1-MDT0005     504       -   65536       -     126       -    4096       -
      scr1-MDT0006     504       -   65536       -     126       -    4096       -
      scr1-MDT0007     504       -   65536       -     126       -    4096       -
      

      This appears to be caused by the "agent" directory inode, which is created locally to track a remote striped directory inode on another MDT.
      The "agent" inode is created with its parent directory's GID:

      static struct inode *osd_create_local_agent_inode(const struct lu_env *env,
                                                        struct osd_device *osd,
                                                        struct osd_object *pobj,
                                                        const struct lu_fid *fid,
                                                        __u32 type,
                                                        struct thandle *th)
      {
              ...
              local = ldiskfs_create_inode(oh->ot_handle, pobj->oo_inode, type,
                                           NULL);                              <----- "owner" is passed as NULL
              if (IS_ERR(local)) {
                      CERROR("%s: create local error %d\n", osd_name(osd),
                             (int)PTR_ERR(local));
                      RETURN(local);
              }
      
              /*
               * restore i_gid in case S_ISGID is set, we will inherit S_ISGID and set
               * correct gid on remote file, not agent here
               */
              local->i_gid = current_fsgid();                                  <----- gid is reset to 0 (root)
              ldiskfs_set_inode_state(local, LDISKFS_STATE_LUSTRE_NOSCRUB);
              ...
      }
      
      struct inode *__ldiskfs_new_inode(handle_t *handle, struct inode *dir,
                                     umode_t mode, const struct qstr *qstr,
                                     __u32 goal, uid_t *owner, int handle_type,
                                     unsigned int line_no, int nblocks)
      {
              ...
              /*
               * Initialize owners and quota early so that we don't have to account
               * for quota initialization worst case in standard inode creating
               * transaction
               */
              if (owner) {
                      inode->i_mode = mode;
                      i_uid_write(inode, owner[0]);
                      i_gid_write(inode, owner[1]);
              } else if (test_opt(sb, GRPID)) {
                      inode->i_mode = mode;
                      inode->i_uid = current_fsuid();
                      inode->i_gid = dir->i_gid;
              } else
                      inode_init_owner(inode, dir, mode);
              dquot_initialize(inode);
              ...
      }
      
      void inode_init_owner(struct inode *inode, const struct inode *dir,
                              umode_t mode)
      {       
              inode->i_uid = current_fsuid();
              if (dir && dir->i_mode & S_ISGID) {           <---- created with its parent directory's GID
                      inode->i_gid = dir->i_gid;
                      if (S_ISDIR(mode))
                              mode |= S_ISGID;
              } else
                      inode->i_gid = current_fsgid();
              inode->i_mode = mode;
      }
      
      1. The OSD creates the agent inode with the "owner" argument set to NULL.
      2. ldiskfs (ext4) creates the inode and sets its GID to the parent directory's GID, because the parent has S_ISGID set; dquot_initialize() then sets up quota for that GID.
      3. The OSD changes the GID of the newly created agent inode to GID 0 (root) by assigning i_gid directly, a change that is not tracked by quota.

      As a result, the quota usage recorded in ldiskfs (ext4) becomes inconsistent.
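
      The sequence described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Lustre or kernel code: every name here (toy_quota, toy_inode, toy_new_inode and so on) is hypothetical, quota is reduced to a per-gid inode counter, and the second flow only illustrates the idea of passing an explicit owner at creation time; it does not reproduce the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_GID 16  /* hypothetical: tiny gid space, enough for the model */

/* Per-gid inode usage, standing in for the group quota accounting. */
struct toy_quota {
        long inodes[TOY_MAX_GID];
};

struct toy_inode {
        unsigned int gid;  /* gid stored in the inode */
        bool setgid;       /* models S_ISGID on a directory */
};

/*
 * Models owner selection followed by quota initialization: an explicit owner
 * wins, otherwise a setgid parent's gid is inherited, otherwise the caller's
 * fsgid is used. The new inode is charged to the gid chosen here.
 */
static struct toy_inode toy_new_inode(struct toy_quota *q,
                                      const struct toy_inode *dir,
                                      const unsigned int *owner_gid,
                                      unsigned int fsgid)
{
        struct toy_inode inode = { .gid = fsgid, .setgid = false };

        if (owner_gid != NULL)
                inode.gid = *owner_gid;
        else if (dir != NULL && dir->setgid)
                inode.gid = dir->gid;

        assert(inode.gid < TOY_MAX_GID);
        q->inodes[inode.gid]++;
        return inode;
}

/* Inodes charged to gid minus inodes actually owned by gid (one-inode model). */
static long toy_discrepancy(const struct toy_quota *q,
                            const struct toy_inode *inode, unsigned int gid)
{
        return q->inodes[gid] - (inode->gid == gid ? 1 : 0);
}

/* Reported flow: owner == NULL, then the gid is overwritten without a transfer. */
static long toy_agent_without_owner(unsigned int parent_gid, unsigned int fsgid)
{
        struct toy_quota q = { { 0 } };
        struct toy_inode dir = { .gid = parent_gid, .setgid = true };
        struct toy_inode agent = toy_new_inode(&q, &dir, NULL, fsgid);

        agent.gid = fsgid;  /* direct assignment: accounting is not updated */
        return toy_discrepancy(&q, &agent, parent_gid);
}

/* Alternative flow: the owner is passed explicitly at creation time. */
static long toy_agent_with_owner(unsigned int parent_gid, unsigned int fsgid)
{
        struct toy_quota q = { { 0 } };
        struct toy_inode dir = { .gid = parent_gid, .setgid = true };
        struct toy_inode agent = toy_new_inode(&q, &dir, &fsgid, fsgid);

        return toy_discrepancy(&q, &agent, parent_gid);
}
```

      In this model, toy_agent_without_owner(5, 0) returns 1: one inode stays charged to gid 5 although no inode carries that gid, which has the same shape as the gap between the quota report and the filesystem scan. toy_agent_with_owner(5, 0) returns 0, because the inode is charged to the gid it ends up with.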

          Activity

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44466/
            Subject: LU-13636 obdclass: drop nlink if directory is removed
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 0594393491fe6b3550873a34dd5e55ea70142624

            Etienne Aujames added a comment: backport on b2_12: https://review.whamcloud.com/44466 ("LU-13636 obdclass: drop nlink if directory is removed")

            Peter Jones added a comment: Landed for 2.15

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38844/
            Subject: LU-13636 obdclass: drop nlink if directory is removed
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c6d5c6606a38e2b550a81591935b0091faba4a2e

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40403/
            Subject: LU-13636 osd: create agent inode with explicit owner
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 492428ec2fdf3cdcc88d39821f6751ca77fbf8c6

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40403
            Subject: LU-13636 osd: create agent inode with explicit owner
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 913bd0278adeaeaef32a183303aa21b00bda3c36

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38842/
            Subject: LU-13636 osd: create agent inode with explicit owner
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7805b45f1182ed21198c0cd2000ffe93b7de5340

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38844
            Subject: LU-13636 obdclass: drop nlink if directory is removed
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0454c361ee965b23596e72a7f5f89775370cacad

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38842
            Subject: LU-13636 osd: create agent inode with explicit owner
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 624c8dd0538c9740073c3c3bd8790aae9edfd86a

            People

              bzzz Alex Zhuravlev