
User quota problem after Lustre upgrade (2.1.4 to 2.4.1)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1
    • Component/s: None
    • Severity: 3
    • Rank: 12320

    Description

      After the upgrade at KIT, the user quotas are not reported correctly. The quota for root seems to be OK. The user quota is 0 on all OSTs, which is wrong.

      e.g. for root:

      [root@pfs2n13 ~]# lfs quota -u root -v /lustre/pfs2wor2/client/
      Disk quotas for user root (uid 0):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor2/client/
                     4332006768       0       0       -     790       0       0       -
      pfs2wor2-MDT0000_UUID
                        2349176       -       0       -     790       -       0       -
      pfs2wor2-OST0000_UUID
                      134219820       -       0       -       -       -       -       -
      pfs2wor2-OST0001_UUID
                             12       -       0       -       -       -       -       -
      pfs2wor2-OST0002_UUID
                      134219788       -       0       -       -       -       -       -

      and for a user:

      [root@pfs2n3 ~]# lfs quota -v -u aj9102 /lustre/pfs2wor1/client/
      Disk quotas for user aj9102 (uid 3522):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor1/client/
                            448       0       0       -    3985       0       0       -
      pfs2wor1-MDT0000_UUID
                            448       -       0       -    3985       -       0       -
      pfs2wor1-OST0000_UUID
                              0       -       0       -       -       -       -       -
      pfs2wor1-OST0001_UUID
                              0       -       0       -       -       -       -       -
      pfs2wor1-OST0002_UUID
                              0       -       0       -       -       -       -       -

    Activity

            [LU-4504] User quota problem after Lustre upgrade (2.1.4 to 2.4.1)

            niu Niu Yawei (Inactive) added a comment:

            I have updated the instructions for fixing bad IDs on OST objects (see my previous comment). Thanks.

            jfc John Fuchs-Chesney (Inactive) added a comment:

            Hello Oz,
            We've recently heard from another site that Niu's fixes have resolved the quota problems they were seeing.
            Has DDN installed a new version at this site, with those patches?
            If so, do you have any news on this?
            Thanks,
            ~ jfc.
            niu Niu Yawei (Inactive) added a comment (edited):

            For the huge UID/GIDs caused by the Lustre defect described in
            LU-4345: Is there a way to repair the bad IDs on the OST objects?

            To fix bad IDs on existing OST objects (a sketch follows these steps):

            • Find the objects with bad IDs first (mount the OST device as ldiskfs to check the IDs of each file, or use debugfs without unmounting).
            • Get the correct ID from the MDT (see Lustre manual 13.14 to identify which file each object belongs to).
            • Set the correct IDs on the OST objects directly.
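            For illustration, a minimal sketch of these three steps on an ldiskfs OST. This is not from the ticket: the object ID (12345), device (/dev/sdb), mount point, path, and owner below are hypothetical examples.

            # 1. Check the object's owner without unmounting; "debugfs -c" opens the
            #    device read-only even while it is mounted. ldiskfs keeps object
            #    12345 under O/0/d(12345 % 32), i.e. O/0/d25 here.
            debugfs -c -R 'stat O/0/d25/12345' /dev/sdb

            # 2. Identify which file the object belongs to (Lustre manual 13.14),
            #    then read the correct UID/GID of that file, e.g. on a client:
            ls -ln /lustre/pfs2wor2/client/path/to/owning/file

            # 3. With the OST mounted as ldiskfs, set the correct owner directly:
            mount -t ldiskfs /dev/sdb /mnt/ost
            chown 3522:3522 /mnt/ost/O/0/d25/12345
            umount /mnt/ost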

            orentas Oz Rentas (Inactive) added a comment:

            From the customer:

            It's good news that you found possible reasons for the problem.
            We will install the patches during our next maintenance, which is expected to take place within the next two months. However, DDN will have to provide a Lustre version that includes those patches.

            For the huge UID/GIDs caused by the lustre defect described in
            LU-4345: Is there a way to repair the bad IDs on the OST objects?

            niu Niu Yawei (Inactive) added a comment:

            Patch: http://review.whamcloud.com/10227

            niu Niu Yawei (Inactive) added a comment:

            The huge UID/GIDs may have been caused by the Lustre defect described in LU-4345.

            And it looks like there is a defect in e2fsprogs that can break the dict lookup when the difference between two keys is greater than 2G:

            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;
            
                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);
            
                    return c - d;
            }
            

            This function returns the difference of two unsigned int values as a signed int, and quota relies on it to insert IDs into the dict during quotacheck. I think that's why we see duplicate IDs on quotacheck. I'll cook up a patch to fix this soon.
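            For illustration, the failure mode and a minimal sketch of the kind of fix needed (presumably what the http://review.whamcloud.com/10227 change linked above does):

            /*
             * Sketch only, not the actual patch text. With c = 2171114240 and
             * d = 0, "c - d" is 2171114240 as unsigned, which wraps to
             * -2123853056 when converted to int, so the comparator claims
             * c < d even though c > d, and tree inserts/lookups go to the
             * wrong place. Returning an explicit -1/0/1 avoids the overflow;
             * VOIDPTR_TO_UINT is the macro used by the original code.
             */
            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;

                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);

                    if (c == d)
                            return 0;
                    return (c > d) ? 1 : -1;
            }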


            orentas Oz Rentas (Inactive) added a comment:

            Thanks Niu. Here is the response from the customer:

            We have pretty huge UIDs/GIDs. However, they are by far not as huge as reported. The largest UID is 901987 and the largest GID is 890006.

            niu Niu Yawei (Inactive) added a comment (edited):

            Note the changes although clients were not mounted in the meantime.

            Orphan cleanup may have removed some files.

            Note that tune2fs -O quota reported messages like these:
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            I noticed that the UIDs/GIDs on this system are very large; some UIDs are greater than 2G. I think there could be a defect in e2fsprogs that handles large IDs incorrectly. For example:

            [DEBUG] quotaio.c:326:quota_file_create:: Creating quota ino=3, type=0
            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=3

            e2fsprogs is writing UID 2171114240 into the quota file, and later on...

            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=2, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=3, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=4, depth=3
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [DEBUG] quotaio_tree.c:330:qtree_write_dquot:: writing ddquot 2: id=2171114240 off=11543712, info->dqi_entry_size=72

            e2fsprogs tries to write UID 2171114240 into the quota file again. It looks like UID 2171114240 got duplicated in the in-memory dict.

            I'll investigate further to see what happens when inserting large IDs into the in-memory dict.

            Is further investigation possible with this information and with the provided tune2fs logs?

            Yes, no need to develop a new script for now. I just want to get confirmation from the customer that they really have such large UIDs/GIDs.


            orentas Oz Rentas (Inactive) added a comment:

            We do not know which uid/gid has wrong quotas on pfs2dat2-OST0000.
            We used our perl script, which sums up all user and group quotas from acct_user/acct_group in proc. This should show the same results for users and groups, but it does not for pfs2dat2-OST0000 (a sketch of this kind of check follows this paragraph).
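            For reference, a minimal sketch (not the customer's perl script) of this kind of consistency check, assuming the Lustre 2.4 osd-ldiskfs parameter layout and its "usage: { inodes: N, kbytes: M }" output format:

            # Sum the kbytes over all IDs in acct_user and acct_group on one OST;
            # the two totals should match if the per-ID accounting is consistent.
            for t in acct_user acct_group; do
                lctl get_param -n osd-ldiskfs.pfs2dat2-OST0000.quota_slave.$t |
                    awk -F'kbytes: ' -v t="$t" \
                        '/kbytes:/ { split($2, a, " "); s += a[1] }
                         END { printf "%s kbytes total: %d\n", t, s }'
            done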

            In detail, before the maintenance and after clients were unmounted the script reported this for pfs2dat2-OST0000:
            Sum of inodes of users: 9353416
            Sum of inodes of groups: 9447415
            Sum of kbytes of users: 11926483836
            Sum of kbytes of groups: 12132828844

            After servers were upgraded to Lustre 2.4.3 and quotas were re-enabled (with normal e2fsprogs):
            Sum of inodes of users: 9325574
            Sum of inodes of groups: 9446294
            Sum of kbytes of users: 11897886304
            Sum of kbytes of groups: 12132673600
            Note the changes although clients were not mounted in the meantime.

            After just re-enabling quotas again for pfs2dat2-OST0000 (with normal e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440
            Note that tune2fs -O quota reported messages like these:
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            After re-enabling quotas again for pfs2dat2-OST0000 (with patched e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440

            It is also interesting that only one OST of pfs2dat2 has the same value for users and groups, while for the pfs2wor2 file system most OSTs show the same values. pfs2dat2 has 219 million files and stripe count 1; pfs2wor2 has 69 million files and a default stripe count of 2.

            Is further investigation possible with this information and with the provided tune2fs logs?

            If not, the customer will develop another script to find out which uids/gids have wrong quotas on pfs2dat2-OST0000. Since this takes some effort, I just wanted to check whether it is really needed/helpful.


            niu Niu Yawei (Inactive) added a comment:

            Oz, which uid/gid has the problem on pfs2dat2-OST0000?

            orentas Oz Rentas (Inactive) added a comment:

            The customer ran through the "tune2fs -O quota" procedure last week during their scheduled downtime. However, this did not resolve the problem.

            For OST pfs2dat2-OST0000 the customer also used the patched e2fsprogs and collected all output (a sketch of the procedure follows below).

            The log file with the additional details can be downloaded from http://ddntsr.com/ftp/2014-04-28-SR28763_tunefs_20140424.txt.gz (69 MB).
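            For context, a hedged sketch of that regeneration procedure on one OST; the device path is an example. Removing and re-adding the quota feature makes tune2fs rebuild the ldiskfs quota files from the on-disk owners, which is when the do_insert_tree errors quoted above get printed:

            umount /dev/sdb               # the OST must not be mounted
            tune2fs -O ^quota /dev/sdb    # drop the quota feature and quota files
            tune2fs -O quota /dev/sdb     # recreate the quota files (rescans all inodes)
            e2fsck -f /dev/sdb            # optional sanity check afterwards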

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: orentas Oz Rentas (Inactive)
              Votes: 0
              Watchers: 10
