
LU-4504: User quota problem after Lustre upgrade (2.1.4 to 2.4.1)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.4.1
    • None
    • 3
    • 12320

    Description

      After the upgrade at KIT, user quotas are not reported correctly. The quota for root seems to be OK, but for regular users the quota usage reported on all OSTs is 0, which is wrong.

      e.g. for root:

      [root@pfs2n13 ~]# lfs quota -u root -v /lustre/pfs2wor2/client/
      Disk quotas for user root (uid 0):
           Filesystem      kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor2/client/
                       4332006768       0       0       -     790       0       0       -
      pfs2wor2-MDT0000_UUID
                          2349176       -       0       -     790       -       0       -
      pfs2wor2-OST0000_UUID
                        134219820       -       0       -       -       -       -       -
      pfs2wor2-OST0001_UUID
                               12       -       0       -       -       -       -       -
      pfs2wor2-OST0002_UUID
                        134219788       -       0       -       -       -       -       -

      For a regular user:

      [root@pfs2n3 ~]# lfs quota -v -u aj9102 /lustre/pfs2wor1/client/
      Disk quotas for user aj9102 (uid 3522):
           Filesystem      kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor1/client/
                              448       0       0       -    3985       0       0       -
      pfs2wor1-MDT0000_UUID
                              448       -       0       -    3985       -       0       -
      pfs2wor1-OST0000_UUID
                                0       -       0       -       -       -       -       -
      pfs2wor1-OST0001_UUID
                                0       -       0       -       -       -       -       -
      pfs2wor1-OST0002_UUID
                                0       -       0       -       -       -       -       -


          Activity


            orentas Oz Rentas (Inactive) added a comment -

            From the customer:

            It's good news that you found possible reasons for the problem.
            We will install the patches during our next maintenance, which is expected to take place within the next two months. However, DDN will have to provide a Lustre version which includes those patches.

            For the huge UID/GIDs caused by the Lustre defect described in LU-4345: Is there a way to repair the bad IDs on the OST objects?

            niu Niu Yawei (Inactive) added a comment -

            http://review.whamcloud.com/10227

            niu Niu Yawei (Inactive) added a comment -

            The huge UID/GIDs may have been caused by a Lustre defect described in LU-4345.

            It also looks like there is a defect in e2fsprogs which can mess up dict lookups when the difference between two keys is greater than 2G:

            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;
            
                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);
            
                    return c - d;
            }
            

            This function returns the unsigned int difference in an int return type, and quota relies on this function to insert ids into the dict on quotacheck. I think that's why we see duplicate IDs on quotacheck. I'll cook up a patch to fix this soon.
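
            A minimal sketch of an overflow-safe comparison, assuming the fix only needs to change how the two keys are compared (the actual change in http://review.whamcloud.com/10227 may differ):

            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;

                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);

                    /* "c - d" is computed as unsigned int and then truncated to a
                     * signed int, so keys more than 2G apart can compare with the
                     * wrong sign; comparing explicitly avoids the overflow. */
                    if (c == d)
                            return 0;
                    return (c < d) ? -1 : 1;
            }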


            orentas Oz Rentas (Inactive) added a comment -

            Thanks Niu. Here is the response from the customer:

            We have pretty huge UIDs/GIDs. However, they are by far not as huge as reported. The largest UID is 901987 and the largest GID is 890006.

            niu Niu Yawei (Inactive) added a comment (edited) -

            > Note the changes although clients were not mounted in the meantime.

            Orphan cleanup may have removed some files.

            > Note that tune2fs -O quota reported messages like these:
            > [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            > [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            I noticed that the UIDs/GIDs on this system are very large; some UIDs are larger than 2G. I think there could be a defect in e2fsprogs that handles large IDs incorrectly. For example:

            [DEBUG] quotaio.c:326:quota_file_create:: Creating quota ino=3, type=0
            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=3

            e2fsprogs is writing UID 2171114240 into the quota file, and later on...

            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=2, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=3, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=4, depth=3
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [DEBUG] quotaio_tree.c:330:qtree_write_dquot:: writing ddquot 2: id=2171114240 off=11543712, info->dqi_entry_size=72

            e2fsprogs tries to write the same UID 2171114240 into the quota file again. It looks like UID 2171114240 got duplicated in the in-memory dict.

            I'll investigate further to see what happens when inserting a large id into the in-memory dict.
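
            As an illustration, with the dict_uint_cmp comparator shown in the comment above, the unsigned difference between a >2G id such as 2171114240 and an ordinary uid typically wraps to a negative int, so the large key can compare as "smaller" than keys it is actually greater than (hypothetical standalone demo, not part of e2fsprogs):

            #include <stdio.h>

            /* same arithmetic as the e2fsprogs comparator, without the dict */
            static int cmp_like_dict_uint_cmp(unsigned int c, unsigned int d)
            {
                    return c - d;
            }

            int main(void)
            {
                    unsigned int huge  = 2171114240u;   /* id seen in the tune2fs log */
                    unsigned int small = 3522;          /* e.g. uid of aj9102 above */

                    /* on common platforms this prints a negative number, i.e. the
                     * dict would treat the huge id as "less than" the small one */
                    printf("cmp(%u, %u) = %d\n", huge, small,
                           cmp_like_dict_uint_cmp(huge, small));
                    return 0;
            }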

            > Is further investigation possible with this information and with the provided tune2fs logs?

            Yes, there is no need to develop a new script for now. I just wanted to get confirmation from the customer that they really have such large UIDs/GIDs.


            orentas Oz Rentas (Inactive) added a comment -

            We do not know which uid/gid has wrong quotas on pfs2dat2-OST0000.
            We used our perl script, which sums up all user and group quotas from acct_user/group in proc. This should show the same results for users and groups, but it does not for pfs2dat2-OST0000.
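
            For reference, a minimal sketch of this kind of consistency check (hypothetical; not the customer's actual perl script, and it assumes each per-id record in the acct_user/acct_group proc files carries a "usage: { inodes: N, kbytes: M }" line, so summing both files should give matching totals):

            #include <stdio.h>
            #include <string.h>

            int main(int argc, char **argv)
            {
                    FILE *fp;
                    char line[512];
                    unsigned long long inodes = 0, kbytes = 0, i, k;

                    if (argc != 2) {
                            fprintf(stderr, "usage: %s <acct_user or acct_group file>\n", argv[0]);
                            return 1;
                    }
                    fp = fopen(argv[1], "r");
                    if (fp == NULL) {
                            perror("fopen");
                            return 1;
                    }
                    while (fgets(line, sizeof(line), fp) != NULL) {
                            char *p = strstr(line, "inodes:");

                            /* accumulate the per-id usage entries */
                            if (p != NULL && sscanf(p, "inodes: %llu, kbytes: %llu", &i, &k) == 2) {
                                    inodes += i;
                                    kbytes += k;
                            }
                    }
                    fclose(fp);
                    printf("Sum of inodes: %llu\nSum of kbytes: %llu\n", inodes, kbytes);
                    return 0;
            }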

            In detail, before the maintenance and after clients were unmounted the script reported this for pfs2dat2-OST0000:
            Sum of inodes of users: 9353416
            Sum of inodes of groups: 9447415
            Sum of kbytes of users: 11926483836
            Sum of kbytes of groups: 12132828844

            After servers were upgraded to Lustre 2.4.3 and quotas were re-enabled (with normal e2fsprogs):
            Sum of inodes of users: 9325574
            Sum of inodes of groups: 9446294
            Sum of kbytes of users: 11897886304
            Sum of kbytes of groups: 12132673600
            Note the changes although clients were not mounted in the meantime.

            After just re-enabling quotas again for pfs2dat2-OST0000 (with normal e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440
            Note that tune2fs -O quota reported messages like these:
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            After re-enabling quotas again for pfs2dat2-OST0000 (with patched e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440

            It is also interesting that only one OST of pfs2dat2 has the same values for users and groups. For the pfs2wor2 file system most OSTs show the same values. pfs2dat2 has 219 million files and a stripe count of 1; pfs2wor2 has 69 million files and a default stripe count of 2.

            Is further investigation possible with this information and with the provided tune2fs logs?

            If not, the customer will develop another script to find out which uids/gids have wrong quotas on pfs2dat2-OST0000. Since this requires some effort, I just wanted to check whether this is really needed/helpful.


            niu Niu Yawei (Inactive) added a comment -

            Oz, which uid/gid has the problem on pfs2dat2-OST0000?

            orentas Oz Rentas (Inactive) added a comment -

            The customer ran through the "tune2fs -O quota" procedure last week during their scheduled downtime. However, this did not resolve the problem.

            For OST pfs2dat2-OST0000 the customer also used the patched e2fsprogs and collected all output.

            The log file with the additional details can be downloaded from "http://ddntsr.com/ftp/2014-04-28-SR28763_tunefs_20140424.txt.gz" (69MB)


            niu Niu Yawei (Inactive) added a comment -

            > Do you have any comments or ideas about the possible reason for the problem?

            This sounds like the same problem as OST0007, and OST0007 could be fixed by re-running "tune2fs -O quota" (with up-to-date e2fsprogs). Can these problematic OSTs be fixed in the same way or not?


            orentas Oz Rentas (Inactive) added a comment -

            The customer doesn't believe that tune2fs was missed on some OSTs. Either this is a general Lustre problem or it is a problem with the vendor's tunefs wrapper script.

            Concerning this wrapper script, the field engineer sent the following:


            > I just had a look at the upgrade documentation which was sent by the vendor: tunefs is problematic
            > and does not always work.

            I have not completely isolated the problem yet. I think it is the EXAScaler tunefs wrapper script es_tunefs. It does a second tunefs to set the MMP timeout, which may be causing the problem.
            If tunefs was not done correctly, the OSTs do not register with the MGS correctly and the clients cannot access some of the OSTs. This can be easily verified by running "lfs df" on a client. I have noticed that this behaviour is worse with Lustre 2.4.x, but I have seen it with older Lustre versions as well.


            Since I had noticed a strange difference between user and group quotas, I wrote a perl script which checks the sums of "acct_user/group" in proc. The perl script and a text file with the output for all file systems are attached.

            Here are the results:
            1. Some OSTs are affected and others are not affected, i.e. this is an easy way to find out which OSTs are affected.
            2. On the affected OSTs both inodes and kbytes are wrong.
            3. We have higher group values and higher user values, i.e. both user and group quotas are affected.
            4. On the different file systems, in nearly all cases either group values or user values are higher. The reason for this behaviour is not clear.

            Do you have any comments or ideas about the possible reason for the problem?


            niu Niu Yawei (Inactive) added a comment -

            > 1. The software is usually installed by pdsh, i.e. it is the same on all servers.
            > 2. This does not explain why some OSTs on the same OSS showed no problems with their quotas.
            > 3. Also, OST0007 did not have a quota problem for most users.

            Comparing the two versions of the accounting information for OST0007 (before and after executing tune2fs), we can see that a lot of user accounting was fixed, so I think many users had accounting problems that were simply not discovered. Maybe it's the same for other OSTs on the same OSS?

            Another possibility is that the customer just missed tune2fs on OST0007?

            > 1. The vendor had written that he had done the following for one file system and this had not fixed the quota problem:
            >    turn off quota first, turn it back on and run an e2fsck
            > How can we be sure to have a procedure which clearly fixes the problem?

            I think first we'd make sure we are using the correct e2fsprogs. To verify whether the accounting information is fixed, you can check "acct_user/group" in the proc file.

            > 2. The problem might move to different users after disabling and re-enabling quotas. How can we easily and quickly find out if the problem still appears?

            Disabling/re-enabling quota is just for triggering a quotacheck; you can verify the accounting information in the proc file.

            > Another interesting thing to note is that both user quotas and group quotas are used, but there was not a problem with group quotas.

            I think it's probably because it was just not detected.


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: orentas Oz Rentas (Inactive)
              Votes: 0
              Watchers: 10
