
User quota problem after Lustre upgrade (2.1.4 to 2.4.1)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1
    • Component/s: None
    • Severity: 3
    • Rank: 12320

    Description

      After the upgrade at KIT, the user quotas are not reported correctly. The quota for root seems to be OK. The user quota is 0 on all OSTs, which is wrong.

      e.g. for root:

      [root@pfs2n13 ~]# lfs quota -u root -v /lustre/pfs2wor2/client/
      Disk quotas for user root (uid 0):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor2/client/
                     4332006768       0       0       -     790       0       0       -
      pfs2wor2-MDT0000_UUID
                        2349176       -       0       -     790       -       0       -
      pfs2wor2-OST0000_UUID
                      134219820       -       0       -       -       -       -       -
      pfs2wor2-OST0001_UUID
                             12       -       0       -       -       -       -       -
      pfs2wor2-OST0002_UUID
                      134219788       -       0       -       -       -       -       -

      and for a user:

      [root@pfs2n3 ~]# lfs quota -v -u aj9102 /lustre/pfs2wor1/client/
      Disk quotas for user aj9102 (uid 3522):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/pfs2wor1/client/
                            448       0       0       -    3985       0       0       -
      pfs2wor1-MDT0000_UUID
                            448       -       0       -    3985       -       0       -
      pfs2wor1-OST0000_UUID
                              0       -       0       -       -       -       -       -
      pfs2wor1-OST0001_UUID
                              0       -       0       -       -       -       -       -
      pfs2wor1-OST0002_UUID
                              0       -       0       -       -       -       -       -

    Activity

            [LU-4504] User quota problem after Lustre upgrade (2.1.4 to 2.4.1)

            niu Niu Yawei (Inactive) added a comment:

            I have updated the instructions for fixing bad IDs on OST objects (see my previous comment). Thanks.

            jfc John Fuchs-Chesney (Inactive) added a comment:

            Hello Oz,
            We've recently heard from another site that Niu's fixes have resolved the quota problems they were seeing.
            Has DDN installed a new version at this site, with those patches?
            If so, do you have any news on this?
            Thanks,
            ~ jfc.
            niu Niu Yawei (Inactive) added a comment (edited):

            For the huge UID/GIDs caused by the Lustre defect described in
            LU-4345: Is there a way to repair the bad IDs on the OST objects?

            To fix bad IDs on existing OST objects (a sketch follows these steps):

            • Find the objects with bad IDs first (mount the OST device as ldiskfs to check the IDs of each file, or use debugfs without unmounting).
            • Get the correct ID from the MDT (see Lustre manual 13.14 to identify which file each object belongs to).
            • Set the correct IDs on the OST objects directly.
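            For illustration, a minimal sketch of these three steps on an ldiskfs OST. This is not from the ticket: the object ID (12345), device (/dev/sdb), mount point, path, and owner below are hypothetical examples.

            # 1. Check the object's owner without unmounting; "debugfs -c" opens the
            #    device read-only even while it is mounted. ldiskfs keeps object
            #    12345 under O/0/d(12345 % 32), i.e. O/0/d25 here.
            debugfs -c -R 'stat O/0/d25/12345' /dev/sdb

            # 2. Identify which file the object belongs to (Lustre manual 13.14),
            #    then read the correct UID/GID of that file, e.g. on a client:
            ls -ln /lustre/pfs2wor2/client/path/to/owning/file

            # 3. With the OST mounted as ldiskfs, set the correct owner directly:
            mount -t ldiskfs /dev/sdb /mnt/ost
            chown 3522:3522 /mnt/ost/O/0/d25/12345
            umount /mnt/ost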

            orentas Oz Rentas (Inactive) added a comment:

            From the customer:

            It's good news that you found possible reasons for the problem.
            We will install the patches during our next maintenance, which is expected to take place within the next two months. However, DDN will have to provide a Lustre version that includes those patches.

            For the huge UID/GIDs caused by the lustre defect described in
            LU-4345: Is there a way to repair the bad IDs on the OST objects?

            niu Niu Yawei (Inactive) added a comment:

            Patch: http://review.whamcloud.com/10227

            niu Niu Yawei (Inactive) added a comment:

            The huge UID/GIDs may have been caused by the Lustre defect described in LU-4345.

            And it looks like there is a defect in e2fsprogs that can break the dict lookup when the difference between two keys is greater than 2G:

            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;
            
                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);
            
                    return c - d;
            }
            

            This function returns the difference of two unsigned int values as a signed int, and quota relies on it to insert IDs into the dict during quotacheck. I think that's why we see duplicate IDs on quotacheck. I'll cook up a patch to fix this soon.
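            For illustration, the failure mode and a minimal sketch of the kind of fix needed (presumably what the http://review.whamcloud.com/10227 change linked above does):

            /*
             * Sketch only, not the actual patch text. With c = 2171114240 and
             * d = 0, "c - d" is 2171114240 as unsigned, which wraps to
             * -2123853056 when converted to int, so the comparator claims
             * c < d even though c > d, and tree inserts/lookups go to the
             * wrong place. Returning an explicit -1/0/1 avoids the overflow;
             * VOIDPTR_TO_UINT is the macro used by the original code.
             */
            static int dict_uint_cmp(const void *a, const void *b)
            {
                    unsigned int    c, d;

                    c = VOIDPTR_TO_UINT(a);
                    d = VOIDPTR_TO_UINT(b);

                    if (c == d)
                            return 0;
                    return (c > d) ? 1 : -1;
            }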


            orentas Oz Rentas (Inactive) added a comment:

            Thanks Niu. Here is the response from the customer:

            We have pretty huge UIDs/GIDs. However, they are by far not as huge as reported. The largest UID is 901987 and the largest GID is 890006.

            niu Niu Yawei (Inactive) added a comment (edited):

            Note the changes although clients were not mounted in the meantime.

            Orphan cleanup may have removed some files.

            Note that tune2fs -O quota reported messages like these:
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            I noticed that the UIDs/GIDs on this system are very large; some UIDs are greater than 2G. I think there could be a defect in e2fsprogs that handles large IDs incorrectly. For example:

            [DEBUG] quotaio.c:326:quota_file_create:: Creating quota ino=3, type=0
            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=3

            e2fsprogs is writing UID 2171114240 into the quota file, and later on...

            [DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=2, depth=1
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=3, depth=2
            [DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=4, depth=3
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [DEBUG] quotaio_tree.c:330:qtree_write_dquot:: writing ddquot 2: id=2171114240 off=11543712, info->dqi_entry_size=72

            e2fsprogs tries to write UID 2171114240 into the quota file again. It looks like UID 2171114240 got duplicated in the in-memory dict.

            I'll investigate further to see what happens when inserting large IDs into the in-memory dict.

            Is further investigation possible with this information and with the provided tune2fs logs?

            Yes, no need to develop a new script for now. I just want to get confirmation from the customer that they really have such large UIDs/GIDs.


            orentas Oz Rentas (Inactive) added a comment:

            We do not know which uid/gid has wrong quotas on pfs2dat2-OST0000.
            We used our perl script, which sums up all user and group quotas from acct_user/acct_group in proc. This should show the same results for users and groups, but it does not for pfs2dat2-OST0000 (a sketch of this kind of check follows this paragraph).
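            For reference, a minimal sketch (not the customer's perl script) of this kind of consistency check, assuming the Lustre 2.4 osd-ldiskfs parameter layout and its "usage: { inodes: N, kbytes: M }" output format:

            # Sum the kbytes over all IDs in acct_user and acct_group on one OST;
            # the two totals should match if the per-ID accounting is consistent.
            for t in acct_user acct_group; do
                lctl get_param -n osd-ldiskfs.pfs2dat2-OST0000.quota_slave.$t |
                    awk -F'kbytes: ' -v t="$t" \
                        '/kbytes:/ { split($2, a, " "); s += a[1] }
                         END { printf "%s kbytes total: %d\n", t, s }'
            done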

            In detail, before the maintenance and after clients were unmounted the script reported this for pfs2dat2-OST0000:
            Sum of inodes of users: 9353416
            Sum of inodes of groups: 9447415
            Sum of kbytes of users: 11926483836
            Sum of kbytes of groups: 12132828844

            After servers were upgraded to Lustre 2.4.3 and quotas were re-enabled (with normal e2fsprogs):
            Sum of inodes of users: 9325574
            Sum of inodes of groups: 9446294
            Sum of kbytes of users: 11897886304
            Sum of kbytes of groups: 12132673600
            Note the changes although clients were not mounted in the meantime.

            After just re-enabling quotas again for pfs2dat2-OST0000 (with normal e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440
            Note that tune2fs -O quota reported messages like these:
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
            [ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 35).

            After re-enabling quotas again for pfs2dat2-OST0000 (with patched e2fsprogs):
            Sum of inodes of users: 9325357
            Sum of inodes of groups: 9446077
            Sum of kbytes of users: 11897857144
            Sum of kbytes of groups: 12132644440

            It is also interesting that only one OST of pfs2dat2 has the same value for users and groups, while for the pfs2wor2 file system most OSTs show the same values. pfs2dat2 has 219 million files and stripe count 1; pfs2wor2 has 69 million files and a default stripe count of 2.

            Is further investigation possible with this information and with the provided tune2fs logs?

            If not, the customer will develop another script to find out which uids/gids have wrong quotas on pfs2dat2-OST0000. Since this takes some effort, I just wanted to check whether it is really needed/helpful.


            niu Niu Yawei (Inactive) added a comment:

            Oz, which uid/gid has the problem on pfs2dat2-OST0000?

            orentas Oz Rentas (Inactive) added a comment:

            The customer ran through the "tune2fs -O quota" procedure last week during their scheduled downtime. However, this did not resolve the problem.

            For OST pfs2dat2-OST0000 the customer also used the patched e2fsprogs and collected all output (a sketch of the procedure follows below).

            The log file with the additional details can be downloaded from http://ddntsr.com/ftp/2014-04-28-SR28763_tunefs_20140424.txt.gz (69 MB).
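            For context, a hedged sketch of that regeneration procedure on one OST; the device path is an example. Removing and re-adding the quota feature makes tune2fs rebuild the ldiskfs quota files from the on-disk owners, which is when the do_insert_tree errors quoted above get printed:

            umount /dev/sdb               # the OST must not be mounted
            tune2fs -O ^quota /dev/sdb    # drop the quota feature and quota files
            tune2fs -O quota /dev/sdb     # recreate the quota files (rescans all inodes)
            e2fsck -f /dev/sdb            # optional sanity check afterwards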

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: orentas Oz Rentas (Inactive)
              Votes: 0
              Watchers: 10
