[LU-4504] User quota problem after Lustre upgrade (2.1.4 to 2.4.1) Created: 17/Jan/14 Updated: 02/Feb/17 Resolved: 02/Feb/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oz Rentas | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 12320 |
| Description |
|
After the upgrade at KIT, the user quotas are not reported correctly. The quota for root seems to be OK, but the reported user quota usage is 0 on all OSTs, which is wrong. E.g., for root: [root@pfs2n13 ~]# lfs quota -u root -v /lustre/pfs2wor2/client/
for a user |
| Comments |
| Comment by Peter Jones [ 17/Jan/14 ] |
|
Niu, can you please help with this issue? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 20/Jan/14 ] |
|
What's the e2fsprogs version? This looks like a dup of a known issue. 1. Upgrade your e2fsprogs to the latest version, which has the fix. 2. Disable and then re-enable quota on all MDT and OST devices. |
| Comment by Oz Rentas [ 21/Jan/14 ] |
|
The e2fsprogs RPMs installed are: The procedure outlined in step #2 has been performed; we disabled and re-enabled quota for all the MDT and OST devices. Same result. Any ideas on what to try next, or any debugging that can be done? Thanks. |
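For reference, a minimal sketch of the disable/re-enable cycle referred to above (the device path and mountpoint are placeholders; the tune2fs -O ^quota / -O quota commands are the ones named later in this ticket, and the target should be unmounted while the feature is toggled):

# on the MDS/OSS, for each target device
umount /mnt/ost0007                        # stop the Lustre target
tune2fs -O ^quota /dev/sdX                 # drop the quota feature
tune2fs -O quota /dev/sdX                  # re-add it; this rescans the fs and rebuilds the accounting
mount -t lustre /dev/sdX /mnt/ost0007      # restart the target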
| Comment by Niu Yawei (Inactive) [ 22/Jan/14 ] |
|
Is this the only user who has incorrect quota usage, or is other users' usage incorrect as well? Could you try to write as the user to see if the newly written bytes are accounted? I also want to confirm that the installed e2fsprogs is the latest build from build.whamcloud.com; I'm not sure if the older 1.42.7-wc2 includes the patch. |
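A quick way to run that write test (a sketch only; the username and 1 GiB size are placeholders, the client mountpoint follows the description above):

lfs quota -u <user> -v /lustre/pfs2wor2/client                  # record usage before
sudo -u <user> dd if=/dev/zero of=/lustre/pfs2wor2/client/<user>/quota_test bs=1M count=1024
sync
lfs quota -u <user> -v /lustre/pfs2wor2/client                  # usage should grow by roughly 1 GiB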
| Comment by Oz Rentas [ 23/Jan/14 ] |
|
The problem affects at least two users on one file system, and another two users on a separate Lustre file system. Today, the customer discovered the issue on yet another Lustre fs with other users: root@ic2n992:/pfs/data2/home# lfs quota -u ho_anfuchs . root@ic2n992:/pfs/data2/home# lfs quota -u kn_pop164377 . The e2fsprogs was downloaded from Whamcloud, with the following patch applied: http://git.whamcloud.com/?p=tools/e2fsprogs.git;a=commit;h=470ca046b1 I will have the customer do a write operation to see if the accounting changes. |
| Comment by Oz Rentas [ 24/Jan/14 ] |
|
Newly written bytes are accounted. This works for all OSTs. |
| Comment by Johann Lombardi (Inactive) [ 31/Jan/14 ] |
|
I wonder whether some of the OST objects belonging to those users did not get the proper UID/GID. Could you please check, on one of the OSTs reporting the wrong usage, whether all the objects have the correct UID/GID? You can do it by unmounting the OST, mounting it with -t ldiskfs, and running a find command to compute the usage of the user, then comparing it with what is reported by lfs quota. Thanks in advance. |
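A sketch of that check (device path, ldiskfs mountpoint and UID are placeholders; it assumes the OST objects live under the O/ directory of the ldiskfs mount and that the target can be stopped briefly):

umount /mnt/ost0007                                   # stop the Lustre target
mount -t ldiskfs -o ro /dev/sdX /mnt/ldiskfs
# sum the blocks of all objects owned by the affected UID and compare with lfs quota
find /mnt/ldiskfs/O -type f -uid <uid> -printf '%b\n' | \
    awk '{blocks += $1} END {printf "%.1f MiB\n", blocks * 512 / 1024 / 1024}'
umount /mnt/ldiskfs
mount -t lustre /dev/sdX /mnt/ost0007                 # restart the target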
| Comment by Oz Rentas [ 04/Feb/14 ] |
|
Thanks for the response. Unfortunately, the FS is in production and unmounting anything cannot be done easily. Are there any other options for gathering the information you're asking for? Please advise. |
| Comment by Niu Yawei (Inactive) [ 07/Feb/14 ] |
|
Did the customer ever see the problem before upgrading? We should first make sure that these users didn't have the same problem before the upgrade. |
| Comment by Oz Rentas [ 11/Feb/14 ] |
|
Response from customer: In addition, I was also able to answer the question of Johann Lombardi from 31/Jan/14 9:41 PM without unmounting the OST, by following chapter |
| Comment by Niu Yawei (Inactive) [ 11/Feb/14 ] |
Looks like you only checked one file; it would be better to run a script to check all files belonging to user "es_asaramet".
| Comment by Oz Rentas [ 12/Feb/14 ] |
|
From customer: As shown in the attached pfs2wor2_check_quotas_bad_user_20140204.txt, the command "lfs quota -v -u es_asaramet" shows 0 quota usage for pfs2wor2-OST0007. We have found a non-empty file (refinementSurfaces.o) which is located on OST0007. Hence the 0 quota usage reported by lfs quota -v for this user is wrong. And since the file really belongs to the user with UID 900044 (es_asaramet) in the underlying ldiskfs, the assumption of Johann Lombardi that the UID/GID in the underlying ldiskfs might be wrong cannot be the reason for the 0 quota usage on pfs2wor2-OST0007. More information from another ticket the customer opened today regarding quota: last December DDN upgraded several Lustre file systems to Lustre 2.4. I was just reading the Lustre manual and checked the following command:
BTW: I tried to check the MGS parameters according to a description of |
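For the MGS parameter check mentioned above, a hedged sketch of the generic procedure (dump the client configuration llog with debugfs, then read it with llog_reader; the MGS device path is a placeholder and the file system name follows this ticket):

# on the MGS
debugfs -c -R 'dump CONFIGS/pfs2wor2-client /tmp/pfs2wor2-client' /dev/<mgs_device>
llog_reader /tmp/pfs2wor2-client | grep -i quota      # look for the quota.mdt / quota.ost settings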
| Comment by Niu Yawei (Inactive) [ 13/Feb/14 ] |
I see, thanks. Could you check the "/proc/fs/lustre/osd-ldiskfs/pfs2wor2-OST0007/quota_slave/acct_user" and post the result here?
The info you checked is for metadata; you should check quota_slave.info on the OST to see if data quota is enabled, and enable it with 'lctl conf_param $FSNAME.quota.ost=ug'. |
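A sketch of the two steps (the file system name follows the examples in this ticket; everything else is standard lctl usage):

# on each OSS: check whether data quota enforcement is enabled for its OSTs
lctl get_param 'osd-ldiskfs.*.quota_slave.info'
#   a line "quota enabled: none" means space quota is not being enforced on that OST

# on the MGS: enable user and group quota enforcement for all OSTs of the file system
lctl conf_param pfs2wor2.quota.ost=ug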
| Comment by Oz Rentas [ 20/Feb/14 ] |
|
The customer checked it on the MDS because the example in chapter 21.2 of the Lustre Operations Manual also displays this information for the MDT. Is this example in the manual wrong? On the OSTs, quota_slave.info also showed "quota enabled: none". The customer checked and verified that they have the same problem on their test system; running "lctl conf_param pfscdat1.quota.ost=ug" on the MGS indeed fixed the problem there. It would also be good to get a response on why llog_reader hangs permanently for some file systems while we try to check the MGS parameters. |
| Comment by Niu Yawei (Inactive) [ 20/Feb/14 ] |
The command should be run on the MDS or OSS to check the quota status of each MDT and OST. The example in the manual only displays the output for the MDS.
It looks like the llog file is empty. |
| Comment by Oz Rentas [ 20/Feb/14 ] |
|
> The command should be run on the MDS or OSS to check the quota status of each MDT & OST. The example in the manual displays only the output of the MDS. Thank you for this clarification! > It looks like the llog file is empty. Yes, the file is indeed empty. The displayed error message was misleading. |
| Comment by Oz Rentas [ 20/Feb/14 ] |
|
> Niu Yawei added a comment - 13/Feb/14 8:45 AM > I see, thanks. Could you check the "/proc/fs/lustre/osd-ldiskfs/pfs2wor2-OST0007/quota_slave/acct_user" and post the result here? A file with the requested information is attached, pfs2wor2-OST0007_acct_user_20140213.txt. It is a bit strange that pretty few user IDs appear and that user ID |
| Comment by Oz Rentas [ 25/Feb/14 ] |
|
Any updates? Please let us know if you need any additional information or if there is any additional debugging we could be doing. We would like to close this one out fairly soon. Thanks! |
| Comment by Niu Yawei (Inactive) [ 26/Feb/14 ] |
|
I cooked a debug patch for e2fsprogs: http://review.whamcloud.com/#/c/9397 and it's built at http://build.whamcloud.com/job/e2fsprogs-reviews/200/ . Oz, could you install the e2fsprogs from http://build.whamcloud.com/job/e2fsprogs-reviews/200/ and collect the debug information while disabling/enabling quota for the OST0007 device (by the tune2fs -O ^quota and tune2fs -O quota commands)? Thanks a lot. |
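If it helps, a minimal sketch of capturing that debug output (device path and log file names are placeholders; the OST should be stopped while the feature is toggled):

tune2fs -O ^quota /dev/sdX 2>&1 | tee /tmp/ost0007_quota_disable.log
tune2fs -O quota  /dev/sdX 2>&1 | tee /tmp/ost0007_quota_enable.log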
| Comment by Oz Rentas [ 12/Mar/14 ] |
|
The debug e2fsprogs has been installed on the OSS, and quota has been disabled and re-enabled on the affected OST. The output from running the 'tune2fs |
| Comment by Niu Yawei (Inactive) [ 13/Mar/14 ] |
|
Hi, Oz

[DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=900044 off=0, info->dqi_entry_size=72
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=2, depth=1
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=34, depth=2
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=36, depth=3
[DEBUG] quotaio_tree.c:330:qtree_write_dquot:: writing ddquot 2: id=900044 off=34168, info->dqi_entry_size=72

Looks like the accounting information for 900044 is written back; could you verify whether the space accounting for 900044 on this OST is fixed? Thank you. |
| Comment by Oz Rentas [ 14/Mar/14 ] |
|
It appears the space accounting for 900044 on OST0007 is fixed. For details see attached file pfs2wor2-OST0007_acct_user_20140313.txt. The customer performed further investigations on all 4 of the affected file systems. The details can be seen in the attached file pfs2wor2_check_quotas_bad_user_20140313.txt. Results: Questions: |
| Comment by Niu Yawei (Inactive) [ 14/Mar/14 ] |
Good news, thank you.
It is probably because the e2fsprogs on OST0007 was not up to date; could you verify whether the e2fsprogs on OST0007 (or other problematic OSTs) is the same as on the others? |
| Comment by Oz Rentas [ 20/Mar/14 ] |
|
1. The software is usually installed by pdsh, i.e. it is the same on all servers. Since we see the same problem on all 4 file systems, this is a general problem, i.e. not something which happened once by chance. I just had a look at the upgrade documentation which was sent by the vendor field engineer. He wrote (translated): tunefs is problematic and does not always work. Maybe Sven can comment on what exactly was meant here and whether this could be the reason. Anyway, I wonder how we can reliably repair the problem for the remaining file systems during the next maintenance. I see 2 problems: |
| Comment by Oz Rentas [ 20/Mar/14 ] |
|
Another interesting thing to note: both user quotas and group quotas are used, but there was no problem with the group quotas. |
| Comment by Niu Yawei (Inactive) [ 21/Mar/14 ] |
Comparing the two versions of the accounting information for OST0007 (before and after executing tune2fs), we can see that a lot of user accounting was fixed, so I think many users had accounting problems that simply went undiscovered. Maybe it's the same for other OSTs on the same OSS? Another possibility is that the customer just missed tune2fs on OST0007?
I think we should first make sure we are using the correct e2fsprogs. To verify whether the accounting information is fixed, you can check the "acct_user/group" proc files.
Disabling/re-enabling quota just triggers a quotacheck; you can verify the accounting information in the proc files.
I think it is probably because it was just not detected before. |
| Comment by Oz Rentas [ 27/Mar/14 ] |
|
The customer doesn't believe that tune2fs was missed on some OSTs. Either this is a general Lustre problem or it is a problem with the vendor's tunefs wrapper script. Concerning this wrapper script, the field engineer sent the following: > I just had a look at the upgrade documentation which was sent by the vendor: tunefs is problematic > and does not always work. I have not completely isolated the problem yet. I think it is the EXAScaler tunefs wrapper script es_tunefs. It does a second tunefs to set the MMP timeout, which may be causing the problem. Since I had noticed a strange difference between user and group quotas, I wrote a perl script which checks the sums of "acct_user/group". Here are the results: Do you have any comments or ideas about the possible reason for the problem? |
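The perl script itself was not attached; a rough shell equivalent of the idea (summing the kbytes reported in acct_user and acct_group per OST and printing both totals), assuming the 2.4-style proc layout used elsewhere in this ticket; the exact field layout may differ slightly:

#!/bin/bash
for f in /proc/fs/lustre/osd-ldiskfs/*/quota_slave/acct_user; do
    tgt=$(echo "$f" | cut -d/ -f6)
    u=$(awk -F'kbytes:' '/kbytes/ {gsub(/[ }]/, "", $2); s += $2} END {print s+0}' "$f")
    g=$(awk -F'kbytes:' '/kbytes/ {gsub(/[ }]/, "", $2); s += $2} END {print s+0}' "${f%acct_user}acct_group")
    printf '%s  user_kbytes=%s  group_kbytes=%s\n' "$tgt" "$u" "$g"
done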
| Comment by Niu Yawei (Inactive) [ 28/Mar/14 ] |
This sounds like the same problem as OST0007, and OST0007 could be fixed by re-running "tune2fs -O quota" (with up-to-date e2fsprogs); can these problematic OSTs be fixed in the same way or not? |
| Comment by Oz Rentas [ 28/Apr/14 ] |
|
The customer ran through the "tune2fs -O quota" procedure last week during their scheduled downtime. However, this did not resolve the problem. The log file with the additional details can be downloaded from "http://ddntsr.com/ftp/2014-04-28-SR28763_tunefs_20140424.txt.gz" (69MB) |
| Comment by Niu Yawei (Inactive) [ 29/Apr/14 ] |
|
Oz, which UID/GID has the problem on pfs2dat2-OST0000? |
| Comment by Oz Rentas [ 30/Apr/14 ] |
|
We do not know which UID/GID has wrong quotas on pfs2dat2-OST0000. In detail, before the maintenance and after clients were unmounted, the script reported this for pfs2dat2-OST0000: After servers were upgraded to Lustre 2.4.3 and quotas were re-enabled (with normal e2fsprogs): After just re-enabling quotas again for pfs2dat2-OST0000 (with normal e2fsprogs): After re-enabling quotas again for pfs2dat2-OST0000 (with patched e2fsprogs): It is also interesting that only one OST of pfs2dat2 has the same value for users and groups; for the pfs2wor2 file system most OSTs show the same values. pfs2dat2 has 219 million files and stripe count 1. Is further investigation possible with this information and with the provided tune2fs logs? If not, the customer will develop another script to find the UIDs/GIDs with wrong quotas on pfs2dat2-OST0000. Since this takes some effort, I just wanted to check whether this is really needed/helpful. |
| Comment by Niu Yawei (Inactive) [ 04/May/14 ] |
Orphan cleanup may have removed some files.
I noticed that the UIDs/GIDs on this system are very large; some UIDs are larger than 2G. I think there could be a defect in e2fsprogs which handles large IDs incorrectly. For example:

[DEBUG] quotaio.c:326:quota_file_create:: Creating quota ino=3, type=0
[DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=1
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=2
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=0, depth=3

e2fsprogs is writing UID 2171114240 into the quota file, and later on...

[DEBUG] quotaio_tree.c:316:qtree_write_dquot:: writing ddquot 1: id=2171114240 off=0, info->dqi_entry_size=72
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=1, depth=0
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=2, depth=1
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=3, depth=2
[DEBUG] quotaio_tree.c:253:do_insert_tree:: inserting in tree: treeblk=4, depth=3
[ERROR] quotaio_tree.c:277:do_insert_tree:: Inserting already present quota entry (block 5).
[DEBUG] quotaio_tree.c:330:qtree_write_dquot:: writing ddquot 2: id=2171114240 off=11543712, info->dqi_entry_size=72

e2fsprogs tries to write UID 2171114240 into the quota file again. It looks like UID 2171114240 got duplicated in the in-memory dict. I'll investigate further to see what happens when inserting a large ID into the memory dict.
Yes, no need to develop a new script for now. I just want to get confirmation from the customer that they really have such large UIDs/GIDs. |
| Comment by Oz Rentas [ 05/May/14 ] |
|
Thanks Niu. Here is the response from the customer: We do have fairly large UIDs/GIDs; however, they are by far not as large as reported. The largest UID is 901987 and the largest GID is 890006. |
| Comment by Niu Yawei (Inactive) [ 06/May/14 ] |
|
The huge UID/GIDs may be caused by a Lustre defect described in a separate ticket. And it looks like there is a defect in e2fsprogs which can mess up the dict lookup when the difference between two keys is greater than 2G:

static int dict_uint_cmp(const void *a, const void *b)
{
        unsigned int c, d;
        c = VOIDPTR_TO_UINT(a);
        d = VOIDPTR_TO_UINT(b);
        return c - d;
}

This function returns an unsigned int difference as an int; when c - d exceeds INT_MAX the result wraps to a negative value, so the ordering becomes inconsistent. Quota relies on this function to insert IDs into the dict on quotacheck, and I think that's why we see duplicate IDs on quotacheck. I'll cook a patch to fix this soon. |
| Comment by Niu Yawei (Inactive) [ 06/May/14 ] |
| Comment by Oz Rentas [ 09/May/14 ] |
|
From the customer: It's good news that you found possible reasons for the problem. For the huge UID/GIDs caused by the lustre defect described in |
| Comment by Niu Yawei (Inactive) [ 09/May/14 ] |
Fix bad IDs on existing OST objects:
|
| Comment by John Fuchs-Chesney (Inactive) [ 05/Aug/14 ] |
|
Hello Oz, |
| Comment by Niu Yawei (Inactive) [ 06/Aug/14 ] |
|
I updated the description of how to fix bad IDs on OST objects (see my previous comment). Thanks. |
| Comment by Rajeshwaran Ganesan [ 12/Aug/14 ] |
|
Hello Niu, We don't think manually fixing the OST objects is a good idea, since the filesystem has more than 100 million files. Thanks, |
| Comment by Niu Yawei (Inactive) [ 14/Aug/14 ] |
I can't think of any other good way to fix the bad IDs. I think running a script instead of repeating the commands manually would be better.
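To illustrate the shape of such a script, a very rough sketch that only identifies OST objects whose owner falls outside the site's legitimate ID range (the ldiskfs mountpoint and the 1000000 cutoff are assumptions, the cutoff chosen from the largest legitimate UID/GID reported earlier); the actual repair commands from the earlier comment would then be applied to the objects it lists:

#!/bin/bash
LDISKFS_MNT=/mnt/ldiskfs      # OST mounted read-only as ldiskfs (placeholder)
MAX_ID=1000000                # anything above this is treated as a corrupted ID
find "$LDISKFS_MNT/O" -type f \( -uid +$MAX_ID -o -gid +$MAX_ID \) -printf '%U %G %p\n' > /tmp/bad_id_objects.txt
wc -l /tmp/bad_id_objects.txt # number of objects that would need fixing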
| Comment by Minh Diep [ 02/Feb/17 ] |
|
rganesan@ddn.com, orentas, have you resolved this issue? do you need anything else from this ticket? |
| Comment by Oz Rentas [ 02/Feb/17 ] |
|
Yes, a long time ago. Please close. Thanks! |
| Comment by Minh Diep [ 02/Feb/17 ] |
|
Thank you Sir! |