[LU-1584] error set quota fs limit Created: 29/Jun/12  Updated: 18/Aug/12  Resolved: 18/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

lustre-1.8.7-wc1 RHEL5


Severity: 3
Rank (Obsolete): 6369

 Description   

At one of our customer sites, a group hit the quota limit even though it had not reached its configured quota limit yet (this group's quota is 160TB, but it exceeded the quota limit at around 150TB of usage). In this case, the following messages showed up on an OSS. Is this related to LU-935?

Jun 29 14:58:47 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:47 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 19 previous similar messages
Jun 29 14:58:47 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:47 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 542 previous similar messages
Jun 29 14:58:48 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:48 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 1774 previous similar messages
Jun 29 14:58:49 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:49 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 4177 previous similar messages
Jun 29 14:58:51 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:51 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 8905 previous similar messages
Jun 29 14:58:55 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:58:55 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 17831 previous similar messages
Jun 29 14:59:03 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:59:03 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 42377 previous similar messages
Jun 29 14:59:19 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Jun 29 14:59:19 nos031i kernel: LustreError: 8322:0:(quota_context.c:685:dqacq_completion()) Skipped 86190 previous similar messages


 Comments   
Comment by Bob Glossman (Inactive) [ 29/Jun/12 ]

Could you try upgrading to 1.8.8-wc1? It may solve your problem. There have been a number of fixes in the quota code, including the one from LU-935.

Comment by Shuichi Ihara (Inactive) [ 29/Jun/12 ]

Bob,
Yes, we want to, but before upgrading we also wanted to make sure this is exactly the same problem as LU-935.
Until we upgrade, the following workaround described in LU-935 should help avoid this issue:

# lctl set_param lquota.*.quota_switch_qs=0

Comment by Bob Glossman (Inactive) [ 29/Jun/12 ]

By the way, where did the 1.8.7 release you are running come from, Whamcloud directly or some other supplier? Can you upgrade to a Whamcloud release or is that path blocked for you?

Comment by Johann Lombardi (Inactive) [ 29/Jun/12 ]

#define ERANGE 34 /* Math result not representable */

Looks similar to the problem reported on wc-discuss:
https://groups.google.com/a/whamcloud.com/group/wc-discuss/browse_thread/thread/91b5ceae1a663bad/01a84d777079cc46?utoken=8NerwC0AAADp76oaNbE87RxLb3AWLsFF1dK1zSSxgp828G_EjM7-wrBMBmDPXU48RC1TYmZ5gaU#01a84d777079cc46

Comment by Shuichi Ihara (Inactive) [ 29/Jun/12 ]

Johann,
Originally, we hit a quota problem (LU-1438). We set quotas to zero for all groups, but LU-1438 still isn't fixed; Niu is investigating it and has kindly been advising us.
Two days ago, we hit "quota exceeded" for a group, but this group's limit is much higher and should not have been exceeded at that usage.

Yesterday, we set the quota to zero for this group, but today it hit "quota exceeded" again, and these messages showed up on an OSS.

Comment by Peter Jones [ 30/Jun/12 ]

Niu

Are you able to help with this one?

Peter

PS/ Added Johann as a watcher so that he can see the feedback from Ihara

Comment by Niu Yawei (Inactive) [ 02/Jul/12 ]

Could you run 'lfs quota -v -g $group_id $fsname' and post the output here? I don't see why the OST still acquires quota from the master after the limit is cleared.

Also, the setquota on the local fs got ERANGE, which looks like the problem Johann mentioned. Could you verify which version of e2fsprogs is in use on the servers? Thanks.
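
For example, something like this (the group id and mount point are placeholders, and the rpm query is just one way to check the e2fsprogs version on RHEL5):

# lfs quota -v -g 1000 /mnt/lustre
# rpm -q e2fsprogs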

Comment by Shuichi Ihara (Inactive) [ 02/Jul/12 ]

It seems that quota was disabled after this issue showed up; let me try to get the "lfs quota -v -g .." output.
BTW, e2fsprogs-1.41.90.wc3 is running here.

Comment by Niu Yawei (Inactive) [ 02/Jul/12 ]

The similar problem reported on wc-discuss was also hit with e2fsprogs-1.41.90.wc3, but I don't see how e2fsprogs could affect the quota files. BTW, could you also check the 'quota_type' on all servers? (/proc/fs/lustre/obdfilter/*/quota_type) Thanks.
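
For reference, the check can be done either directly via /proc or with lctl (the lctl form is the one used later in this ticket):

# cat /proc/fs/lustre/obdfilter/*/quota_type
# lctl get_param lquota.*.quota_type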

Comment by Johann Lombardi (Inactive) [ 03/Jul/12 ]

Niu, could you please summarize the steps you used to reproduce the problem? Could you please also attach the output of e2fsck? TIA

Comment by Niu Yawei (Inactive) [ 03/Jul/12 ]

I didn't reproduce it; I just searched wc-discuss.

Comment by Shuichi Ihara (Inactive) [ 03/Jul/12 ]

As far as I can tell from wc-discuss, he finally reformatted the filesystem with e2fsprogs-1.41.90.wc4 and confirmed the problem is gone. But how can we fix this for an existing filesystem? Just upgrade e2fsprogs to 1.41.90.wc4? Does the larger quota limit (175921860444160) not matter?

Comment by Niu Yawei (Inactive) [ 03/Jul/12 ]

I don't think the problem is caused by e2fsprogs-1.41.90.wc3; I tried it in my local test environment and couldn't reproduce it.

The 160TB limit should be a valid limit for the v2 quota file. Could you paste the output of "quota_type" & "lfs quota -v -g ..." here? Thanks.

Comment by Shuichi Ihara (Inactive) [ 04/Jul/12 ]

Niu,

Uploaded to uploads/LU-1584.

We found a related problem.

quota_type was set to "ug1" even though we set "ug2" at format time.
We set "ost.quota_type=ug2" when we formatted the OSTs/MDT.

Here is a quick reproducer of what I mean:

# mkfs.lustre --reformat --mgs --mdt --param mdt.quota_type=ug2 /dev/sda1
# mkfs.lustre --reformat --mgsnode=s01i@o2ib --ost --param ost.quota_type=ug2 /dev/sda2
# mkfs.lustre --reformat --mgsnode=s01i@o2ib --ost --param ost.quota_type=ug3 /dev/sda3
# mkfs.lustre --reformat --mgsnode=s01i@o2ib --ost /dev/sda4

# for i in `seq 1 4`; do mount -t lustre /dev/sda$i /mnt/lustre/sda$i; done
# lctl get_param lquota.*.quota_type 
lquota.lustre-MDT0000.quota_type=2
lquota.lustre-OST0000.quota_type=1
lquota.lustre-OST0001.quota_type=3
lquota.lustre-OST0002.quota_type=3

# tunefs.lustre --dryrun /dev/sda2 
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-OST0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.100.150@o2ib ost.quota_type=ug2


   Permanent disk data:
Target:     lustre-OST0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.100.150@o2ib ost.quota_type=ug2

exiting before disk write.

OST0000's (/dev/sda2) quota_type is ug1 even though it was formatted with ug2 and the parameter is set on the OST.

We also tried removing ost.quota_type=ug2, but it's still ug1:

# umount /dev/sda2
# tunefs.lustre --erase-param /dev/sda2
# tunefs.lustre --mgsnode=192.168.100.150@o2ib /dev/sda2

# lctl get_param lquota.*.quota_type 
lquota.lustre-MDT0000.quota_type=2
lquota.lustre-OST0000.quota_type=1
lquota.lustre-OST0001.quota_type=3
lquota.lustre-OST0002.quota_type=3

Setting ost.quota_type=ug3, we confirmed it is set to ug3 correctly:

# tunefs.lustre --erase-param /dev/sda2
# tunefs.lustre --mgsnode=192.168.100.150@o2ib --param=ost.quota_type=ug3 /dev/sda2

# lctl get_param lquota.*.quota_type 
lquota.lustre-MDT0000.quota_type=2
lquota.lustre-OST0000.quota_type=3
lquota.lustre-OST0001.quota_type=3
lquota.lustre-OST0002.quota_type=3

The MDS has no problem.

If we set ug2 on the MDT, it runs with ug2, and if we set ug3, it works with ug3 as well:

# mkfs.lustre --reformat --mgs --mdt /dev/sda1
# mount -t lustre /dev/sda1 /mnt/lustre/sda1
# lctl get_param lquota.*.quota_type
lquota.lustre-MDT0000.quota_type=3

Comment by Niu Yawei (Inactive) [ 04/Jul/12 ]

Thanks, Ihara

The quota_type of the OST is 1 (which means 32-bit quota limits on OSTs), so 160T exceeds the representable range; it's not supported.

The following is the source comment for quota_type:

 * MDS: u for user quotas (administrative+operational) turned on,
 *      g for group quotas (administrative+operational) turned on,
 *      1 for 32-bit operational quotas and 32-bit administrative quotas,
 *      2 for 32-bit operational quotas and 64-bit administrative quotas,
 *      3 for 64-bit operational quotas and 64-bit administrative quotas
 * OST: u for user quotas (operational) turned on,
 *      g for group quotas (operational) turned on,
 *      1 for 32-bit local operational quotas,
 *      2 for 32-bit local operational quotas,
 *      3 for 64-bit local operational quotas,

So, if you want 64-bit local quota limits, you should specify quota_type as 3. Whether you specify 1 or 2 for the OST quota_type, the OST will use 32-bit quota limits, and the proc file always displays 1 for OSTs.

If you want to change your current MDS/OSTs to 64-bit quota limits, you need to specify 3 for both the MDS & OSTs, then run 'lfs quotacheck' to regenerate the quota files. (I assume your kernel supports 64-bit quota limits.)

Comment by Shuichi Ihara (Inactive) [ 04/Jul/12 ]

Thanks, Niu, for the confirmation.

So what are "operational quotas" and "administrative quotas", by the way?

With quota_type 1 or 2, the quota limit per OST is 4TB. With a total of 42 OSTs running here, the effective user and group quota limit on the entire filesystem is 168TB.
We have been seeing this problem once a group's usage on an OST gets close to 4TB and reaches it.
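
As a back-of-the-envelope check (assuming the 32-bit local limit is counted in 1KiB blocks, which matches the 4TB per-OST figure above):

# echo "$(( 2 ** 32 / 1024 ** 3 ))TB per OST, $(( 2 ** 32 / 1024 ** 3 * 42 ))TB across 42 OSTs"
4TB per OST, 168TB across 42 OSTs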

To enable 64-bit quotas on the production system, can the following changes work without a restart?

# lctl set_param lquota.*.quota_type=3
# lfs quotacheck <fs>

For a permanent fix, we would need to rewrite ost.quota_type=ug3 on the OSTs with tunefs.lustre when we have a chance to stop them.
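
For example, per OST (device name and mgsnode are placeholders taken from the reproducer above):

# umount /dev/sda2
# tunefs.lustre --erase-param /dev/sda2
# tunefs.lustre --mgsnode=192.168.100.150@o2ib --param=ost.quota_type=ug3 /dev/sda2
# mount -t lustre /dev/sda2 /mnt/lustre/sda2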

Comment by Niu Yawei (Inactive) [ 04/Jul/12 ]

So, what is "operations qutoas" and "administrative quotas", btw?

Operational quotas are the per-target quota files on the local fs of each OST/MDS, used to store the usage & local limit for each target. Administrative quotas are the cluster-wide quota file, used to store the cluster-wide quota limits; the administrative quota file is located on the MDS.

To enable 64-bit quotas on the production system, can the following changes work without a restart?

Yes. Actually, you can use "lctl conf_param $MDTNAME.mdt.quota_type=ug3" & "lctl conf_param $OSTNAME.ost.quota_type=ug3" to write it to the config log permanently.

Comment by Shuichi Ihara (Inactive) [ 04/Jul/12 ]

Operational quotas are the per-target quota files on the local fs of each OST/MDS, used to store the usage & local limit for each target. Administrative quotas are the cluster-wide quota file, used to store the cluster-wide quota limits; the administrative quota file is located on the MDS.

OK, that makes perfect sense.

Yes. Actually, you can use "lctl conf_param $MDTNAME.mdt.quota_type=ug3" & "lctl conf_param $OSTNAME.ost.quota_type=ug3" to write it to the config log permanently.

Yes, that also worked on my test system, and I've confirmed quota_type is set to ug3 permanently. If we look at the params with tunefs.lustre, though, the OST's params are not updated (still ug2), which is sometimes confusing. Anyway, it can be fixed with conf_param.

Comment by Shuichi Ihara (Inactive) [ 09/Jul/12 ]

Niu,

# lctl set_param lquota.*.quota_type=3
# lfs quotacheck <fs>

It doesn't help to convert from ug2 to ug3. "lctl conf_param $MDTNAME.mdt.quota_type=ug3" & "lctl conf_param $OSTNAME.ost.quota_type=ug3" followed by restarting the OSTs/MDT doesn't help either. We still get "quota exceeded" when a user writes 4TB of data to a single OST, even with quota_type=ug3 set.

In order to convert from v2 to v3 and fully support 64-bit limits, we needed to unmount the MDT/OSTs, mount them as ldiskfs, remove the lquota.{user,group} and lquota_v2.{user,group} files, then restart all MDTs/OSTs and run 'lfs quotacheck' again...
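
Roughly, per target, something like this (device and mount points are placeholders; quota file names as above; -ug checks both user and group quotas):

# umount /mnt/lustre/ost0
# mount -t ldiskfs /dev/sda2 /mnt/ldiskfs
# rm /mnt/ldiskfs/lquota.user /mnt/ldiskfs/lquota.group
# rm /mnt/ldiskfs/lquota_v2.user /mnt/ldiskfs/lquota_v2.group
# umount /mnt/ldiskfs
# mount -t lustre /dev/sda2 /mnt/lustre/ost0

and then, once all targets are back:

# lfs quotacheck -ug /mnt/lustre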

Comment by Niu Yawei (Inactive) [ 09/Jul/12 ]

Hmm, setting quota_type to "ug3" and then running 'lfs quotacheck' doesn't help?

"lctl set_param lquota.*.quota_type=3" isn't correct, because it doesn't enable user or group quotas, so quotacheck will not recreate any quota files. It should be "lctl set_param lquota.*.quota_type=ug3", followed by quotacheck. Sorry I didn't spot this trap before.

Thanks for your effort on this, and I'm glad to hear that you've converted the quota files successfully.

Comment by Shuichi Ihara (Inactive) [ 18/Aug/12 ]

Originally this was a configuration problem, but in the end the root cause was LU-1720. So please close this ticket. Thanks!

Comment by Peter Jones [ 18/Aug/12 ]

ok thanks Ihara!
