[LU-1720] Quota doesn't work over 4TB on single OST Created: 08/Aug/12 Updated: 24/Nov/17 Resolved: 01/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8 |
| Fix Version/s: | Lustre 2.1.4, Lustre 1.8.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS5.8 Lustre-1.8.8-wc1 |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 2 | ||||
| Rank (Obsolete): | 4061 | ||||
| Description |
|
We set quota type "ug3" on all OSTs and the MDT, and also set a 5TB quota limit for a user. However, if user1 writes files to a single OST, the quota is reported as exceeded once the total file size reaches 4TB.
# lfs quota -v -u user1 /lustre/
Disk quotas for user user1 (uid 1000):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/ 4295057504 0 5368709120 - 13 0 0 -
lustre-MDT0000_UUID
4 - 1024 - 13 - 0 -
lustre-OST0000_UUID
0 - 1024 - - - - -
lustre-OST0001_UUID
4295057500* - 4294959104 - - - - -
lustre-OST0002_UUID
0 - 1024 - - - - -
lustre-OST0003_UUID
0 - 1024 - - - - -
..
..
# lctl get_param lquota.*.quota_type
lquota.lustre-OST0001.quota_type=ug3
lquota.lustre-OST0004.quota_type=ug3
lquota.lustre-OST0008.quota_type=ug3
lquota.lustre-OST000c.quota_type=ug3
lquota.lustre-OST0011.quota_type=ug3
lquota.lustre-OST0015.quota_type=ug3
lquota.lustre-OST0019.quota_type=ug3
lquota.lustre-OST001d.quota_type=ug3
lquota.lustre-OST0021.quota_type=ug3
lquota.lustre-OST0025.quota_type=ug3
lquota.lustre-OST0028.quota_type=ug3
lquota.lustre-OST002d.quota_type=ug3
lquota.lustre-OST0031.quota_type=ug3
lquota.lustre-OST0035.quota_type=ug3
lquota.lustre-OST0039.quota_type=ug3
# lctl get_param lquota.*.quota_type
lquota.lustre-MDT0000.quota_type=ug3 |
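The numbers in the `lfs quota` output above can be checked with a little arithmetic. The sketch below assumes the `kbytes` columns are in units of 1024 bytes; the variable names are illustrative only. It shows that the 5TB limit is exactly 5 * 2^30 KB, while the per-OST limit on lustre-OST0001 stalls just below 2^32 KB (4TB), which hints at a 32-bit boundary.

```python
# Values copied from the `lfs quota -v -u user1 /lustre/` output in the description.
# Assumption: the "kbytes" columns are in units of 1024 bytes.
KB_PER_TB = 1 << 30

user_limit_kb = 5368709120   # block hard limit set for user1
ost_limit_kb = 4294959104    # limit granted on lustre-OST0001
ost_used_kb = 4295057500     # usage on lustre-OST0001 (flagged with '*')

two_pow_32_kb = 1 << 32      # 4294967296 KB, i.e. 4TB


def kb_to_tb(kb: int) -> float:
    """Convert a kbytes value to TB (1 TB = 2^30 KB here)."""
    return kb / KB_PER_TB


print("user limit (TB):", kb_to_tb(user_limit_kb))
print("2^32 KB in TB:", kb_to_tb(two_pow_32_kb))
print("KB between OST limit and 2^32:", two_pow_32_kb - ost_limit_kb)
print("usage above OST limit:", ost_used_kb > ost_limit_kb)
```

The OST limit never grows past 2^32 KB even though the user's global limit would allow another 1TB.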
| Comments |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
Hi Ihara, could you collect the messages on the OSTs? I'm afraid that the quota file for the local fs (operational quota file) has not been converted to 64bit yet, just like |
| Comment by Shuichi Ihara (Inactive) [ 08/Aug/12 ] |
|
Hi Niu, I will send you the messages later (the system was shut down and will boot up soon), but this is a completely new test system. The filesystem is formatted with {ost,mdt}.quota_type=ug3. |
| Comment by Shuichi Ihara (Inactive) [ 08/Aug/12 ] |
|
Hi Niu, I tested again with a very simple configuration. .quota_type=ug3, but we hit it again: the quota is exceeded at 4TB.
# lfs quotacheck -ug /lustre/
# lfs setquota -B 5368709120 -u user1 /lustre
# su - user1
user1 writes 10 x 450GB files to /lustre
dd: writing `/lustre/quota_test/file-10': Disk quota exceeded
dd: closing output file `/lustre/quota_test/file-10': Input/output error
$ lfs quota -v -u user1 /lustre/
Disk quotas for user user1 (uid 1000):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/ 4294935256 0 5368709120 - 13 0 0 -
lustre-MDT0000_UUID
4 - 1024 - 13 - 0 -
lustre-OST0000_UUID
4294935252* - 4294933504 - - - - -
OSS's messages when quota is exceeded.
Aug 8 23:55:51 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Aug 8 23:55:51 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Aug 8 23:55:51 s02 kernel: Lustre: 19756:0:(quota_interface.c:491:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
Aug 8 23:55:52 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Aug 8 23:55:52 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) Skipped 2473 previous similar messages
Aug 8 23:55:53 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Aug 8 23:55:53 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) Skipped 1814 previous similar messages
Aug 8 23:55:55 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) error set quota fs limit! (rc:-34)
Aug 8 23:55:55 s02 kernel: LustreError: 18960:0:(quota_context.c:685:dqacq_completion()) Skipped 3868 previous similar messages |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
Hi Ihara, this looks like the same problem that was reported on lustre-discuss. In that case, the problem went away once they formatted the filesystem with e2fsprogs-1.41.90.wc4 (they could reproduce it with e2fsprogs-1.41.90.wc3). Could you check the e2fsprogs version on the server? Thanks. |
| Comment by Shuichi Ihara (Inactive) [ 08/Aug/12 ] |
|
Hi Niu, e2fsprogs-1.42.3.wc1 is installed and the filesystem was formatted with it. |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
It's too bad that the error message only reports an error code. Could you apply this patch (which prints more information) and enable D_QUOTA while running the tests? Thanks. |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
Print more information when setting the local limit fails. |
| Comment by Shuichi Ihara (Inactive) [ 08/Aug/12 ] |
|
Niu, could you please check the patch again? The build seems to fail with the patch applied. /usr/src/lustre-1.8.8/lustre/quota/quota_context.c: In function 'dqacq_completion': |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
Fix the compile error. |
| Comment by Niu Yawei (Inactive) [ 08/Aug/12 ] |
|
sorry, please use the updated one. |
| Comment by Shuichi Ihara (Inactive) [ 09/Aug/12 ] |
|
Debug log with D_QUOTA enabled. |
| Comment by Shuichi Ihara (Inactive) [ 09/Aug/12 ] |
|
syslog on OSS after patches applied.
Aug 9 13:57:37 s02 kernel: LustreError: 8891:0:(quota_context.c:691:dqacq_completion()) error set quota fs limit! rc:-34, count:1024, hardlimit:4294967296 isblk:b
Aug 9 13:57:37 s02 kernel: LustreError: 8891:0:(quota_context.c:691:dqacq_completion()) Skipped 115657 previous similar messages
Aug 9 13:57:37 s02 kernel: Lustre: 9294:0:(quota_interface.c:475:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0) |
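For reading the log above: rc:-34 corresponds to ERANGE on Linux, and the reported hardlimit of 4294967296 is exactly 2^32, one more than an unsigned 32-bit field can hold. The following is only a sketch of how such a return code could arise, assuming the limit is compared against a 32-bit maximum somewhere below `dqacq_completion()`. The function `set_block_hardlimit` and the constant `MAX_32BIT_LIMIT_KB` are hypothetical and are not Lustre or kernel code.

```python
import errno

# Assumption: the largest block limit (in KB) a 32-bit limit field can represent.
MAX_32BIT_LIMIT_KB = (1 << 32) - 1


def set_block_hardlimit(hardlimit_kb: int, max_limit_kb: int = MAX_32BIT_LIMIT_KB) -> int:
    """Hypothetical stand-in for a limit setter that rejects out-of-range values.

    Returns 0 on success or a negative errno value, mirroring the rc in the log.
    """
    if hardlimit_kb > max_limit_kb:
        return -errno.ERANGE
    return 0


# hardlimit value taken from the syslog line above
rc = set_block_hardlimit(4294967296)
print("rc:", rc, "name:", errno.errorcode[-rc])

# One KB less still fits in 32 bits and is accepted by this sketch.
print("rc just below 2^32:", set_block_hardlimit(4294967295))
```

With a 64-bit maximum passed as `max_limit_kb`, the same call would return 0, which is the behaviour expected from a 64-bit capable quota path.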
| Comment by Niu Yawei (Inactive) [ 09/Aug/12 ] |
|
It looks like the 'hardlimit' and 'count' values are sane, so I suspect that the local quota file was somehow created as 32bit. Could you mount the OST device as ldiskfs and check the quota file names on it? The names should be "lquota.user" & "lquota.group" for 32bit, or "lquota_v2.user" & "lquota_v2.group" for 64bit. Thanks. |
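The file-name check described in this comment can be sketched as a small helper. The file names come from the comment itself; the function name `classify_quota_files` and the sample listing are illustrative assumptions. On a real system the listing would come from the ldiskfs mount point of the OST.

```python
# File names as given in the comment: v1 names are 32bit, v2 names are 64bit.
QUOTA_FILES_32BIT = {"lquota.user", "lquota.group"}
QUOTA_FILES_64BIT = {"lquota_v2.user", "lquota_v2.group"}


def classify_quota_files(names):
    """Group the quota files found in a directory listing by 32bit/64bit format."""
    found = set(names)
    return {
        "32bit": sorted(found & QUOTA_FILES_32BIT),
        "64bit": sorted(found & QUOTA_FILES_64BIT),
    }


# Illustrative listing (assumption); in practice use os.listdir() on the
# directory where the OST device is mounted as ldiskfs.
sample_listing = ["CONFIGS", "O", "last_rcvd", "lquota_v2.group", "lquota_v2.user"]
result = classify_quota_files(sample_listing)
print(result)
```

An empty "32bit" list together with both "64bit" names would rule out the 32-bit quota file as the cause.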
| Comment by Shuichi Ihara (Inactive) [ 09/Aug/12 ] |
|
Niu, the lquota_v2 files exist. That was odd, which is why I filed this problem here.
# mount -t ldiskfs /dev/mapper/LUN00 /mnt/lustre/LUN00/
# ls /mnt/lustre/LUN00/
CONFIGS O health_check last_rcvd lost+found lquota_v2.group lquota_v2.user |
| Comment by Shuichi Ihara (Inactive) [ 09/Aug/12 ] |
|
Tested with 1.8.7, but still hit the same problem. |
| Comment by Shuichi Ihara (Inactive) [ 09/Aug/12 ] |
|
I have tested with lustre-2.1.2 on RHEL5, but we hit the same 4TB quota limitation. |
| Comment by Shuichi Ihara (Inactive) [ 10/Aug/12 ] |
|
This is a reproducer for this problem.
# MDS
# mkfs.lustre --reformat --mgs --mdt --param mdt.quota_type=ug3 /dev/sdb1
# mount -t lustre /dev/sdb1 /mnt/lustre/MDT
# lctl get_param lquota.*.quota_type
lquota.mdd_obd-lustre-MDT0000.quota_type=ug3
# OSS
# mkfs.lustre --reformat --ost --mgsnode=192.168.100.129@o2ib --param ost.quota_type=ug3 /dev/mapper/LUN59
# mount -t lustre /dev/mapper/LUN59 /mnt/lustre/LUN59
# lctl get_param lquota.*.quota_type
lquota.lustre-OST0000.quota_type=ug3
# Client
# mount -t lustre 192.168.100.129@o2ib:/lustre /lustre
# nohup /tmp/reproducer.sh &
# lfs quota -u user1 -v /lustre/
Disk quotas for user user1 (uid 1000):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/ 4295204904 0 5368709120 - 100 0 0 -
lustre-MDT0000_UUID
0 - 1024 - 100 - 0 -
lustre-OST0000_UUID
4295204904* - 4294966272
|
| Comment by Niu Yawei (Inactive) [ 10/Aug/12 ] |
|
Thanks for your update, Ihara! I finally found the reason: the kernel quota-large-limits-rhel5.patch (which makes the kernel's do_set_dqblk() support 64bit limits) was mis-updated in ba5dd769f66194a80920cf93d6014c78729efaae ( Yangshen, could you take a look at this and fix the patch? Thanks. |
| Comment by Shuichi Ihara (Inactive) [ 10/Aug/12 ] |
|
Niu, thanks for the analysis. What do you mean by mis-updated? |
| Comment by Niu Yawei (Inactive) [ 10/Aug/12 ] |
|
The patch was updated incorrectly when support for a new kernel was added. Please check: http://review.whamcloud.com/#change,3599 |
| Comment by Yang Sheng [ 10/Aug/12 ] |
|
patch for b2_1: http://review.whamcloud.com/3600 |
| Comment by Yang Sheng [ 10/Aug/12 ] |
|
patch for b1_8: http://review.whamcloud.com/3599 |
| Comment by Shuichi Ihara (Inactive) [ 19/Aug/12 ] |
|
I reproduced this problem with lustre-1.8.8 on RHEL5 and confirmed it's fixed by kernel patch on |
| Comment by Shuichi Ihara (Inactive) [ 23/Aug/12 ] |
|
Hi, I've confirmed the patch |
| Comment by Peter Jones [ 31/Aug/12 ] |
|
Landed to b1_8. Still needs to land to 2.x branches |
| Comment by Yang Sheng [ 01/Nov/12 ] |
|
Patch landed to all branches. Closing bug. |