[LU-4505] invalid "Disk quota exceed" error Created: 17/Jan/14  Updated: 15/Jul/14  Resolved: 21/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: mn4

Issue Links:
Related
Severity: 3
Rank (Obsolete): 12323

 Description   

User is getting "Disk quota Exceeded" but has lots of quota available.

Uploaded the following files to the FTP site:
kferschw.debug.trace.OSS.gz
kferschw.debug.trace.mds

The following write went to a directory striped only on OST0000.
pfe21.kferschw 125> dd if=/dev/zero of=testfile2 bs=1M count=1000
dd: writing `testfile2': Disk quota exceeded

---- STRACE OUTPUT ----
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = -1 EDQUOT (Disk quota exceeded)

----------------------------------------------------------
pfe21.kferschw 124> lfs quota -v -u kferschw /nobackupp7
Disk quotas for user kferschw (uid 12262):
Filesystem kbytes quota limit grace files quota limit grace
/nobackupp7 386410032 530000000 1100000000 - 66753 75000 150000 -
nbp7-MDT0000_UUID
35788 - 0 - 66753 - 67496 -
nbp7-OST0000_UUID
3190116 - 3190124 - - - - -
nbp7-OST0001_UUID
5394136 - 5653208 - - - - -
nbp7-OST0002_UUID
4622304 - 4623328 - - - - -
nbp7-OST0003_UUID
4381244 - 4382336 - - - - -
nbp7-OST0004_UUID
5825544 - 5827592 - - - - -
nbp7-OST0005_UUID
4299436 - 4559532 - - - - -
nbp7-OST0006_UUID
3977892 - 4236964 - - - - -
nbp7-OST0007_UUID
4380728 - 4639800 - - - - -
nbp7-OST0008_UUID
4735676 - 4735944 - - - - -
nbp7-OST0009_UUID
4288472 - 4290520 - - - - -
nbp7-OST000a_UUID
3844268 - 3846316 - - - - -
nbp7-OST000b_UUID
4673420 - 5718924 - - - - -
nbp7-OST000c_UUID
4812004 - 5759204 - - - - -
nbp7-OST000d_UUID
4223056 - 5138516 - - - - -
nbp7-OST000e_UUID
5089004 - 5090340 - - - - -
nbp7-OST000f_UUID
4159372 - 4160396 - - - - -
nbp7-OST0010_UUID
5757832 - 6291456 - - - - -
nbp7-OST0011_UUID
4379564 - 4638636 - - - - -
nbp7-OST0012_UUID
3740432 - 3742480 - - - - -
nbp7-OST0013_UUID
4015232 - 4015444 - - - - -
nbp7-OST0014_UUID
4711956 - 4712980 - - - - -
nbp7-OST0015_UUID
4962784 - 4964832 - - - - -
nbp7-OST0016_UUID
4100436 - 4360532 - - - - -
nbp7-OST0017_UUID
4939728 - 5920724 - - - - -
nbp7-OST0018_UUID
4437648 - 4437968 - - - - -
nbp7-OST0019_UUID
4205352 - 4206656 - - - - -
nbp7-OST001a_UUID
3489284 - 3491208 - - - - -
nbp7-OST001b_UUID
3948264 - 3949288 - - - - -
nbp7-OST001c_UUID
4752300 - 5012396 - - - - -
nbp7-OST001d_UUID
3340532 - 4321524 - - - - -
nbp7-OST001e_UUID
3846352 - 3847376 - - - - -
nbp7-OST001f_UUID
3330368 - 3332416 - - - - -
nbp7-OST0020_UUID
4285776 - 4286800 - - - - -
nbp7-OST0021_UUID
4253720 - 4254744 - - - - -
nbp7-OST0022_UUID
3651620 - 3910692 - - - - -
nbp7-OST0023_UUID
4683028 - 4943124 - - - - -
nbp7-OST0024_UUID
4448032 - 4449056 - - - - -
nbp7-OST0025_UUID
3814340 - 3815364 - - - - -
nbp7-OST0026_UUID
4226044 - 4227596 - - - - -
nbp7-OST0027_UUID
5047608 - 5307704 - - - - -
nbp7-OST0028_UUID
4488580 - 4489604 - - - - -
nbp7-OST0029_UUID
4774184 - 4775208 - - - - -
nbp7-OST002a_UUID
3558388 - 3560436 - - - - -
nbp7-OST002b_UUID
29970680 - 30950652 - - - - -
nbp7-OST002c_UUID
3959396 - 3959444 - - - - -
nbp7-OST002d_UUID
4656420 - 4656488 - - - - -
nbp7-OST002e_UUID
4902936 - 4903960 - - - - -
nbp7-OST002f_UUID
2409460 - 3390452 - - - - -
nbp7-OST0030_UUID
4456620 - 4457704 - - - - -
nbp7-OST0031_UUID
4433440 - 4434684 - - - - -
nbp7-OST0032_UUID
5149052 - 5150076 - - - - -
nbp7-OST0033_UUID
4795664 - 5760276 - - - - -
nbp7-OST0034_UUID
4181592 - 4181772 - - - - -
nbp7-OST0035_UUID
5521136* - 5521136 - - - - -
nbp7-OST0036_UUID
4305644 - 4307692 - - - - -
nbp7-OST0037_UUID
3028780 - 4008752 - - - - -
nbp7-OST0038_UUID
4113972 - 4116020 - - - - -
nbp7-OST0039_UUID
3815152 - 3816176 - - - - -
nbp7-OST003a_UUID
4591096 - 4592120 - - - - -
nbp7-OST003b_UUID
3615712 - 3874784 - - - - -
nbp7-OST003c_UUID
4334768 - 4335792 - - - - -
nbp7-OST003d_UUID
4045872 - 4046780 - - - - -
nbp7-OST003e_UUID
4681704 - 4682996 - - - - -
nbp7-OST003f_UUID
4063240 - 4281356 - - - - -
nbp7-OST0040_UUID
5356572 - 5358620 - - - - -
nbp7-OST0041_UUID
4023020 - 4024044 - - - - -
nbp7-OST0042_UUID
3337968 - 3340016 - - - - -
nbp7-OST0043_UUID
2996308 - 3977304 - - - - -
nbp7-OST0044_UUID
4040072 - 4041112 - - - - -
nbp7-OST0045_UUID
6232336 - 6234384 - - - - -
nbp7-OST0046_UUID
3855372 - 3856396 - - - - -
nbp7-OST0047_UUID
3867100 - 4127196 - - - - -
nbp7-OST0048_UUID
3972944 - 3974192 - - - - -
nbp7-OST0049_UUID
4530920 - 4532968 - - - - -
nbp7-OST004a_UUID
3761008 - 3762032 - - - - -
nbp7-OST004b_UUID
3266156 - 4247152 - - - - -
nbp7-OST004c_UUID
5210620 - 5211644 - - - - -
nbp7-OST004d_UUID
4917344 - 4919392 - - - - -
nbp7-OST004e_UUID
4304916 - 4305948 - - - - -
nbp7-OST004f_UUID
3327960 - 4311012 - - - - -
nbp7-OST0050_UUID
4778340 - 4778672 - - - - -
nbp7-OST0051_UUID
5910632 - 5911580 - - - - -
nbp7-OST0052_UUID
4212080 - 4213104 - - - - -
nbp7-OST0053_UUID
2354184 - 2580492 - - - - -
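
To make the listing above easier to scan, here is a minimal sketch that picks out the OSTs where the user's usage is at, or within a small margin of, the per-target limit. It assumes the two-line layout shown in this ticket (target name on one line, values on the next) and assumes the per-target "limit" column reflects the space currently granted to that OST. The function names, the SAMPLE excerpt, and the 1024 KB margin are illustrative and not part of Lustre.

```python
import re

# Excerpt of the `lfs quota -v` output above (three of the OST entries).
SAMPLE = """\
nbp7-OST0000_UUID
3190116 - 3190124 - - - - -
nbp7-OST0001_UUID
5394136 - 5653208 - - - - -
nbp7-OST0035_UUID
5521136* - 5521136 - - - - -
"""

OST_NAME = re.compile(r"\S+-OST[0-9a-fA-F]+_UUID")


def parse_ost_quota(text):
    """Return {ost_name: (used_kb, limit_kb, over_flag)} for each OST entry."""
    rows = {}
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        if OST_NAME.fullmatch(line):
            # Remember the target; its values are expected on the next line.
            target = line
            continue
        if target is not None:
            fields = line.split()
            if len(fields) >= 3:
                used = fields[0].rstrip("*")
                limit = fields[2]
                if used.isdigit() and limit.isdigit():
                    # A trailing '*' on the usage column marks "over quota".
                    rows[target] = (int(used), int(limit), fields[0].endswith("*"))
            target = None
    return rows


def tight_targets(rows, margin_kb=1024):
    """OSTs flagged with '*' or with less than margin_kb of headroom left."""
    return sorted(
        name
        for name, (used, limit, over) in rows.items()
        if over or (limit > 0 and limit - used < margin_kb)
    )


if __name__ == "__main__":
    for name in tight_targets(parse_ost_quota(SAMPLE)):
        print(name)
```

On the excerpt this reports nbp7-OST0000_UUID (8 KB of headroom) and nbp7-OST0035_UUID (marked with `*`), while nbp7-OST0001_UUID still has roughly 253 MB available under its limit.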



 Comments   
Comment by Peter Jones [ 17/Jan/14 ]

Niu

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 20/Jan/14 ]

What kind of operations did you perform before this happened? Was the limit for kferschw raised to its current value (1100000000) from a smaller one? Thanks.

Comment by Mahmoud Hanafi [ 20/Jan/14 ]

The user reported this issue. The inability to write has been inconsistent: the user was sometimes able to write a 25 GB file to the filesystem, but after that file was deleted he could not write a much smaller one.

After I took the debug logs, I set the user's quota to zero and then set it back, and I ran into the same quota issue, although I did not change the limit value (1100000000).

There are other users who have the same issue with quota on this filesystem. This is the only 2.4.x filesystem we have so far.

Comment by Niu Yawei (Inactive) [ 21/Jan/14 ]

Hi, Mahmoud

Does that mean the problem can be reproduced even after you reset the quota limit? Could you collect debug logs on the MDT and the OST with D_TRACE and D_QUOTA enabled while you reset the limit (set the limit to zero, then set it back)? Thanks.

Comment by Mahmoud Hanafi [ 21/Jan/14 ]

I will see if I can reproduce it. Did you look at the logs I uploaded?

Comment by Mahmoud Hanafi [ 21/Jan/14 ]

I have uploaded the following debug trace files to the FTP site:
debug.lustre.service182.oss.gz
debug.lustre.service180.mds.gz
debug.lustre1.service182.oss.gz
debug.lustre1.service180.mds.gz

Comment by Niu Yawei (Inactive) [ 22/Jan/14 ]

Thank you, Mahmoud. The log shows that only OST0000 has this problem. I suspect the edquot flag on OST0000 was set mistakenly because of some sort of race.

Comment by Niu Yawei (Inactive) [ 22/Jan/14 ]

http://review.whamcloud.com/8954

Comment by Mahmoud Hanafi [ 22/Jan/14 ]

I had created a directory with a stripe count of 1, fixed on OST0000, to make debugging easier. I think that is why you only saw OST0000.

Comment by Jay Lan (Inactive) [ 22/Jan/14 ]

Hi Niu,

Was the extra check you removed in 8954 not needed in the first place? Could removing the check open the door to a different race condition?

Comment by Niu Yawei (Inactive) [ 23/Jan/14 ]

Hi, Jay, I can't think of any other race for now; let's see what the patch inspectors think.

Comment by Bob Glossman (Inactive) [ 19/Feb/14 ]

backport to b2_5:
http://review.whamcloud.com/9315

Comment by Peter Jones [ 21/Feb/14 ]

Landed for 2.5.1 and 2.6

Comment by Niu Yawei (Inactive) [ 15/Jul/14 ]

b2_4: http://review.whamcloud.com/11100
