[LU-3964] inode overquota error although user is well below quota Created: 16/Sep/13  Updated: 29/Jul/14  Resolved: 29/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.4
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Mahmoud Hanafi
Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: None
Environment:

client 2.1.5, server 2.1.4


Attachments: Text File lustre.d_quota.d_trace.lfsquota.out.last700000.gz     Text File lustre.d_quota.opens.out.last700000.gz     File lustre_debug.rpc.quot.out.r222i1n12.gz    
Severity: 3
Rank (Obsolete): 10542

 Description

Some users hit an over-quota error when creating many files (i.e. 400 tasks creating 400 files) using fopen. The perror output is "Disk quota exceeded" even though they are well below the inode quota.

Test Case:
The debug log is 64 MB; please let me know how to upload the file. The user ID is 30180. One thing I noticed is that for this user iunit=2 and itune=1, whereas for other users these values can be very large. This is the case even after setting quota_least_iunit=512.

Here are the proc values:
quota_boundary_factor=4
quota_btune_sz=67108850
quota_bunit_sz=134217728
quota_itune_sz=2560
quota_iunit_sz=5120
quota_least_bunit=1048576
quota_least_iunit=512
quota_qs_factor=2
quota_switch_qs=changing qunit size is enabled
quota_switch_seconds=300
quota_sync_blk=0
quota_type=u3

A modified version of mdsrate.c using fopen.
bash-3.2$ rm f*;mpiexec -n 400 /nobackupp1/mahmoudhanafiTesting_echiang/mdsrate.intel_mpt --verbose --create --nfiles 400
0: r133i1n5 starting at Mon Sep 16 15:11:19 2013
f396: Disk quota exceeded
rank 396: open(f396) error: Success
MPT: Global rank 396 is aborting with error code 1.
Process ID: 12249, Host: r134i2n2, Program: /nobackupp1/mah
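
For readers without the modified mdsrate.c, the failing pattern (many creates in one directory, stopping at the first failed open) can be approximated as below. This is a single-process sketch, not the reporter's MPI test; the helper name `create_files` and the f<N> file naming are assumptions modeled on the output above. It reports the errno attached to the failing call itself.

```python
import errno
import os


def create_files(directory, count):
    """Create `count` empty files named f0..f<count-1> in `directory`.

    Returns (created, failure), where failure is None or a tuple
    (filename, errno_name) describing the first create that failed.
    """
    created = 0
    for i in range(count):
        path = os.path.join(directory, f"f{i}")
        try:
            # Opening in "w" mode creates the file, much like fopen(path, "w").
            with open(path, "w"):
                pass
        except OSError as exc:
            # exc.errno is the value set by the failing call itself.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return created, (os.path.basename(path), name)
        created += 1
    return created, None
```

Calling `create_files(some_directory, 400)` on the affected filesystem would be expected to return a failure tuple naming EDQUOT if the problem reproduces; on a healthy directory it returns `(400, None)`.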

bash-3.2$ lfs quota -v /nobackupp1
Disk quotas for user echiang (uid 30180):
Filesystem kbytes quota limit grace files quota limit grace
/nobackupp1 388163032 530000000 1100000000 - 390994 600000 750000 -
nbp1-MDT0000_UUID
71540 - 262144 - 390994 - 390996 -
nbp1-OST0000_UUID
3946924 - 4063232 - - - - -
nbp1-OST0001_UUID
3056352 - 3145728 - - - - -
nbp1-OST0002_UUID
3893468 - 4063232 - - - - -
nbp1-OST0003_UUID
3343592 - 3538944 - - - - -
nbp1-OST0004_UUID
3439124 - 3538944 - - - - -
nbp1-OST0005_UUID
3443716 - 3538944 - - - - -
nbp1-OST0006_UUID
2951164 - 3145728 - - - - -
nbp1-OST0007_UUID
2815988 - 2883584 - - - - -
nbp1-OST0008_UUID
3334720 - 3407872 - - - - -
nbp1-OST0009_UUID
3314664 - 3407872 - - - - -
nbp1-OST000a_UUID
3229116 - 3407872 - - - - -
nbp1-OST000b_UUID
3295576 - 3407872 - - - - -
nbp1-OST000c_UUID
3359160 - 3538944 - - - - -
nbp1-OST000d_UUID
3131648 - 3276800 - - - - -
nbp1-OST000e_UUID
2967292 - 3145728 - - - - -
nbp1-OST000f_UUID
3231296 - 3407872 - - - - -
nbp1-OST0010_UUID
4360532 - 4456448 - - - - -
nbp1-OST0011_UUID
2412480 - 2490368 - - - - -
nbp1-OST0012_UUID
3411052 - 3538944 - - - - -
nbp1-OST0013_UUID
3426328 - 3538944 - - - - -
nbp1-OST0014_UUID
3368980 - 3538944 - - - - -
nbp1-OST0015_UUID
2926928 - 3014656 - - - - -
nbp1-OST0016_UUID
3251712 - 3407872 - - - - -
nbp1-OST0017_UUID
2758180 - 2883584 - - - - -
nbp1-OST0018_UUID
2945264 - 3014656 - - - - -
nbp1-OST0019_UUID
2355904 - 2490368 - - - - -
nbp1-OST001a_UUID
2925488 - 3014656 - - - - -
nbp1-OST001b_UUID
3389104 - 3538944 - - - - -
nbp1-OST001c_UUID
3231104 - 3407872 - - - - -
nbp1-OST001d_UUID
3852040 - 3932160 - - - - -
nbp1-OST001e_UUID
3378656 - 3538944 - - - - -
nbp1-OST001f_UUID
2746128 - 2883584 - - - - -
nbp1-OST0020_UUID
3040204 - 3145728 - - - - -
nbp1-OST0021_UUID
2564516 - 2752512 - - - - -
nbp1-OST0022_UUID
3022252 - 3145728 - - - - -
nbp1-OST0023_UUID
3025024 - 3145728 - - - - -
nbp1-OST0024_UUID
3388188 - 3538944 - - - - -
nbp1-OST0025_UUID
3240412 - 3407872 - - - - -
nbp1-OST0026_UUID
3744020 - 3932160 - - - - -
nbp1-OST0027_UUID
3455960 - 3538944 - - - - -
nbp1-OST0028_UUID
3678528 - 3801088 - - - - -
nbp1-OST0029_UUID
3705640 - 3801088 - - - - -
nbp1-OST002a_UUID
3257948 - 3407872 - - - - -
nbp1-OST002b_UUID
3208432 - 3276800 - - - - -
nbp1-OST002c_UUID
3064224 - 3145728 - - - - -
nbp1-OST002d_UUID
2865980 - 3014656 - - - - -
nbp1-OST002e_UUID
3791268 - 3932160 - - - - -
nbp1-OST002f_UUID
3174436 - 3276800 - - - - -
nbp1-OST0030_UUID
2329128 - 2490368 - - - - -
nbp1-OST0031_UUID
3961672 - 4063232 - - - - -
nbp1-OST0032_UUID
3541660 - 3670016 - - - - -
nbp1-OST0033_UUID
3463088 - 3538944 - - - - -
nbp1-OST0034_UUID
2845520 - 3014656 - - - - -
nbp1-OST0035_UUID
2648728 - 2752512 - - - - -
nbp1-OST0036_UUID
3005508 - 3145728 - - - - -
nbp1-OST0037_UUID
3657768 - 3801088 - - - - -
nbp1-OST0038_UUID
3093848 - 3276800 - - - - -
nbp1-OST0039_UUID
2478220 - 2621440 - - - - -
nbp1-OST003a_UUID
3280152 - 3407872 - - - - -
nbp1-OST003b_UUID
2499008 - 2621440 - - - - -
nbp1-OST003c_UUID
3406280 - 3538944 - - - - -
nbp1-OST003d_UUID
3450564 - 3538944 - - - - -
nbp1-OST003e_UUID
3326960 - 3407872 - - - - -
nbp1-OST003f_UUID
2665224 - 2752512 - - - - -
nbp1-OST0040_UUID
2831504 - 3014656 - - - - -
nbp1-OST0041_UUID
3450520 - 3538944 - - - - -
nbp1-OST0042_UUID
3156436 - 3276800 - - - - -
nbp1-OST0043_UUID
2798676 - 2883584 - - - - -
nbp1-OST0044_UUID
3628844 - 3801088 - - - - -
nbp1-OST0045_UUID
3255928 - 3407872 - - - - -
nbp1-OST0046_UUID
3038004 - 3145728 - - - - -
nbp1-OST0047_UUID
3138496 - 3276800 - - - - -
nbp1-OST0048_UUID
3718224 - 3801088 - - - - -
nbp1-OST0049_UUID
2859828 - 3014656 - - - - -
nbp1-OST004a_UUID
3540212 - 3670016 - - - - -
nbp1-OST004b_UUID
3150716 - 3276800 - - - - -
nbp1-OST004c_UUID
3591992 - 3670016 - - - - -
nbp1-OST004d_UUID
3471888 - 3538944 - - - - -
nbp1-OST004e_UUID
3748968 - 3932160 - - - - -
nbp1-OST004f_UUID
3354948 - 3538944 - - - - -
nbp1-OST0050_UUID
3316300 - 3407872 - - - - -
nbp1-OST0051_UUID
3152108 - 3276800 - - - - -
nbp1-OST0052_UUID
3838468 - 3932160 - - - - -
nbp1-OST0053_UUID
3376732 - 3538944 - - - - -
nbp1-OST0054_UUID
3918108 - 4063232 - - - - -
nbp1-OST0055_UUID
3449808 - 3538944 - - - - -
nbp1-OST0056_UUID
3551388 - 3670016 - - - - -
nbp1-OST0057_UUID
3667468 - 3801088 - - - - -
nbp1-OST0058_UUID
3354304 - 3538944 - - - - -
nbp1-OST0059_UUID
2965084 - 3145728 - - - - -
nbp1-OST005a_UUID
2929228 - 3014656 - - - - -
nbp1-OST005b_UUID
2948140 - 3014656 - - - - -
nbp1-OST005c_UUID
3411472 - 3538944 - - - - -
nbp1-OST005d_UUID
3339932 - 3407872 - - - - -
nbp1-OST005e_UUID
2810480 - 2883584 - - - - -
nbp1-OST005f_UUID
3375644 - 3538944 - - - - -
nbp1-OST0060_UUID
4247276 - 4325376 - - - - -
nbp1-OST0061_UUID
3574344 - 3670016 - - - - -
nbp1-OST0062_UUID
3407656 - 3538944 - - - - -
nbp1-OST0063_UUID
3894524 - 4063232 - - - - -
nbp1-OST0064_UUID
3226492 - 3407872 - - - - -
nbp1-OST0065_UUID
3812356 - 3932160 - - - - -
nbp1-OST0066_UUID
3089144 - 3276800 - - - - -
nbp1-OST0067_UUID
2847240 - 3014656 - - - - -
nbp1-OST0068_UUID
3166388 - 3276800 - - - - -
nbp1-OST0069_UUID
2993524 - 3145728 - - - - -
nbp1-OST006a_UUID
3249768 - 3407872 - - - - -
nbp1-OST006b_UUID
3110808 - 3276800 - - - - -
nbp1-OST006c_UUID
2656340 - 2752512 - - - - -
nbp1-OST006d_UUID
3244824 - 3407872 - - - - -
nbp1-OST006e_UUID
2712384 - 2883584 - - - - -
nbp1-OST006f_UUID
2970080 - 3145728 - - - - -
nbp1-OST0070_UUID
2657952 - 2752512 - - - - -
nbp1-OST0071_UUID
3438304 - 3538944 - - - - -
nbp1-OST0072_UUID
3103096 - 3276800 - - - - -
nbp1-OST0073_UUID
3004436 - 3145728 - - - - -
nbp1-OST0074_UUID
3615556 - 3801088 - - - - -
nbp1-OST0075_UUID
2835936 - 3014656 - - - - -
nbp1-OST0076_UUID
3352284 - 3538944 - - - - -
nbp1-OST0077_UUID
2366892 - 2490368 - - - - -
group quotas are not enabled.
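
The MDT row in the output above is worth a second look: the filesystem-wide inode limits are 600000 (soft) and 750000 (hard), but the MDT itself reports 390994 files against a per-target limit of 390996. The arithmetic can be sketched as follows. Reading the per-target limit as the quota currently granted to that target is an assumption about this Lustre version, and `inode_headroom` is a hypothetical helper, not part of lfs.

```python
def inode_headroom(values_line):
    """Return (used, limit, limit - used) for the inode columns of one
    per-target value line from `lfs quota -v`.

    The line is assumed to hold eight whitespace-separated fields:
    kbytes quota limit grace files quota limit grace.
    Returns None when the inode columns are "-" (as on the OST rows).
    """
    fields = values_line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    used, limit = fields[4], fields[6]
    if used == "-" or limit == "-":
        return None
    used, limit = int(used), int(limit)
    return used, limit, limit - used


# Value lines copied from the output above.
mdt_line = "71540 - 262144 - 390994 - 390996 -"
ost_line = "3946924 - 4063232 - - - - -"

print(inode_headroom(mdt_line))  # (390994, 390996, 2)
print(inode_headroom(ost_line))  # None
```

The headroom of 2 matches the iunit=2 the reporter observed for this user. Whether that is the cause of the EDQUOT is not established in this ticket.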



 Comments   
Comment by Niu Yawei (Inactive) [ 17/Sep/13 ]

The 'lfs quota' output shows that the inode soft limit has been exceeded. The grace period may already have expired but, because of LU-3383, is not displayed.

Comment by Mahmoud Hanafi [ 17/Sep/13 ]

I think you may have misread the numbers:

inode used= 390994
inode soft=600000
inode hard=750000

Is there an FTP site where I can upload the log files?
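
The comparison made in this comment can be written out explicitly. This is a minimal sketch; `inode_quota_state` is a hypothetical helper, and treating a limit of 0 as "no limit set" is an assumption of the sketch.

```python
def inode_quota_state(used, soft, hard):
    """Classify inode usage against soft and hard limits.

    A limit of 0 is treated as "no limit set" (an assumption of this sketch).
    """
    if hard and used >= hard:
        return "at or over hard limit"
    if soft and used > soft:
        return "over soft limit"
    return "under both limits"


# Values from the summary row for uid 30180.
print(inode_quota_state(390994, 600000, 750000))  # under both limits
print(600000 - 390994)  # inodes remaining before the soft limit: 209006
```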

Comment by Mahmoud Hanafi [ 17/Sep/13 ]

Got some additional debug info. I ran rpctrace and quota debugging on the client, and it looks like the failure is related to this error on the client:

00040000:00020000:7.0:1379445377.791596:0:11068:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3

See the attached debug file.

Comment by Niu Yawei (Inactive) [ 18/Sep/13 ]

The log shows that the quota_ctl command to the MDT failed with -3 (-ESRCH), which means quota on the MDT isn't enabled properly. I think that is why the MDT can't acquire quota from the quota master, which results in -EDQUOT in the end.

Could you check whether there are any error messages on the MDS? If possible, could you also enable D_QUOTA and D_TRACE on the MDS and collect a log while running your test (or lfs quota)? Thanks.
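
The return code discussed in this comment can be decoded with Python's standard errno table. This is a sketch only: `decode_rc` is a hypothetical helper, and the mapping of 3 to ESRCH reflects the Linux errno numbering.

```python
import errno
import os
import re

# Client debug line quoted earlier in this ticket.
LOG_LINE = (
    "00040000:00020000:7.0:1379445377.791596:0:11068:0:"
    "(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3"
)


def decode_rc(line):
    """Extract the trailing 'rc: <n>' value and map it to an errno name.

    Returns (rc, name, description); name is None if the value is not a
    known errno on this platform.
    """
    match = re.search(r"rc:\s*(-?\d+)\s*$", line)
    if match is None:
        raise ValueError("no 'rc:' field found")
    rc = int(match.group(1))
    # Kernel-style return codes are negated errno values, so use the magnitude.
    code = abs(rc)
    return rc, errno.errorcode.get(code), os.strerror(code)


rc, name, text = decode_rc(LOG_LINE)
print(rc, name, text)
# The error the application finally sees, per the description above.
print(errno.errorcode[errno.EDQUOT], os.strerror(errno.EDQUOT))
```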

Comment by Mahmoud Hanafi [ 18/Sep/13 ]

I have a debug log taken during the lfs quota command. It is about 300 MB; where can I upload it?

I was not able to get the failure to occur with D_TRACE enabled.

Comment by Mahmoud Hanafi [ 18/Sep/13 ]

I was able to grab a D_QUOTA log during the error. See the attached files. I was also able to reduce the size of the lfs quota debug logs.

Comment by Mahmoud Hanafi [ 26/Sep/13 ]

Please provide an update on this.

Comment by Niu Yawei (Inactive) [ 08/Oct/13 ]

Hi, Mahmoud, I'm just back from vacation, sorry for the delayed response.

I didn't find anything abnormal in the log. Is it possible to provide the MDS log (preferably with D_TRACE enabled) from when lfs quota failed (as you mentioned before: seeing 00040000:00020000:7.0:1379445377.791596:0:11068:0:(quota_ctl.c:330:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3 on the client)?

Comment by John Fuchs-Chesney (Inactive) [ 12/Mar/14 ]

Mahmoud,
Are you looking for more support on this issue?
Or is this behind you now?
If so, may I mark it as resolved?
Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 29/Jul/14 ]

Not sure if this is still an issue for NASA?

If it is, please let us know and we can re-open the ticket if you wish.
Thanks,
~ jfc.

Generated at Sat Feb 10 01:38:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.