[LU-5247] Strange quota limits on OSTs Created: 24/Jun/14 Updated: 07/Aug/14 Resolved: 07/Aug/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Li Xi (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 14638 |
| Description |
|
We are seeing some strange quota limits on OSTs, as shown by the following:
[root@nmds04 home]# lfs quota -u lect02 -v /root/lustre
First, some OSTs have much bigger granted limits than their usage. Second, the sum of the limits on the OSTs exceeds the total limit. What is more, some users who do have quota limits cannot use more space unless quota is turned off manually. |
| Comments |
| Comment by Li Xi (Inactive) [ 24/Jun/14 ] |
|
We are currently in a maintenance window and trying to fix/work around this problem promptly; otherwise, disabling the quota feature would be the last resort. We tried rebooting the OSTs and MDTs, running 'tune2fs -O ^quota/quota', and doing 'lctl conf_param $FSNAME.quota.ost=none; lctl conf_param $FSNAME.quota.ost=ug;'. None of those attempts succeeded. I am wondering whether the files under the quota_slave directories are broken. Is it safe to remove all of those directories offline and then restart Lustre? I tried that in a test environment, and nothing bad happened. But we'd like to make sure it won't cause any data loss or bring down the system. Is there any other good idea about how to fix/work around this problem? Thanks in advance! |
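For reference, a minimal sketch of the reset sequence described above, assuming the filesystem is named lustre5, conf_param is run on the MGS, and /dev/sdd stands in for an unmounted OST device (all of these are placeholders):
# Toggle quota enforcement for the OSTs of the filesystem (run on the MGS)
lctl conf_param lustre5.quota.ost=none
lctl conf_param lustre5.quota.ost=ug
# Drop and recreate the ldiskfs quota feature on an unmounted OST device
tune2fs -O ^quota /dev/sdd
tune2fs -O quota /dev/sdd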
| Comment by Peter Jones [ 24/Jun/14 ] |
|
Niu, can you please advise? Thanks, Peter |
| Comment by Johann Lombardi (Inactive) [ 24/Jun/14 ] |
|
Could you please dump the limits on all the OSTs by running the following command?
# lctl get_param osd*.*.quota_slave.limit*
And also dump the limits on the QMT so that we can compare them. You could also try to force reintegration by running on each OSS:
# lctl set_param osd*.*.quota_slave.force_reint=1
As for removing the slave copies of the indexes on the OST, it was designed to work, but it does not seem to be tested in sanity-quota. Niu, could you please run additional tests to make sure it works well? |
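As a sketch of how these could be gathered across servers, assuming pdsh is available and nmds05/nmds06 are the OSS hostnames (both assumptions; any method of running the commands on every server works):
# Dump the per-ID limits held by every quota slave on each OSS
pdsh -w nmds05,nmds06 'lctl get_param osd*.*.quota_slave.limit*' > oss_limits.txt
# Dump the limits known to the quota master (run on the MDS)
lctl get_param qmt.*.*.glb* > qmt_limits.txt
# Force every slave to re-synchronize with the master
pdsh -w nmds05,nmds06 'lctl set_param osd*.*.quota_slave.force_reint=1'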
| Comment by Li Xi (Inactive) [ 24/Jun/14 ] |
|
Hi Johann, thanks for helping. I've attached the output of 'lctl get_param osd*.*.quota_slave.limit*'. However, nothing changes after running 'lctl set_param osd*.*.quota_slave.force_reint=1' on the first OSS. The limits are still strange.
[root@nmds06 ~]# lfs quota -u toyasuda -v /root/lustre/ |
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
The strange thing to me is why the uid 3018 quota info is not in qmt_dt-0x0_glb-usr.log. |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
Hi Zhenyu, |
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
The uid 652 quota info does not appear in qmt_dt-0x0_glb-usr.log either. |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
I checked qmt_dt-0x0_glb-usr.log and the current quota_slave.limit on QMT for lustre5. It looks like qmt_dt-0x0_glb-usr.log is not from lustre5, sorry. |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
quota_slave.limit from QMT, quota_slave for lustre5 file system |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
Sorry, QMT log |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
Sorry, many times, this one is the latest QMT log for lustre5 (captured at the same time as lustre5_quota_slave.limit.tar.gz). I uploaded different file mistakenly... |
| Comment by Mitsuhiro Nishizawa [ 25/Jun/14 ] |
|
Hello, can we expect an action plan (or confirmation that we can safely remove the slave copies) within a few hours? The system is currently in the maintenance window and we can do offline work, but it will be over at the end of today. The customer needs to decide whether to keep waiting for our response or to put the system back into service without quota. This is a difficult decision for them. Please share the latest information and your expectations on how to proceed. Regards, |
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
Some IDs' granted values exceed their hard limits:
# grep id lustre5_qmt_dt-0x0_glb-usr.log | awk '{print $3}' > ids
# for i in `cat ids` ; do echo $i; grep "id:.*$i" -A1 *.limit | grep -v "id" | awk 'BEGIN {FS=",[ \t]*|[ \t]+"} {SUM += $9} END { if ($5 < SUM) {print $5, SUM, ":granted over hard"}}'; done
0
1000000000 601956847608 :granted over hard
3348
1000000000 9773350268 :granted over hard
3094
5000000000 24406854848 :granted over hard
3352
4000000000 62904909164 :granted over hard
3118
51000000000 1156091716280 :granted over hard
3121
1000000000 47506492972 :granted over hard
3152
100000000000 1373056117076 :granted over hard
3159
11000000000 112035373580 :granted over hard
3161
15000000000 344219288276 :granted over hard
3163
50000000000 54010129436 :granted over hard
3426
1000000000 6235934912 :granted over hard
3173
1000000000 13841161496 :granted over hard
3433
1000000000 17108669108 :granted over hard
3442
1000000000 1271819600 :granted over hard
3190
10000000000 124199124628 :granted over hard
3448
1000000000 15784903460 :granted over hard
3488
1000000000 1091150888 :granted over hard
3256
1000000000 10888793852 :granted over hard
3016
1000000000 2800668840 :granted over hard
3272
1000000000 36381814148 :granted over hard
3039
1000000000 20626585832 :granted over hard
3043
21000000000 428560476316 :granted over hard
3054
1000000000 1088043300 :granted over hard
|
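For completeness, a more readable variant of the same cross-check (a sketch; it assumes the collected files sit in the current directory and use the "limits: { hard: X, soft: Y, granted: Z, time: T }" layout shown above):
# For every ID in the master's global index, sum the granted values found in
# the per-OST *.limit files and report IDs whose total exceeds the hard limit.
grep 'id:' lustre5_qmt_dt-0x0_glb-usr.log | awk '{print $3}' | while read -r id; do
    hard=$(grep -wA1 "id: $id" lustre5_qmt_dt-0x0_glb-usr.log |
           sed -n 's/.*hard: *\([0-9]*\).*/\1/p' | head -n1)
    granted=$(grep -wA1 "id: $id" ./*.limit |
              sed -n 's/.*granted: *\([0-9]*\).*/\1/p' |
              awk '{ s += $1 } END { print s + 0 }')
    if [ "${granted:-0}" -gt "${hard:-0}" ]; then
        echo "id $id: granted $granted exceeds hard limit $hard"
    fi
done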
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
I've tried to back up quota_slave and remove it; it works under the master branch, but not for the b2_4 code.
# umount /dev/sdd
# mount -t ldiskfs /dev/sdd /mnt/ost2
# mv /mnt/ost2/quota_slave/ /mnt/ost2/quota_slave.bak
# umount /dev/sdd
# mount -t lustre /dev/sdd /mnt/ost2
mount.lustre: mount /dev/sdd at /mnt/ost2 failed: File exists
# git describe
v2_4_3_0-1-gd00e4d6
The OI knows a slave index file exists in its OI mapping file but cannot find it. LiXi said that he can remove quota_slave and mount successfully; I don't know how. |
| Comment by Li Xi (Inactive) [ 25/Jun/14 ] |
|
Hi Zhenyu, I removed the whole directory rather than renaming it. I guess that is the difference? I got a similar problem when renaming the file, so I removed it with 'rm -rf quota_slave'. BTW, I did the test on the master branch of Lustre rather than 2.4.x. |
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
I think 2.4.x does not behave the same as master; the OI code on master is more mature. |
| Comment by Zhenyu Xu [ 25/Jun/14 ] |
|
Would you mind setting the hard limit to 1MB for a UID, say 3348, then check whether its granted value falls, and set the limit back to check that space got released? If you use soft limit as well, decrease and restore it as well. |
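A possible sequence for that test (a sketch; uid 3348 and the 1000000000 KB hard limit are taken from the dump above, and /root/lustre is assumed to be the client mount point):
# Record the current per-OST granted values for the uid
lfs quota -u 3348 -v /root/lustre > quota_before.txt
# Shrink the hard block limit to 1 MB (1024 KB) and see whether the slaves release space
lfs setquota -u 3348 -B 1024 /root/lustre
sleep 30
lfs quota -u 3348 -v /root/lustre > quota_shrunk.txt
# Restore the previous hard limit and check the granted values again
lfs setquota -u 3348 -B 1000000000 /root/lustre
lfs quota -u 3348 -v /root/lustre > quota_restored.txt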
| Comment by Mitsuhiro Nishizawa [ 26/Jun/14 ] |
|
First, I disabled quota on the systems since the behavior is not stable and the customer needs to bring the service back. However, the quota feature is required for them and I need to resolve this issue ASAP. I tried to change the hard limit and found the following (I used admin users for the file systems). I also captured a debug log (with +trace, +quota) when I ran setquota for UID 3018 on file system lustre4 and for UID 652 on file system lustre5. |
| Comment by Li Xi (Inactive) [ 26/Jun/14 ] |
|
Hi Zhenyu, you are right. When I removed the quota_slave directory and tried to mount the OSD again on Lustre 2.4.2, the following LBUG happened:
LDISKFS-fs (sdb3): mounted filesystem with ordered data mode. quota=on. Opts:
Call Trace:
Kernel panic - not syncing: LBUG |
| Comment by Johann Lombardi (Inactive) [ 26/Jun/14 ] |
When setting the limit to 0, the slaves should release all quota space unconditionally (see qsd_calc_adjust()). Could you please enable quota debug on one of the OSS that did not release space, set quota -B 0, wait for a couple of seconds and then collect logs on this OSS? |
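A sketch of that capture on the affected OSS (uid 652 and the /root/lustre mount point are taken from the earlier comments and may need adjusting):
# On the OSS that did not release space: enable quota tracing and empty the buffer
lctl set_param debug=+quota
lctl clear
# On a client: clear the hard block limit for the affected uid
lfs setquota -u 652 -B 0 /root/lustre
# Back on the OSS, a few seconds later: dump the debug buffer to a file
sleep 10
lctl dk /tmp/oss_quota_debug.log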
| Comment by Mitsuhiro Nishizawa [ 26/Jun/14 ] |
|
I did the same thing (setquota -B 0) and captured OSS quota debug log. |
| Comment by Mitsuhiro Nishizawa [ 26/Jun/14 ] |
|
Other logs captured at the same time. |
| Comment by Johann Lombardi (Inactive) [ 27/Jun/14 ] |
|
I looked at the debug logs for lustre5 and unfortunately, I have traces of neither the setquota request in the MDS logs nor the glimpse in the OSS logs. Could you please also check the following on the OSTs?
# lctl get_param osd*.*.quota_slave.info
Thanks in advance. |
| Comment by Mitsuhiro Nishizawa [ 28/Jun/14 ] |
|
I checked quota_slave.info again, and the connection to the master was set up on all the OSTs (lustre5_quota_slave.info.tar.gz). The commands I ran were:
lfs quota -u 652 -v /root/lustre
lfs setquota -u 652 -B 1000000000 /root/lustre
lfs quota -u 652 -v /root/lustre
lfs setquota -u 652 -B 0 /root/lustre
lfs quota -u 652 -v /root/lustre |
| Comment by Zhenyu Xu [ 01/Jul/14 ] |
|
The lustre5_quota_slave.info shows that none of the OSTs have quota enabled:
nmds05_quota_slave.info:5:quota enabled: none
You need to enable quota for the test and collect the log again. |
| Comment by Mitsuhiro Nishizawa [ 01/Jul/14 ] |
|
Yes, as I stated before, we disabled quota as an interim remedy. We had tried changing the quota setting and force_reint while quota was enabled, but it did not work. Since we could not determine whether the "quota exceeded" state on some OSTs was genuine or a false positive, we ended up disabling quota and putting the system back in service. When I changed the quota setting with 'lfs setquota' while quota was disabled, OSTs other than OST0000-OST0004 changed the 'limit' value shown in 'lfs quota'. What is the difference between these OSTs and the others where 'limit' did not change? Can I also ask what the designed (expected) behavior is here in the first place, and what is not? When we change the quota setting while quota is disabled, what value should the 'limit' reported by 'lfs quota' be set to? Does the behavior change when we enable quota? How can we confirm whether quota has up-to-date, correct information? |
| Comment by Johann Lombardi (Inactive) [ 01/Jul/14 ] |
Understood. That said, the procedure we gave you (i.e. set the quota limit to 0 to force all OSTs to release space) only works if quota has been enabled. Would it be possible to enable quota for a short amount of time and rerun those commands? Once completed, you can then disable quota again if it does not work.
It is effective.
The thing is that setting the hard limit to 0 is expected to fix the problem (provided that quotas are on), so no log capture would be required in this case.
right.
The OSTs that changed limits still hold a global quota lock, while OST0000-OST0004 do not. This quota lock isn't automatically dropped once quota is disabled and isn't enqueued if quota is disabled at mount time.
The quota master should report the right limit. As for slaves, it depends on whether they still hold a global quota lock. To sum up, when quota is disabled, you should not pay attention to the hard limit on OSTs.
Yes. When quota is enabled, all quota slaves have to acquire a global quota lock.
In quota_slave.info, the number between [] is 1 when the slave is synchronized with the master and 0 otherwise. |
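A quick way to read that flag on every OST of an OSS (a sketch, relying on the glb[...]/slv[...] notation described above):
# Show, per OST, the quota_slave.info lines carrying the synchronization flags;
# glb[1]/slv[1] means in sync with the master, [0] means not.
lctl get_param osd*.*.quota_slave.info | grep -E 'quota_slave\.info|glb\['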
| Comment by Johann Lombardi (Inactive) [ 02/Jul/14 ] |
|
Any update? |
| Comment by Mitsuhiro Nishizawa [ 03/Jul/14 ] |
|
I talked with the customer, but unfortunately just enabling quota while the system is in service is not acceptable to them. To do the test safely, setting '-B 0' for all the users and then enabling quota would be one option for them. Assuming we do the test, what is the expected result? Currently,
What is the expected result (what it should be) for these items after setting quota limit to 0 to force all OSTs to release space? |
| Comment by Niu Yawei (Inactive) [ 03/Jul/14 ] |
The expected result is that all OSTs will release their limits to the master (which means all OSTs will have a 0 limit at the end). Please collect a debug log with D_QUOTA enabled when you enable quota (on both the MDT and the OSTs). Thank you. |
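One possible way to set up that capture (a sketch; the filesystem name lustre5, the 256 MB buffer size, and the output path are placeholders):
# On the MDS and on every OSS, before re-enabling quota:
lctl set_param debug=+quota
lctl set_param debug_mb=256   # enlarge the trace buffer so the reintegration isn't lost
lctl clear
# Re-enable quota enforcement from the MGS, let reintegration settle, then dump
# the buffer on each server:
lctl conf_param lustre5.quota.ost=ug
sleep 60
lctl dk /tmp/$(hostname)-quota-enable.log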
| Comment by Mitsuhiro Nishizawa [ 03/Jul/14 ] |
|
We need to build a test plan for the action and so let me ask more. |
| Comment by Johann Lombardi (Inactive) [ 03/Jul/14 ] |
sure
On -B 0, all slaves should release reserved quota space.
None of the OSTs are strictly synchronized with the master. If you check quota_slave.info, you will see that all report "glb[0] slv[0]". Some of them just happen to still own a glb quota lock.
Usage should always be consistent, regardless of the quota enforcement status.
Those ones are expected to release all the quota space.
Once quota is enabled, yes.
OST0000-OST0004 will synchronize with the master only once quota is enabled. At this point, I would advise to proceed as follows:
# lctl get_param qmt.*.*.glb*
3. set all the limits to 0 with quota disabled
What do you think? |
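A sketch of how the "set all the limits to 0" step could be scripted, assuming the affected uids have been extracted into a uids.txt file and /root/lustre is the client mount point (both placeholders):
# Record the master's view of the global limits first
lctl get_param qmt.*.*.glb* > qmt_glb_before.txt
# Clear the block hard limit for every uid in the list
while read -r uid; do
    lfs setquota -u "$uid" -B 0 /root/lustre
done < uids.txt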
| Comment by Mitsuhiro Nishizawa [ 03/Jul/14 ] |
|
Thanks Johann! I will write up a test plan and try the actions you provided. regards, |
| Comment by Johann Lombardi (Inactive) [ 03/Jul/14 ] |
|
For the record, I created |
| Comment by Mitsuhiro Nishizawa [ 07/Jul/14 ] |
|
The customer agreed with our plan and I am currently doing the work. After step 6, I set a quota (-B 1000000000) on two users. The 'limit' value is now "0" for OSTs that do not have any objects for the user, and 'limit' is set to the same value as the usage for the OSTs that do have objects. For example,
[root@nmds04 ~]# lfs setquota -u lect01 -B 1000000000 /root/lustre/
[root@nmds04 ~]#
[root@nmds04 ~]# lfs quota -u lect01 -v /root/lustre/
Disk quotas for user lect01 (uid 3017):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre/ 28 0 1000000000 - 7 0 0 -
lustre4-MDT0000_UUID
8 - 0 - 7 - 0 -
lustre4-OST0000_UUID
0 - 0 - - - - -
lustre4-OST0001_UUID
0 - 0 - - - - -
lustre4-OST0002_UUID
0 - 0 - - - - -
lustre4-OST0003_UUID
0 - 0 - - - - -
lustre4-OST0004_UUID
4* - 4 - - - - -
lustre4-OST0005_UUID
4* - 4 - - - - -
lustre4-OST0006_UUID
0 - 0 - - - - -
lustre4-OST0007_UUID
0 - 0 - - - - -
lustre4-OST0008_UUID
0 - 0 - - - - -
[...]
On another user,
[root@nmds04 ~]# lfs setquota -u lect02 -B 1000000000 /root/lustre/
[root@nmds04 ~]#
[root@nmds04 ~]# lfs quota -u lect02 -v /root/lustre/
Disk quotas for user lect02 (uid 3018):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre/ 1270432 0 1000000000 - 6 0 0 -
lustre4-MDT0000_UUID
8 - 0 - 6 - 0 -
lustre4-OST0000_UUID
14344* - 14344 - - - - -
lustre4-OST0001_UUID
15368* - 15368 - - - - -
lustre4-OST0002_UUID
15368* - 15368 - - - - -
lustre4-OST0003_UUID
15372* - 15372 - - - - -
lustre4-OST0004_UUID
15368* - 15368 - - - - -
[...]
Is this correct behavior? |
| Comment by Niu Yawei (Inactive) [ 07/Jul/14 ] |
This is correct behavior. |
| Comment by Mitsuhiro Nishizawa [ 07/Jul/14 ] |
|
I performed the action plan and confirmed the quota issue has been resolved for most users, but I also found that the quota limit on each OST was not updated even after I ran 'setquota -B 0' (while quota was enabled and quota_slave was in sync with the master). On the same file system, for another user, the limit value was updated right after I issued 'setquota -B 0'. Although I could not try this for all users, the behavior occurred on at least one user (UID: 2165). I captured a debug log (lustre5_quota_UID2165_20140707.tar.gz). Even though "quota exceeded" is set on some of the OSTs, I could write to those OSTs successfully. While I was doing write tests with that user, the limit value was finally updated right after 'setquota -B 0'. What I was doing was writing files on each OST, then 'setquota -B 1000000000' and 'setquota -B 0' several times. Is this correct behavior? |
| Comment by Johann Lombardi (Inactive) [ 07/Jul/14 ] |
|
Could you please clarify when the quota logs were collected? Just after setting the limit to 0? I only have the lfs quota output with a limit set to 10000000:
[root@nmds06 ~]# lfs quota -u w3ganglia -v /root/lustre
Disk quotas for user w3ganglia (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre 1151068 0 10000000 - 25 0 0 -
lustre5-MDT0000_UUID
...
|
| Comment by Johann Lombardi (Inactive) [ 07/Jul/14 ] |
|
In the logs, I see the glimpse related to setquota -B 1000000000:
00040000:04000000:1.0:1404731522.062728:0:24988:0:(qsd_lock.c:238:qsd_glb_glimpse_ast()) lustre5-OST0002: glimpse on glb quota locks, id:2165 ver:569 hard:1000000000 soft:0
But I don't have logs related to -B 0 where the slave is supposed to release all the quota space. When a hard limit is enforced for a given user/group, it is actually expected to have a local slave limit higher than the actual usage, as long as the sum of all slave limits (i.e. as reported by lfs quota -v) is equal to the granted field in qmt_dt-0x0_glb-usr on the quota master. When I look at this file in the tarball, I see:
- id: 2165
limits: { hard: 1000000, soft: 0, granted: 863512, time: 0 }
It is pretty difficult for me to make sense of those data because I have the "lfs quota -v" output for -B 10000000, debug logs for -B 1000000000, and the qmt_dt-0x0_glb-usr file with -B 1000000. Could you please set the hard limit to 0, collect "lfs quota -v" as well as qmt_dt-0x0_glb-usr, and then set the hard limit to 1000000000 and collect "lfs quota -v" and qmt_dt-0x0_glb-usr again? Thanks in advance. |
| Comment by Johann Lombardi (Inactive) [ 07/Jul/14 ] |
|
Ah, I actually found logs later related to -B 0:
1. The notification related to the hard limit change is received by the slave:
00040000:04000000:19.0:1404731522.120841:0:13913:0:(qsd_lock.c:238:qsd_glb_glimpse_ast()) lustre5-OST0002: glimpse on glb quota locks, id:2165 ver:570 hard:0 soft:0
2. The slave updates the lqe as well as the local copy of the global index:
00040000:04000000:0.0:1404731522.120881:0:26196:0:(qsd_entry.c:334:qsd_update_lqe()) $$$ updating global index hardlimit: 0, softlimit: 0 qsd:lustre5-OST0002 qtype:usr id:2165 enforced:0 granted:4 pending:0 waiting:0 req:0 usage:4 qunit:0 qtune:0 edquot:0
3. The slave refreshes usage for this ID:
00040000:04000000:0.0:1404731522.120890:0:26196:0:(qsd_entry.c:219:qsd_refresh_usage()) $$$ disk usage: 4 qsd:lustre5-OST0002 qtype:usr id:2165 enforced:0 granted:4 pending:0 waiting:0 req:0 usage:4 qunit:0 qtune:0 edquot:0
4. The slave decides to release the 4KB of quota space it owns for this user since quota isn't enforced any more for this ID:
00040000:04000000:0.0:1404731522.120898:0:26196:0:(qsd_handler.c:179:qsd_calc_adjust()) $$$ not enforced, releasing all space qsd:lustre5-OST0002 qtype:usr id:2165 enforced:0 granted:4 pending:0 waiting:0 req:0 usage:4 qunit:0 qtune:0 edquot:0
00040000:04000000:16.0:1404731522.121264:0:60574:0:(qsd_handler.c:335:qsd_req_completion()) $$$ DQACQ returned 0, flags:0x4 qsd:lustre5-OST0002 qtype:usr id:2165 enforced:0 granted:4 pending:0 waiting:0 req:1 usage:4 qunit:0 qtune:0 edquot:0
00040000:04000000:16.0:1404731522.121266:0:60574:0:(qsd_handler.c:357:qsd_req_completion()) $$$ DQACQ qb_count:4 qsd:lustre5-OST0002 qtype:usr id:2165 enforced:0 granted:4 pending:0 waiting:0 req:1 usage:4 qunit:0 qtune:0 edquot:0
00040000:04000000:0.0:1404731522.121283:0:26196:0:(qsd_entry.c:278:qsd_update_index()) lustre5-OST0002: update granted to 0 for id 2165
As far as I can see, -B 0 worked as expected and all the quota space owned by the slave has been released. If you could just collect qmt_dt-0x0_glb-usr as well as "lfs quota -v" output, then we could check that limits are consistent everywhere. |
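Collecting the two together could look like the following sketch (the qmt pattern follows the qmt.*.*.glb* form used earlier; uid 2165 and /root/lustre come from this ticket):
# On the MDS: dump the master's global user index
lctl get_param qmt.*.*.glb-usr > qmt_glb_usr.$(date +%s).txt
# On a client, at roughly the same time: dump the per-OST view for the uid
lfs quota -u 2165 -v /root/lustre > lfs_quota_2165.$(date +%s).txt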
| Comment by Mitsuhiro Nishizawa [ 08/Jul/14 ] |
|
When I set -B 0 for UID 2165, the quota limit as a whole was updated to 0, but the 'limit' on each OST was not updated. qmt_dt-0x0_glb-usr just after setting -B 0 was:
- id: 2165
limits: { hard: 0, soft: 0, granted: 0, time: 0 }
However, when I ran lfs quota:
Disk quotas for user 2165 (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre 512084 0 0 - 17 0 0 -
lustre5-MDT0000_UUID
8 - 0 - 17 - 0 -
lustre5-OST0000_UUID
0 - 270728 - - - - -
lustre5-OST0001_UUID
0 - 95220 - - - - -
lustre5-OST0002_UUID
4 - 0 - - - - -
lustre5-OST0003_UUID
0 - 42292 - - - - -
lustre5-OST0004_UUID
0 - 798452 - - - - -
lustre5-OST0005_UUID
0 - 4194304 - - - - -
lustre5-OST0006_UUID
0 - 4194304 - - - - -
lustre5-OST0007_UUID
0 - 0 - - - - -
lustre5-OST0008_UUID
0 - 4 - - - - -
lustre5-OST0009_UUID
0 - 4194304 - - - - -
lustre5-OST000a_UUID
0 - 4194304 - - - - -
lustre5-OST000b_UUID
0 - 0 - - - - -
lustre5-OST000c_UUID
0 - 4194304 - - - - -
lustre5-OST000d_UUID
0 - 4194304 - - - - -
lustre5-OST000e_UUID
0 - 4194304 - - - - -
lustre5-OST000f_UUID
0 - 4194304 - - - - -
lustre5-OST0010_UUID
0 - 0 - - - - -
lustre5-OST0011_UUID
0 - 4 - - - - -
lustre5-OST0012_UUID
102408 - 0 - - - - -
lustre5-OST0013_UUID
4 - 0 - - - - -
lustre5-OST0014_UUID
0 - 0 - - - - -
lustre5-OST0015_UUID
0 - 105836 - - - - -
lustre5-OST0016_UUID
0 - 59688 - - - - -
lustre5-OST0017_UUID
0 - 4 - - - - -
lustre5-OST0018_UUID
0 - 4194304 - - - - -
lustre5-OST0019_UUID
0 - 1983424 - - - - -
lustre5-OST001a_UUID
0 - 4 - - - - -
lustre5-OST001b_UUID
102408 - 0 - - - - -
lustre5-OST001c_UUID
0 - 4 - - - - -
lustre5-OST001d_UUID
0 - 0 - - - - -
lustre5-OST001e_UUID
102408 - 0 - - - - -
lustre5-OST001f_UUID
102408 - 0 - - - - -
lustre5-OST0020_UUID
0 - 1522108 - - - - -
lustre5-OST0021_UUID
0 - 4194304 - - - - -
lustre5-OST0022_UUID
0 - 4194304 - - - - -
lustre5-OST0023_UUID
0 - 0 - - - - -
lustre5-OST0024_UUID
0 - 796168 - - - - -
lustre5-OST0025_UUID
0 - 4194304 - - - - -
lustre5-OST0026_UUID
0 - 4194304 - - - - -
lustre5-OST0027_UUID
0 - 0 - - - - -
lustre5-OST0028_UUID
0 - 648440 - - - - -
lustre5-OST0029_UUID
0 - 55360 - - - - -
lustre5-OST002a_UUID
0 - 4194304 - - - - -
lustre5-OST002b_UUID
4 - 0 - - - - -
lustre5-OST002c_UUID
0 - 4194304 - - - - -
lustre5-OST002d_UUID
0 - 60344 - - - - -
lustre5-OST002e_UUID
0 - 4194304 - - - - -
lustre5-OST002f_UUID
0 - 0 - - - - -
lustre5-OST0030_UUID
4 - 0 - - - - -
lustre5-OST0031_UUID
0 - 18940 - - - - -
lustre5-OST0032_UUID
0 - 63884 - - - - -
lustre5-OST0033_UUID
0 - 19500 - - - - -
lustre5-OST0034_UUID
4 - 0 - - - - -
lustre5-OST0035_UUID
0 - 1138508 - - - - -
lustre5-OST0036_UUID
4 - 0 - - - - -
lustre5-OST0037_UUID
0 - 445992 - - - - -
lustre5-OST0038_UUID
0 - 4194304 - - - - -
lustre5-OST0039_UUID
0 - 4194304 - - - - -
lustre5-OST003a_UUID
0 - 4194304 - - - - -
lustre5-OST003b_UUID
0 - 1083572 - - - - -
lustre5-OST003c_UUID
0 - 1015720 - - - - -
lustre5-OST003d_UUID
0 - 4194304 - - - - -
lustre5-OST003e_UUID
4 - 0 - - - - -
lustre5-OST003f_UUID
0 - 4194304 - - - - -
lustre5-OST0040_UUID
4 - 0 - - - - -
lustre5-OST0041_UUID
0 - 0 - - - - -
lustre5-OST0042_UUID
4 - 0 - - - - -
lustre5-OST0043_UUID
0 - 80396 - - - - -
lustre5-OST0044_UUID
0 - 4 - - - - -
lustre5-OST0045_UUID
0 - 68900 - - - - -
lustre5-OST0046_UUID
0 - 74876 - - - - -
lustre5-OST0047_UUID
0 - 4194304 - - - - -
lustre5-OST0048_UUID
0 - 4194304 - - - - -
lustre5-OST0049_UUID
102408 - 0 - - - - -
lustre5-OST004a_UUID
0 - 0 - - - - -
lustre5-OST004b_UUID
0 - 4194304 - - - - -
lustre5-OST004c_UUID
0 - 4194304 - - - - -
lustre5-OST004d_UUID
0 - 4194304 - - - - -
lustre5-OST004e_UUID
0 - 4194304 - - - - -
lustre5-OST004f_UUID
0 - 4194304 - - - - -
lustre5-OST0050_UUID
0 - 0 - - - - -
lustre5-OST0051_UUID
0 - 4194304 - - - - -
lustre5-OST0052_UUID
0 - 4 - - - - -
lustre5-OST0053_UUID
0 - 1604368 - - - - -
Why were those 'limit' values not updated to 0? |
| Comment by Johann Lombardi (Inactive) [ 08/Jul/14 ] |
|
Well, I looked at the logs for lustre5-OST0002, which released all the quota space (limit = 0). I am going to have a look at the logs for the other OSTs then. |
| Comment by Johann Lombardi (Inactive) [ 08/Jul/14 ] |
|
Checking logs for lustre5-OST0004:
00040000:04000000:1.0:1404731522.120827:0:24988:0:(qsd_lock.c:238:qsd_glb_glimpse_ast()) lustre5-OST0004: glimpse on glb quota locks, id:2165 ver:570 hard:0 soft:0
00040000:04000000:27.0:1404731522.120894:0:26375:0:(qsd_entry.c:334:qsd_update_lqe()) $$$ updating global index hardlimit: 0, softlimit: 0 qsd:lustre5-OST0004 qtype:usr id:2165 enforced:0 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
00040000:04000000:27.0:1404731522.120936:0:26375:0:(qsd_entry.c:219:qsd_refresh_usage()) $$$ disk usage: 0 qsd:lustre5-OST0004 qtype:usr id:2165 enforced:0 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
00040000:04000000:27.0:1404731522.120946:0:26375:0:(qsd_handler.c:931:qsd_adjust()) $$$ no adjustment required qsd:lustre5-OST0004 qtype:usr id:2165 enforced:0 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
So it seems that no quota space was owned by lustre5-OST0004 and everything looks fine in the logs. Unfortunately, the logs were not collected at the same time as the lfs quota output above. Could you please:
Thanks in advance. |
| Comment by Mitsuhiro Nishizawa [ 08/Jul/14 ] |
|
Johann, I collected 'lfs quota -v' output just after 'setquota -B 0'. What I did was, |
| Comment by Johann Lombardi (Inactive) [ 08/Jul/14 ] |
|
Mitsuhiro, could you please clarify what you mean by "and now it was resolved somehow"? |
| Comment by Mitsuhiro Nishizawa [ 08/Jul/14 ] |
|
After I captured the Lustre debug log, the customer was doing a write test using UID 2165 to see how it behaves. At that time, the 'limit' on each OST did not change when setting the hard limit to 0 or to 1000000000. They found they could write over the 'limit' shown in 'lfs quota'. While doing the test and running 'lfs setquota', they also noticed that the 'limit' on each OST changed after 'lfs setquota'. Now, it can be set to "0" by 'lfs setquota -B 0' and to a value like "69652" by 'lfs setquota -B 10000000' (in this case, the hard limit is set to 10GB). Thanks, |
| Comment by Mitsuhiro Nishizawa [ 11/Jul/14 ] |
|
What does the behavior on UID2165 mean? Is there anything we should do to have quota behave correctly? Regards, |
| Comment by Niu Yawei (Inactive) [ 11/Jul/14 ] |
Do you mean that the customer was writing files as UID 2165 while you were changing the hard limit? How did you observe that the limit wasn't changed by setting the hard limit to 0 (could you show me the command and the output)?
Could you explain it in detail? One possible reason is: while the limit was set to 0, data could have been cached on the client; when the user sets a limit, the cached data will be flushed back anyway regardless of the quota limit.
Was the limit changed as we expected?
You mean that UID is back to normal now, right? |
| Comment by Mitsuhiro Nishizawa [ 11/Jul/14 ] |
No. I set the hard limit to 0 and confirmed the 'limit' did not change. After that, the customer did write tests and found the 'limit' started to change. Here is the log,
[root@nmds06 ~]# lfs setquota -u w3ganglia -B 1000000000 /root/lustre/
[root@nmds06 ~]# lfs quota -u w3ganglia /root/lustre/
Disk quotas for user w3ganglia (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre/ 512084 0 1000000000 - 17 0 0 -
[root@nmds06 ~]# lfs quota -u w3ganglia -v
Disk quotas for user w3ganglia (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre/ 512084 0 1000000000 - 17 0 0 -
lustre5-MDT0000_UUID
8 - 0 - 17 - 0 -
lustre5-OST0000_UUID
0 - 270728 - - - - -
lustre5-OST0001_UUID
0 - 95220 - - - - -
lustre5-OST0002_UUID
4* - 4 - - - - -
lustre5-OST0003_UUID
0 - 42292 - - - - -
lustre5-OST0004_UUID
0 - 798452 - - - - -
lustre5-OST0005_UUID
0 - 4194304 - - - - -
lustre5-OST0006_UUID
0 - 4194304 - - - - -
lustre5-OST0007_UUID
0 - 0 - - - - -
lustre5-OST0008_UUID
0 - 4 - - - - -
lustre5-OST0009_UUID
0 - 4194304 - - - - -
lustre5-OST000a_UUID
0 - 4194304 - - - - -
lustre5-OST000b_UUID
0 - 0 - - - - -
lustre5-OST000c_UUID
0 - 4194304 - - - - -
lustre5-OST000d_UUID
0 - 4194304 - - - - -
lustre5-OST000e_UUID
0 - 4194304 - - - - -
lustre5-OST000f_UUID
0 - 4194304 - - - - -
lustre5-OST0010_UUID
0 - 0 - - - - -
lustre5-OST0011_UUID
0 - 4 - - - - -
lustre5-OST0012_UUID
102408 - 4194304 - - - - -
lustre5-OST0013_UUID
4* - 4 - - - - -
lustre5-OST0014_UUID
0 - 0 - - - - -
lustre5-OST0015_UUID
0 - 105836 - - - - -
lustre5-OST0016_UUID
0 - 59688 - - - - -
lustre5-OST0017_UUID
0 - 4 - - - - -
lustre5-OST0018_UUID
0 - 4194304 - - - - -
lustre5-OST0019_UUID
0 - 1983424 - - - - -
lustre5-OST001a_UUID
0 - 4 - - - - -
lustre5-OST001b_UUID
102408 - 4194304 - - - - -
lustre5-OST001c_UUID
0 - 4 - - - - -
lustre5-OST001d_UUID
0 - 0 - - - - -
lustre5-OST001e_UUID
102408* - 102408 - - - - -
lustre5-OST001f_UUID
102408* - 102408 - - - - -
lustre5-OST0020_UUID
0 - 1522108 - - - - -
lustre5-OST0021_UUID
0 - 4194304 - - - - -
lustre5-OST0022_UUID
0 - 4194304 - - - - -
lustre5-OST0023_UUID
0 - 0 - - - - -
lustre5-OST0024_UUID
0 - 796168 - - - - -
lustre5-OST0025_UUID
0 - 4194304 - - - - -
lustre5-OST0026_UUID
0 - 4194304 - - - - -
lustre5-OST0027_UUID
0 - 0 - - - - -
lustre5-OST0028_UUID
0 - 648440 - - - - -
lustre5-OST0029_UUID
0 - 55360 - - - - -
lustre5-OST002a_UUID
0 - 4194304 - - - - -
lustre5-OST002b_UUID
4* - 4 - - - - -
lustre5-OST002c_UUID
0 - 4194304 - - - - -
lustre5-OST002d_UUID
0 - 60344 - - - - -
lustre5-OST002e_UUID
0 - 4194304 - - - - -
lustre5-OST002f_UUID
0 - 0 - - - - -
lustre5-OST0030_UUID
4* - 4 - - - - -
lustre5-OST0031_UUID
0 - 18940 - - - - -
lustre5-OST0032_UUID
0 - 63884 - - - - -
lustre5-OST0033_UUID
0 - 19500 - - - - -
lustre5-OST0034_UUID
4* - 4 - - - - -
lustre5-OST0035_UUID
0 - 1138508 - - - - -
lustre5-OST0036_UUID
4* - 4 - - - - -
lustre5-OST0037_UUID
0 - 445992 - - - - -
lustre5-OST0038_UUID
0 - 4194304 - - - - -
lustre5-OST0039_UUID
0 - 4194304 - - - - -
lustre5-OST003a_UUID
0 - 4194304 - - - - -
lustre5-OST003b_UUID
0 - 1083572 - - - - -
lustre5-OST003c_UUID
0 - 1015720 - - - - -
lustre5-OST003d_UUID
0 - 4194304 - - - - -
lustre5-OST003e_UUID
4 - 4194304 - - - - -
lustre5-OST003f_UUID
0 - 4194304 - - - - -
lustre5-OST0040_UUID
4* - 4 - - - - -
lustre5-OST0041_UUID
0 - 0 - - - - -
lustre5-OST0042_UUID
4* - 4 - - - - -
lustre5-OST0043_UUID
0 - 80396 - - - - -
lustre5-OST0044_UUID
0 - 4 - - - - -
lustre5-OST0045_UUID
0 - 68900 - - - - -
lustre5-OST0046_UUID
0 - 74876 - - - - -
lustre5-OST0047_UUID
0 - 4194304 - - - - -
lustre5-OST0048_UUID
0 - 4194304 - - - - -
lustre5-OST0049_UUID
102408* - 102408 - - - - -
lustre5-OST004a_UUID
0 - 0 - - - - -
lustre5-OST004b_UUID
0 - 4194304 - - - - -
lustre5-OST004c_UUID
0 - 4194304 - - - - -
lustre5-OST004d_UUID
0 - 4194304 - - - - -
lustre5-OST004e_UUID
0 - 4194304 - - - - -
lustre5-OST004f_UUID
0 - 4194304 - - - - -
lustre5-OST0050_UUID
0 - 0 - - - - -
lustre5-OST0051_UUID
0 - 4194304 - - - - -
lustre5-OST0052_UUID
0 - 4 - - - - -
lustre5-OST0053_UUID
0 - 1604368 - - - - -
[root@nmds06 ~]# lfs setquota -u w3ganglia -B 0 /root/lustre
[root@nmds06 ~]# lfs quota -u w3ganglia -v /root/lustre/
Disk quotas for user w3ganglia (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre/ 512084 0 0 - 17 0 0 -
lustre5-MDT0000_UUID
8 - 0 - 17 - 0 -
lustre5-OST0000_UUID
0 - 270728 - - - - -
lustre5-OST0001_UUID
0 - 95220 - - - - -
lustre5-OST0002_UUID
4 - 0 - - - - -
lustre5-OST0003_UUID
0 - 42292 - - - - -
lustre5-OST0004_UUID
0 - 798452 - - - - -
lustre5-OST0005_UUID
0 - 4194304 - - - - -
lustre5-OST0006_UUID
0 - 4194304 - - - - -
lustre5-OST0007_UUID
0 - 0 - - - - -
lustre5-OST0008_UUID
0 - 4 - - - - -
lustre5-OST0009_UUID
0 - 4194304 - - - - -
lustre5-OST000a_UUID
0 - 4194304 - - - - -
lustre5-OST000b_UUID
0 - 0 - - - - -
lustre5-OST000c_UUID
0 - 4194304 - - - - -
lustre5-OST000d_UUID
0 - 4194304 - - - - -
lustre5-OST000e_UUID
0 - 4194304 - - - - -
lustre5-OST000f_UUID
0 - 4194304 - - - - -
lustre5-OST0010_UUID
0 - 0 - - - - -
lustre5-OST0011_UUID
0 - 4 - - - - -
lustre5-OST0012_UUID
102408 - 0 - - - - -
lustre5-OST0013_UUID
4 - 0 - - - - -
lustre5-OST0014_UUID
0 - 0 - - - - -
lustre5-OST0015_UUID
0 - 105836 - - - - -
lustre5-OST0016_UUID
0 - 59688 - - - - -
lustre5-OST0017_UUID
0 - 4 - - - - -
lustre5-OST0018_UUID
0 - 4194304 - - - - -
lustre5-OST0019_UUID
0 - 1983424 - - - - -
lustre5-OST001a_UUID
0 - 4 - - - - -
lustre5-OST001b_UUID
102408 - 0 - - - - -
lustre5-OST001c_UUID
0 - 4 - - - - -
lustre5-OST001d_UUID
0 - 0 - - - - -
lustre5-OST001e_UUID
102408 - 0 - - - - -
lustre5-OST001f_UUID
102408 - 0 - - - - -
lustre5-OST0020_UUID
0 - 1522108 - - - - -
lustre5-OST0021_UUID
0 - 4194304 - - - - -
lustre5-OST0022_UUID
0 - 4194304 - - - - -
lustre5-OST0023_UUID
0 - 0 - - - - -
lustre5-OST0024_UUID
0 - 796168 - - - - -
lustre5-OST0025_UUID
0 - 4194304 - - - - -
lustre5-OST0026_UUID
0 - 4194304 - - - - -
lustre5-OST0027_UUID
0 - 0 - - - - -
lustre5-OST0028_UUID
0 - 648440 - - - - -
lustre5-OST0029_UUID
0 - 55360 - - - - -
lustre5-OST002a_UUID
0 - 4194304 - - - - -
lustre5-OST002b_UUID
4 - 0 - - - - -
lustre5-OST002c_UUID
0 - 4194304 - - - - -
lustre5-OST002d_UUID
0 - 60344 - - - - -
lustre5-OST002e_UUID
0 - 4194304 - - - - -
lustre5-OST002f_UUID
0 - 0 - - - - -
lustre5-OST0030_UUID
4 - 0 - - - - -
lustre5-OST0031_UUID
0 - 18940 - - - - -
lustre5-OST0032_UUID
0 - 63884 - - - - -
lustre5-OST0033_UUID
0 - 19500 - - - - -
lustre5-OST0034_UUID
4 - 0 - - - - -
lustre5-OST0035_UUID
0 - 1138508 - - - - -
lustre5-OST0036_UUID
4 - 0 - - - - -
lustre5-OST0037_UUID
0 - 445992 - - - - -
lustre5-OST0038_UUID
0 - 4194304 - - - - -
lustre5-OST0039_UUID
0 - 4194304 - - - - -
lustre5-OST003a_UUID
0 - 4194304 - - - - -
lustre5-OST003b_UUID
0 - 1083572 - - - - -
lustre5-OST003c_UUID
0 - 1015720 - - - - -
lustre5-OST003d_UUID
0 - 4194304 - - - - -
lustre5-OST003e_UUID
4 - 0 - - - - -
lustre5-OST003f_UUID
0 - 4194304 - - - - -
lustre5-OST0040_UUID
4 - 0 - - - - -
lustre5-OST0041_UUID
0 - 0 - - - - -
lustre5-OST0042_UUID
4 - 0 - - - - -
lustre5-OST0043_UUID
0 - 80396 - - - - -
lustre5-OST0044_UUID
0 - 4 - - - - -
lustre5-OST0045_UUID
0 - 68900 - - - - -
lustre5-OST0046_UUID
0 - 74876 - - - - -
lustre5-OST0047_UUID
0 - 4194304 - - - - -
lustre5-OST0048_UUID
0 - 4194304 - - - - -
lustre5-OST0049_UUID
102408 - 0 - - - - -
lustre5-OST004a_UUID
0 - 0 - - - - -
lustre5-OST004b_UUID
0 - 4194304 - - - - -
lustre5-OST004c_UUID
0 - 4194304 - - - - -
lustre5-OST004d_UUID
0 - 4194304 - - - - -
lustre5-OST004e_UUID
0 - 4194304 - - - - -
lustre5-OST004f_UUID
0 - 4194304 - - - - -
lustre5-OST0050_UUID
0 - 0 - - - - -
lustre5-OST0051_UUID
0 - 4194304 - - - - -
lustre5-OST0052_UUID
0 - 4 - - - - -
lustre5-OST0053_UUID
0 - 1604368 - - - - -
and finally it was,
# lfs quota -u w3ganglia -v /root/lustre
Disk quotas for user w3ganglia (uid 2165):
Filesystem kbytes quota limit grace files quota limit grace
/root/lustre 1151068 0 10000000 - 25 0 0 -
lustre5-MDT0000_UUID
32 - 0 - 25 - 0 -
lustre5-OST0000_UUID
4100 - 68616 - - - - -
lustre5-OST0001_UUID
6152 - 70672 - - - - -
lustre5-OST0002_UUID
5128 - 69652 - - - - -
lustre5-OST0003_UUID
5124 - 68616 - - - - -
lustre5-OST0004_UUID
4100 - 68616 - - - - -
lustre5-OST0005_UUID
4112 - 68624 - - - - -
lustre5-OST0006_UUID
7188 - 71720 - - - - -
lustre5-OST0007_UUID
4112 - 68632 - - - - -
lustre5-OST0008_UUID
6160 - 69648 - - - - -
lustre5-OST0009_UUID
4112 - 68624 - - - - -
lustre5-OST000a_UUID
6160 - 70688 - - - - -
lustre5-OST000b_UUID
6164 - 70696 - - - - -
lustre5-OST000c_UUID
5136 - 68624 - - - - -
lustre5-OST000d_UUID
5136 - 69648 - - - - -
lustre5-OST000e_UUID
4112 - 68624 - - - - -
lustre5-OST000f_UUID
7188 - 71720 - - - - -
lustre5-OST0010_UUID
4112 - 68632 - - - - -
lustre5-OST0011_UUID
6160 - 69648 - - - - -
lustre5-OST0012_UUID
106520 - 171032 - - - - -
lustre5-OST0013_UUID
6168 - 70692 - - - - -
lustre5-OST0014_UUID
6164 - 70696 - - - - -
lustre5-OST0015_UUID
5136 - 68624 - - - - -
lustre5-OST0016_UUID
5136 - 69648 - - - - -
lustre5-OST0017_UUID
4112 - 68624 - - - - -
lustre5-OST0018_UUID
7188 - 71720 - - - - -
lustre5-OST0019_UUID
4112 - 68632 - - - - -
lustre5-OST001a_UUID
6160 - 69648 - - - - -
lustre5-OST001b_UUID
106520 - 171032 - - - - -
lustre5-OST001c_UUID
4108 - 68624 - - - - -
lustre5-OST001d_UUID
6164 - 70688 - - - - -
lustre5-OST001e_UUID
107544 - 172072 - - - - -
lustre5-OST001f_UUID
108568 - 172056 - - - - -
lustre5-OST0020_UUID
4112 - 68624 - - - - -
lustre5-OST0021_UUID
5136 - 69648 - - - - -
lustre5-OST0022_UUID
6164 - 70696 - - - - -
lustre5-OST0023_UUID
106516 - 171388 - - - - -
lustre5-OST0024_UUID
6160 - 69648 - - - - -
lustre5-OST0025_UUID
4112 - 68624 - - - - -
lustre5-OST0026_UUID
6152 - 70672 - - - - -
lustre5-OST0027_UUID
6156 - 70680 - - - - -
lustre5-OST0028_UUID
5124 - 68616 - - - - -
lustre5-OST0029_UUID
5124 - 69640 - - - - -
lustre5-OST002a_UUID
5136 - 69648 - - - - -
lustre5-OST002b_UUID
6168 - 70700 - - - - -
lustre5-OST002c_UUID
4112 - 68624 - - - - -
lustre5-OST002d_UUID
6160 - 69648 - - - - -
lustre5-OST002e_UUID
4112 - 68624 - - - - -
lustre5-OST002f_UUID
6164 - 70688 - - - - -
lustre5-OST0030_UUID
6168 - 70700 - - - - -
lustre5-OST0031_UUID
5136 - 68624 - - - - -
lustre5-OST0032_UUID
5136 - 69648 - - - - -
lustre5-OST0033_UUID
5132 - 65536 - - - - -
lustre5-OST0034_UUID
6168 - 70700 - - - - -
lustre5-OST0035_UUID
4112 - 68624 - - - - -
lustre5-OST0036_UUID
6164 - 69652 - - - - -
lustre5-OST0037_UUID
4112 - 68624 - - - - -
lustre5-OST0038_UUID
4112 - 68624 - - - - -
lustre5-OST0039_UUID
6164 - 70688 - - - - -
lustre5-OST003a_UUID
106516 - 150608 - - - - -
lustre5-OST003b_UUID
6160 - 69648 - - - - -
lustre5-OST003c_UUID
4112 - 68624 - - - - -
lustre5-OST003d_UUID
5136 - 69648 - - - - -
lustre5-OST003e_UUID
6168 - 70692 - - - - -
lustre5-OST003f_UUID
4112 - 68624 - - - - -
lustre5-OST0040_UUID
5140 - 69652 - - - - -
lustre5-OST0041_UUID
4112 - 68624 - - - - -
lustre5-OST0042_UUID
5848 - 70700 - - - - -
lustre5-OST0043_UUID
5136 - 68624 - - - - -
lustre5-OST0044_UUID
5136 - 69648 - - - - -
lustre5-OST0045_UUID
4112 - 68624 - - - - -
lustre5-OST0046_UUID
6164 - 70680 - - - - -
lustre5-OST0047_UUID
6164 - 70696 - - - - -
lustre5-OST0048_UUID
5136 - 68624 - - - - -
lustre5-OST0049_UUID
107544 - 172056 - - - - -
lustre5-OST004a_UUID
4112 - 68624 - - - - -
lustre5-OST004b_UUID
4112 - 68624 - - - - -
lustre5-OST004c_UUID
4112 - 68624 - - - - -
lustre5-OST004d_UUID
4112 - 68624 - - - - -
lustre5-OST004e_UUID
4112 - 68624 - - - - -
lustre5-OST004f_UUID
6164 - 70680 - - - - -
lustre5-OST0050_UUID
6164 - 70700 - - - - -
lustre5-OST0051_UUID
5136 - 68624 - - - - -
lustre5-OST0052_UUID
5136 - 69648 - - - - -
lustre5-OST0053_UUID
4112 - 68624 - - - - -
I don't have the log to show the details, unfortunately, as I was not able to do the test on my own... Maybe the possibility you describe is true. Thanks for pointing it out.
Yes, as shown above, the 'limit' on each OST started to change.
Yes, right. As far as I have checked this account, I don't see problematic behavior. However, because the customer observed the behavior above themselves, they are concerned whether quota is really behaving correctly now. I could confirm this account, but I am not sure about all the accounts. |
| Comment by Niu Yawei (Inactive) [ 11/Jul/14 ] |
I didn't see the global limit being set to 0 in the log (it went from 1000000000 to 10000000). |
| Comment by Mitsuhiro Nishizawa [ 11/Jul/14 ] |
The log contains many terminal control characters and so is a bit hard to read, but the hard limit was first set to 1000000000, then to 0, and finally to 10000000. |
| Comment by Niu Yawei (Inactive) [ 11/Jul/14 ] |
Ah, I see it. Did you wait for a while between 'lfs setquota -B 0' and 'lfs quota'? |
| Comment by Mitsuhiro Nishizawa [ 11/Jul/14 ] |
|
In the log above it was a few seconds, but the situation lasted for more than 10 minutes, and I tried a couple of times during that period. |
| Comment by Niu Yawei (Inactive) [ 14/Jul/14 ] |
Got it, thank you. It's hard to tell why this happened without the related logs; could you try to collect logs (in the way Johann suggested in a previous comment) when you see the problem again? Thanks. |
| Comment by Mitsuhiro Nishizawa [ 14/Jul/14 ] |
|
The steps Johann suggested were:
and the log I collected was:
The difference would be whether we run 'lfs quota' before the 5-second wait or after it (the debug log was captured after the 5 seconds). How does this difference in when the log is collected explain the behavior? |
| Comment by Niu Yawei (Inactive) [ 14/Jul/14 ] |
If 'lfs quota -v' is executed immediately after 'lfs setquota -B 0', the slaves might not have released their limits yet, so you should wait for a while after setting the limit and then run 'lfs quota -v' to verify it. We checked the provided log, and it shows that the slaves released their limits as expected.
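So a verification run would simply leave a pause between the two commands, e.g. (a sketch; the 60-second wait is arbitrary):
lfs setquota -u w3ganglia -B 0 /root/lustre
sleep 60   # give the slaves time to release their granted space
lfs quota -u w3ganglia -v /root/lustre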
| Comment by Peter Jones [ 07/Aug/14 ] |
|
As per Ihara, it is OK to close this ticket. Quotas have been running fine since they were reset. It is not understood how things got into an inconsistent state, but there have been no problems since the reset. |