[LU-935] Crash lquota:dquot_create_oqaq+0x28f/0x510 Created: 16/Dec/11 Updated: 09/May/12 Resolved: 02/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.x (1.8.0 - 1.8.5) |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.2, Lustre 1.8.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | Supporto Lustre Jnet2000 (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | stats | ||
| Environment: |
Lustre version: 1.8.5.54-20110316022453-PRISTINE-2.6.18-194.17.1.el5_lustre.20110315140510 |
||
| Attachments: |
|
| Severity: | 2 |
| Epic: | client, hang, metadata, quota, server |
| Rank (Obsolete): | 4798 |
| Description |
|
The Lustre infrastructure is based on two HP Blade Servers. The server console reports soft lockups such as:

Dec 9 11:27:08 osiride-lp-030 kernel: BUG: soft lockup - CPU#8 stuck for 10s! [ll_mdt_06:21936]

This saturates the resources of the server and the clients are unable to access the filesystem. Regards |
| Comments |
| Comment by Peter Jones [ 16/Dec/11 ] |
|
Thanks for the report. An engineer will be in touch soon |
| Comment by Johann Lombardi (Inactive) [ 16/Dec/11 ] |
|
Could you please collect a sysrq-t (or even better a crash dump) of the MDS when those soft lockups are dumped to the console? |
| Comment by Peter Jones [ 16/Dec/11 ] |
|
Niu Could you please look into this one? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 18/Dec/11 ] |
|
In dquot_create_oqaq(), I don't see why we don't break out early when the i/bunit_size exceeds its upper limit while expanding i/bunit_size:

/* enlarge block qunit size */
while (blimit > QUSG(dquot->dq_dqb.dqb_curspace + 2 * b_limitation, 1)) {
        oqaq->qaq_bunit_sz = QUSG(oqaq->qaq_bunit_sz * cqs_factor, 1)
                             << QUOTABLOCK_BITS;
        b_limitation = oqaq->qaq_bunit_sz * ost_num * shrink_qunit_limit;
}

/* enlarge file qunit size */
while (ilimit > dquot->dq_dqb.dqb_curinodes + 2 * i_limitation) {
        oqaq->qaq_iunit_sz = oqaq->qaq_iunit_sz * cqs_factor;
        i_limitation = oqaq->qaq_iunit_sz * mdt_num * shrink_qunit_limit;
}

If the user sets a very large i/blimit, then qaq_i/bunit_sz * cqs_factor can overflow and cause an endless loop. I think we'd better break out of the loop whenever oqaq->qaq_i/bunit_sz exceeds its upper limit; I will provide a patch soon. |
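For illustration, a minimal standalone sketch (not the Lustre code) of how a huge limit makes the file-qunit expansion above spin forever. The starting qunit (5120 inodes), qs_factor (2), boundary factor (4) and single-MDT assumption are the defaults discussed later in this ticket; the limit is the one from the reproducer; everything else is a simplified stand-in.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* Hypothetical stand-ins for the values used by the inode loop above:
         * 5120-inode default qunit, qs_factor 2, boundary factor 4, one MDT. */
        uint64_t ilimit       = 17293822569102704639ULL; /* reproducer limit */
        uint64_t curinodes    = 0;
        uint64_t iunit_sz     = 5120;
        uint64_t cqs_factor   = 2;
        uint64_t mdt_num      = 1;
        uint64_t shrink_limit = 4;
        uint64_t i_limitation = iunit_sz * mdt_num * shrink_limit;
        int      iters        = 0;

        /* Same shape as the "enlarge file qunit size" loop above: the 64-bit
         * products eventually wrap, so the condition never becomes false. */
        while (ilimit > curinodes + 2 * i_limitation) {
                iunit_sz *= cqs_factor;
                i_limitation = iunit_sz * mdt_num * shrink_limit;
                if (++iters > 200) {
                        printf("stuck after %d iterations, iunit_sz=%llu\n",
                               iters, (unsigned long long)iunit_sz);
                        return 1;
                }
        }
        printf("converged after %d iterations\n", iters);
        return 0;
}

With these inputs the products wrap to zero around the 50th doubling and the loop can never exit, which is exactly the endless loop described above; breaking out once the qunit exceeds its upper bound, as the patch proposes, avoids it.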
| Comment by Niu Yawei (Inactive) [ 18/Dec/11 ] |
|
patch for b1_8: http://review.whamcloud.com/1887 |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 19/Dec/11 ] |
|
Thanks to all for answers, Regards Andrea Mattioli |
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
I'm afraid that some apps were setting very large limits, and that triggered this defect. To reproduce it, you could just set a very large ilimit/blimit for some user, for instance:

lfs setquota -u user_foo -b 0 -B 0 -i 0 -I 17293822569102704639 /mnt/lustre |
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
|
patch for master: http://review.whamcloud.com/1890 |
| Comment by Peter Jones [ 19/Dec/11 ] |
|
Niu Does that mean that it should be possible for the customer to work around this issue by identifying which jobs use a "too high" limit and correcting it to something within an acceptable range, thus removing the need to apply a patch? If so, at what threshold does the value become problematic? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
This patch is necessary; we shouldn't restrict users from setting high limits. The threshold that can trigger the overflow depends on many factors (OST count, quota_qs_factor with default 2, quota_boundary_factor with default 4), so there isn't a single static threshold. Assuming there are 100 OSTs, a blimit of less than 100P bytes should probably be safe, and the ilimit could be larger (since there is only one MDS), say less than 1000P inodes or so. |
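To make the "no static threshold" point concrete, here is a rough, hypothetical checker (not part of Lustre) that mirrors the block-qunit expansion loop quoted earlier. The 128M default bunit, the simplified one-argument QUSG() converting bytes to 1KB blocks, and the assumption of zero current usage are all stand-ins, not values taken from the customer's system.

#include <stdio.h>
#include <stdint.h>

#define QUOTABLOCK_BITS 10
#define QUSG(bytes)     ((bytes) >> QUOTABLOCK_BITS) /* bytes -> 1KB blocks */

/* Return 1 if the bunit expansion terminates for this block hard limit
 * (in KB) and OST count, 0 if the 64-bit arithmetic wraps first and the
 * loop would spin. Defaults assumed: 128M bunit, qs_factor 2, boundary 4. */
static int bunit_expansion_terminates(uint64_t blimit_kb, uint64_t ost_num)
{
        uint64_t bunit_sz     = 128ULL << 20;            /* bytes */
        uint64_t cqs_factor   = 2;
        uint64_t boundary     = 4;                       /* quota_boundary_factor */
        uint64_t curspace     = 0;                       /* assume no usage */
        uint64_t b_limitation = bunit_sz * ost_num * boundary;
        int      i;

        for (i = 0; i < 128; i++) {
                if (blimit_kb <= QUSG(curspace + 2 * b_limitation))
                        return 1;                        /* loop would exit */
                bunit_sz     = QUSG(bunit_sz * cqs_factor) << QUOTABLOCK_BITS;
                b_limitation = bunit_sz * ost_num * boundary;
        }
        return 0;                                        /* never converges */
}

int main(void)
{
        printf("100 OSTs, ~100PB limit : %s\n",
               bunit_expansion_terminates(100ULL << 40, 100) ? "ok" : "spins");
        printf("100 OSTs, huge limit   : %s\n",
               bunit_expansion_terminates(17293822569102704639ULL, 100) ? "ok" : "spins");
        return 0;
}

With 100 OSTs this reports that a ~100PB block limit converges after roughly twenty doublings, while a near-2^64 KB limit never does, which matches the estimate above.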
| Comment by Peter Jones [ 19/Dec/11 ] |
|
Niu I understand that we want to fix this issue for a future release, but I just mean that the customer may prefer to work around the issue rather than apply a patch, as a more immediate way to avoid it. If the customer were to provide the three values you mention (OST count, quota_qs_factor, and quota_boundary_factor), would you be able to calculate the threshold more precisely? Thanks Peter |
| Comment by Johann Lombardi (Inactive) [ 19/Dec/11 ] |
|
Could we ask the customer what the highest quota limit set for users/groups is? A workaround could be to just disable the dynamic qunit size feature by running the following command on the MDS before re-enabling quotas:

# lctl set_param lquota.*.quota_switch_qs=0 |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 19/Dec/11 ] |
|
We are working on getting you the value of our highest group-quota limit. Thanks |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 19/Dec/11 ] |
|
I created the attached file with all the quotas set for groups; we don't use user quotas, only group quota limits. Currently the highest hard limit is 734003200 for blocks and 2000000 for inodes. Regards Andrea Mattioli |
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
Such small limits should not trigger the overflow. Maybe there are other reasons, or someone (or a user application) was trying to set a very high limit but didn't succeed because of the defect. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Dec/11 ] |
|
So, is it correct? What is the consequence of setting quota_switch_qs to 0? Thanks |
| Comment by Niu Yawei (Inactive) [ 20/Dec/11 ] |
|
The consequence of setting quota_switch_qs=0 is that the qunit expanding/shrinking feature will be disabled, and the quota unit (granularity) will always be the default size (128M for block quota, 5120 inodes for file quota). Without qunit shrinking, writes are more likely to get -EDQUOT while the total usage is still less than the limit, because at least 1 qunit (128M) of limit is allocated on each OST, even if the user doesn't have any objects on that OST. BTW: do you know what kind of operations caused this problem? And if possible, could you provide the full stack trace that Johann mentioned in comment #2? Thanks a lot. |
| Comment by Elia Pinto [ 20/Dec/11 ] |
|
Hi, I am the client working with Supporto Lustre Jnet2000 on this issue. To get the kernel dump we would have to enable the kexec/kdump functionality on RHEL. We will do so whenever possible, as this is a mission-critical production environment and we would have to reboot for it, but I agree that it is a useful thing to do. In any case, from the Lustre stack trace (under /tmp) I noticed that when the system crashes it has a load average of about 400 and seems to have some Lustre processes (probably kernel threads) hung and never terminating, so the system crashes. These Lustre processes are enumerating the secondary user groups via the standard POSIX API (I am speaking of the default upcall /usr/sbin/l_getgroups), and we are using a central LDAP server as a POSIX user/group container (RFC2307bis). I believe the bug is in the fact that these processes are never terminated. Does that make sense? In the meantime, a useful workaround could be ...
Thanks in advance |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Dec/11 ] |
|
We uploaded the Lustre dump when we opened the issue. We don't have any kernel dump. Is the procedure to switch off quota_switch_qs correct? Do we need a quotacheck after quotaon? We are investigating which operation causes the problem. In the past we have experienced the "BZ=22755" issue. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Dec/11 ] |
|
I made some tests:

Dec 20 11:05:05 osiride-lp-034 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [ll_mdt_11:7577]

After a reboot I reset my quota as before, with a hard inode limit of 17293822569102704639; now I don't find any errors in the server Lustre logs. Regards Andrea Mattioli |
| Comment by Niu Yawei (Inactive) [ 20/Dec/11 ] |
Hmm, from the stack trace provided in this ticket, it seems the process is stuck in dquot_create_oqaq(), so it is very likely we ran into the endless loop while expanding the qunit in dquot_create_oqaq(). |
| Comment by Peter Jones [ 20/Dec/11 ] |
|
Andrea/Elia So, is the immediate emergency dealt with, and are you satisfied knowing how to work around this bug and that it will be fixed in a future version of Lustre? Peter |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 21/Dec/11 ] |
|
Hi Peter, Thanks Andrea Mattioli |
| Comment by Niu Yawei (Inactive) [ 21/Dec/11 ] |
|
Hi Andrea, quota_switch_qs can't be set by 'lctl conf_param'; you might need to write a script to set it after mounting Lustre. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 21/Dec/11 ] |
|
After deploying the workaround in the production environment we found this error in the Lustre server logs:

Dec 21 17:11:36 osiride-lp-031 kernel: LustreError: 16687:0:(fsfilt-ldiskfs.c:2248:fsfilt_ldiskfs_dquot()) operate dquot before it's enabled!

Regards Andrea Mattioli |
| Comment by Johann Lombardi (Inactive) [ 21/Dec/11 ] |
|
It seems that we were trying to acquire space from the master while it was not ready yet. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 21/Dec/11 ] |
|
Hi Johann, the log reports that error every time I exec the command lfs quotaoff /mountpoin.

[root@osiride-lp-018 ~]# lfs quota -g wisi251 /home

If I exec the command lctl get_param lquota.*.quota_switch_qs I get:

lquota.home-MDT0000.quota_switch_qs=changing qunit size is disabled

Is this correct? Regards Andrea Mattioli |
| Comment by Johann Lombardi (Inactive) [ 21/Dec/11 ] |
|
> hi Johann the log reports everytime i exec the command lfs quotaoff /mountpoin

OK, it is just a transient issue then. I would not care about those messages as long as everything works well once quota is re-enabled.

> lquota.home-MDT0000.quota_switch_qs=changing qunit size is disabled

Yes, this means that the dynamic qunit feature is correctly disabled. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Dec/11 ] |
|
Hi Johann, Regards Andrea Mattioli |
| Comment by Build Master (Inactive) [ 29/Dec/11 ] |
|
Integrated in Result = FAILURE
|
| Comment by Build Master (Inactive) [ 29/Dec/11 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 19/Jan/12 ] |
|
Hello, this is the example:

[root@osiride-lp-032 ~]# lfs setquota -g testtest -B 256000 -I 500000 /home
[root@osiride-lp-032 ~]# lfs quota -g testtest /home
Disk quotas for group testtest (gid 30942):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home      36       0  256000       -       9       0  500000       -
[root@osiride-lp-032 ~]# su - testtest
[testtest@osiride-lp-032 ~]$ mkdir asd
mkdir: cannot create directory `asd': Disk quota exceeded

It works if I set the hard block limit to 512000 instead:

[root@osiride-lp-032 ~]# lfs setquota -g testtest -B 512000 -I 500000 /home
[root@osiride-lp-032 ~]# lfs quota -g testtest /home
Disk quotas for group testtest (gid 30942):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home      40       0  512000       -      10       0  500000       -
[root@osiride-lp-032 ~]# su - testtest
[testtest@osiride-lp-032 ~]$ mkdir asd
[testtest@osiride-lp-032 ~]$ ls
asd  private  public  public_html

It also works when the same limits are set as a user quota instead of a group quota:

[root@osiride-lp-032 ~]# lfs setquota -g testtest -B 0 -I 0 /home
[root@osiride-lp-032 ~]# lfs quota -g testtest /home
Disk quotas for group testtest (gid 30942):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home      40       0       0       -      10       0       0       -
[root@osiride-lp-032 ~]# lfs setquota -u testtest -B 256000 -I 500000 /home
[root@osiride-lp-032 ~]# lfs quota -u testtest /home
Disk quotas for user testtest (uid 10942):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home      44       0  256000       -      11       0  500000       -
[root@osiride-lp-032 ~]# su - testtest
[testtest@osiride-lp-032 ~]$ mkdir asd2
[testtest@osiride-lp-032 ~]$ ls
asd  asd2  private  public  public_html

With qunit switching re-enabled, it works properly. Best regards Andrea Mattioli |
| Comment by Niu Yawei (Inactive) [ 20/Jan/12 ] |
|
I think it is because the limit (256000) is too small for the MDT/OSTs. Without qunit shrinking, at least 1 qunit of block limit (128M) is allocated for each OST and the MDT, and it can't be revoked by the master; if the total limit is too small for each OST/MDT to hold 1 qunit, then some of the OSTs/MDT will end up with only 1 block of limit. Hi Andrea, you could run 'lfs quota -v -g testtest /home' to see whether the MDT has only 1 block of limit in such a case. To resolve it without enabling qunit shrinking, you have to set a high enough limit (at least 1 qunit per OST/MDT). Thanks. |
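As a back-of-the-envelope illustration of the explanation above, here is a tiny calculation assuming the default 128M block qunit pinned on the MDT and on each OST while shrinking is disabled; the OST count is hypothetical, since the ticket does not state how many OSTs this filesystem has.

#include <stdio.h>

int main(void)
{
        /* Assumed defaults from the discussion above: 128M block qunit,
         * one qunit held by the MDT and by each OST when shrinking is off. */
        const unsigned long long qunit_kb = 128ULL * 1024;  /* 128M in KB */
        const unsigned int       ost_num  = 2;              /* hypothetical */
        unsigned long long min_limit_kb   = (ost_num + 1) * qunit_kb;

        printf("minimum workable block hard limit: %llu KB\n", min_limit_kb);
        /* With 2 OSTs + 1 MDT this prints 393216 KB (~384M): a 256000 KB
         * limit falls below it, while 512000 KB is above it. */
        return 0;
}

If the filesystem really had 2 OSTs this would explain why the 256000 KB hard limit hit EDQUOT while 512000 KB worked; with a different OST count the threshold moves accordingly.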
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Jan/12 ] |
|
Hi,

[root@osiride-lp-032 ~]# lfs quota -v -g testtest /home

Where can I see this value? Thanks Andrea Mattioli |
| Comment by Johann Lombardi (Inactive) [ 25/Jan/12 ] |
|
Hm, that's strange: only 1MB was allocated to the slaves. Could you please collect the output of the following command on the MDS?

# lctl get_param lquota.*.quota_*_sz lquota.*.quota_switch_qs |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 26/Jan/12 ] |
|
Hi, this is the output:

[root@osiride-lp-030 ~]# lctl get_param lquota.*.quota_*_sz lquota.*.quota_switch_qs

Best Regards Andrea Mattioli |
| Comment by Johann Lombardi (Inactive) [ 26/Jan/12 ] |
|
All the parameters look sane on the server side. At this point, we would need to collect a debug log. Here is how to proceed:

* On the MDS:
# lctl set_param debug=+quota+vfstrace
# lctl clear

* On the client, reproduce the problem:
# su - testtest
$ mkdir asd

* On the MDS:
# lctl dk > /tmp/lustre_logs
# lctl set_param debug=-quota-vfstrace

And then please attach /tmp/lustre_logs to this bug. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 28/Jan/12 ] |
|
Ok, |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 30/Jan/12 ] |
|
Hi Johann, Could you close this issue? Thanks in advance. |
| Comment by Peter Jones [ 31/Jan/12 ] |
|
The first release that this patch is scheduled for inclusion in is Lustre 2.2, which is expected out in a couple of months time. I would suggest that you continue with the existing workaround for now. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 31/Jan/12 ] |
|
Hi Peter, |
| Comment by Johann Lombardi (Inactive) [ 01/Feb/12 ] |
|
I would indeed suggest upgrading to 1.8.7-wc and using the workaround for now. |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 02/Feb/12 ] |
|
We will not have any chance to upgrade to another version of Lustre for the next 6 months, so we have to be sure to install a rock-solid version! Could you please confirm the 1.8.7-wc version? Please close the issue. Thanks |
| Comment by Peter Jones [ 02/Feb/12 ] |
|
Yes Lustre 1.8.7-wc1 is the best option for you. |
| Comment by Build Master (Inactive) [ 05/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 08/Apr/12 ] |
|
Integrated in Result = SUCCESS
|