[LU-91] Impossible to use quotas on RHEL6.0 Created: 22/Feb/11  Updated: 25/Mar/11  Resolved: 24/Mar/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Diego Moreno (Inactive) Assignee: Johann Lombardi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL6.0 GA with kernel 2.6.32-71


Attachments: File change_lquota_version_rhel6.patch    
Severity: 3
Bugzilla ID: 23707
Epic: RHEL6, ext4, ldiskfs, quotacheck, quotas
Rank (Obsolete): 5091

 Description   

We cannot use quotas with Lustre 2.0 on RHEL6.0.

When we run "lfs quotacheck -ug /fs1/" from a client, where fs1 is a Lustre file system, the command hangs without returning (the logs are on bugzilla#23707).

After investigation, we found that Lustre conflicts with new code introduced in the kernel's quota code (fs/quota/). The Lustre stack looks like this:

ll_quota_on (-> return sb->s_qcop->quota_on(sb, off, ver, name...) (with ver=QFMT_VFS_V0), returning code = -1)
fsfilt_ext3_quotactl
quota_onoff
fsfilt_ext3_quotacheck

Then, in the ldiskfs (ext4) stack, we have the following entries:

v2_read_file_info (called from the line "dqopt->ops[type]->read_file_info(sb, type)" in vfs_load_quota_inode) rc=-1
vfs_load_quota_inode
vfs_quota_on_path
ldiskfs_quota_on

The -1 value comes from the beginning of the v2_read_file_info function:

static int v2_read_file_info(struct super_block *sb, int type)
{
        struct v2_disk_dqinfo dinfo;
        struct v2_disk_dqheader dqhead;
        struct mem_dqinfo *info = sb_dqinfo(sb, type);
        struct qtree_mem_dqinfo *qinfo;
        ssize_t size;
        unsigned int version;

        if (!v2_read_header(sb, type, &dqhead))
                return -1;
        version = le32_to_cpu(dqhead.dqh_version);
        if ((info->dqi_fmt_id == QFMT_VFS_V0 && version != 0) ||
            (info->dqi_fmt_id == QFMT_VFS_V1 && version != 1))
                return -1;

The first condition (!v2_read_header(sb, type, &dqhead)) is false, but the second is true, so v2_read_file_info returns -1. The values are "info->dqi_fmt_id = QFMT_VFS_V0" and "version = 1", so there is a mismatch between the version in the header of the quota file and the version stored in the ldiskfs super block (info->dqi_fmt_id).
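To make the failing comparison concrete, here is a minimal standalone sketch of just the version test from the excerpt above. The helper name check_quota_header_version is hypothetical, and the numeric values of QFMT_VFS_V0 and QFMT_VFS_V1 are stand-ins assumed to match include/linux/quota.h; this is an illustration, not the kernel function itself.

```c
#include <stdint.h>

/* Stand-in values, assumed to match include/linux/quota.h. */
#define QFMT_VFS_V0 2
#define QFMT_VFS_V1 4

/*
 * Hypothetical helper that mirrors only the version test quoted above.
 * fmt_id is the format requested at quota_on time; header_version is the
 * value read from the quota file header (already converted to CPU order).
 * Returns 0 when they agree and -1 on a mismatch.
 */
int check_quota_header_version(int fmt_id, uint32_t header_version)
{
        if ((fmt_id == QFMT_VFS_V0 && header_version != 0) ||
            (fmt_id == QFMT_VFS_V1 && header_version != 1))
                return -1;
        return 0;
}
```

With the values reported here, check_quota_header_version(QFMT_VFS_V0, 1) returns -1, while the same header passes when the requested format is QFMT_VFS_V1.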

This new condition was introduced in 2.6.33-rc (commit 869835dfad3eb6f7d90c3255a24b084fea82f30d "quota: Improve checking of quota file header") and was then included in RHEL6.0 GA, so Lustre will hit this problem on any kernel that contains this commit.

I looked for where we initialize dqhead.dqh_version (this is the 'bad' value, since we use QFMT_VFS_V0 for quotas on Lustre, isn't it?) but I didn't find it. I also looked at how ext4 initializes this value but didn't find that either. This is the first time I have looked at the quota code, so I am asking for help from somebody who knows more about quotas than I do:

  • Do you know where Lustre or ldiskfs initializes the header value (&sb->s_dquot->files[0].dqh_version)? Adding traces in the Lustre code didn't help me.
  • Do you think a workaround is possible? Using QFMT_VFS_V1 instead of QFMT_VFS_V0 doesn't seem like a proper workaround...


 Comments   
Comment by Diego Moreno (Inactive) [ 23/Feb/11 ]

I continued analysing the issue and found where the dqhead version value gets initialized. I didn't find it before because it is only initialized the first time we run "lfs quotacheck" on a client.

It's initialized twice:

  • The first time, it is initialized in the function lustre_init_quota_header, with values from the macros LUSTRE_QUOTA_V2 (1) and LUSTRE_INITQVERSIONS_V2 ({1,1}).
  • The second time, it is initialized in the function v3_write_dqheader, with the value from V2_INITQVERSIONS_R1, which comes from V2_INITQVERSIONS (which is {1,1}).

I changed all initializations to zero and quotas now work properly. But the problem comes from the last macro, V2_INITQVERSIONS, which is in the kernel quota code (fs/quota/quotaio_v2.h). Do you think this could be a kernel bug?

Using another macro to initialize the dqhead version to '0' could be a workaround for new kernels, but it doesn't seem like a proper solution for the old 2.6.18 kernel. What do you think?
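As a rough illustration of the initialization described in this comment, the sketch below fills a simplified quota file header from per-type tables. The struct, the table names, and the function init_quota_header are hypothetical stand-ins, not the Lustre or kernel definitions. The {1,1} versions are the values reported above; the magic numbers are an assumption based on the kernel's V2_INITQMAGICS, and byte order is ignored.

```c
#include <stdint.h>

#define MAXQUOTAS 2     /* quota types: 0 = user, 1 = group */

/* Simplified stand-in for the on-disk quota file header. */
struct quota_header {
        uint32_t magic;
        uint32_t version;
};

/* Per-type initial values. Versions are {1,1} as reported above;
 * the magic numbers are assumed, for illustration only. */
static const uint32_t init_magics[MAXQUOTAS]   = { 0xd9c101, 0xd9c102 };
static const uint32_t init_versions[MAXQUOTAS] = { 1, 1 };

/* Fill a fresh header for the given quota type.
 * Returns 0 on success, -1 for an unknown type. */
int init_quota_header(struct quota_header *hdr, int type)
{
        if (type < 0 || type >= MAXQUOTAS)
                return -1;
        hdr->magic = init_magics[type];
        hdr->version = init_versions[type];
        return 0;
}
```

A header written this way carries version 1 for both user and group quotas, which is why enabling quotas with QFMT_VFS_V0 is rejected on kernels that validate the header version.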

Comment by Johann Lombardi (Inactive) [ 24/Feb/11 ]

The 32-bit quota format is no longer supported on 2.x, so I think it should be fine to just replace QFMT_VFS_V0 with QFMT_VFS_V1 in lustre/lvfs/fsfilt_ext3.c. Diego, would you mind giving this a try?

Comment by Diego Moreno (Inactive) [ 25/Feb/11 ]

That did the trick. Actually, when I tried that solution I had forgotten to update the OSS packages...

Now it works.

Thanks Johann,

Comment by Diego Moreno (Inactive) [ 25/Feb/11 ]

Patch changing QFMT_VFS_V0 to QFMT_VFS_V1 for quotas initialization on recent kernels.

Comment by Johann Lombardi (Inactive) [ 25/Feb/11 ]

Thanks for the quick feedback. If we don't want to break older kernels (like RHEL5), we need to add something like:
#ifdef QFMT_VFS_V1
... use QFMT_VFS_V1
#else
... use QFMT_VFS_V0
#endif

Actually, we might just want to use something like QFMT_LUSTRE in the lustre code and define it as appropriate.
I will push such a patch through gerrit.
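The conditional selection suggested above could be sketched as follows. This is only an illustration under stated assumptions, not the patch that was pushed: the two QFMT_VFS_* defines are stand-ins for the values a kernel header would provide, and the accessor lustre_quota_format is a hypothetical name added so the result can be inspected.

```c
/* Stand-ins for the kernel's quota format ids (assumed values).
 * Remove the QFMT_VFS_V1 line to simulate an older kernel without it. */
#define QFMT_VFS_V0 2
#define QFMT_VFS_V1 4

/* Pick the newest format the kernel headers know about. */
#ifdef QFMT_VFS_V1
# define QFMT_LUSTRE QFMT_VFS_V1
#else
# define QFMT_LUSTRE QFMT_VFS_V0
#endif

/* Hypothetical accessor: the format id Lustre would pass to quota_on. */
int lustre_quota_format(void)
{
        return QFMT_LUSTRE;
}
```

Defining a single QFMT_LUSTRE name keeps the kernel-version test in one place, so the call sites do not need their own #ifdef blocks.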

Comment by Johann Lombardi (Inactive) [ 28/Feb/11 ]

Updated patch pushed to gerrit:
http://review.whamcloud.com/#change,268

Comment by Build Master (Inactive) [ 24/Mar/11 ]

Integrated in reviews-centos5 #552
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Johann Lombardi : 93171cb31cacbd12d976a71ae056775c8e4583b9
Files :

  • lustre/lvfs/fsfilt_ext3.c

Comment by Build Master (Inactive) [ 24/Mar/11 ]

Integrated in lustre-master-centos5 #162
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Oleg Drokin : b25eb219a157dda49a57e14f7bbc400a52a10a9d
Files :

  • lustre/lvfs/fsfilt_ext3.c

Comment by Peter Jones [ 24/Mar/11 ]

Fix is now landed to master

Comment by Build Master (Inactive) [ 25/Mar/11 ]

Integrated in lustre-master » x86_64,ubuntu #13
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Brian J. Murrell : cc6dc918f19cbabdcb7333d7dccd48fd6e3d72cf
Files :

  • lustre/lvfs/fsfilt_ext3.c

Comment by Build Master (Inactive) [ 25/Mar/11 ]

Integrated in reviews-centos5 #571
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Brian J. Murrell : cc6dc918f19cbabdcb7333d7dccd48fd6e3d72cf
Files :

  • lustre/lvfs/fsfilt_ext3.c

Comment by Build Master (Inactive) [ 25/Mar/11 ]

Integrated in reviews-rhel6 #55
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Brian J. Murrell : cc6dc918f19cbabdcb7333d7dccd48fd6e3d72cf
Files :

  • lustre/lvfs/fsfilt_ext3.c

Comment by Build Master (Inactive) [ 25/Mar/11 ]

Integrated in lustre-master » x86_64,el6 #13
LU-91 Fix quota format problem with RHEL6 and kernels >= 2.6.33

Brian J. Murrell : cc6dc918f19cbabdcb7333d7dccd48fd6e3d72cf
Files :

  • lustre/lvfs/fsfilt_ext3.c