Details
Bug
Resolution: Fixed
Blocker
Lustre 2.0.0
None
RHEL6.0 GA with kernel 2.6.32-71
3
23,707
5091
Description
We cannot use quotas on Lustre 2.0 running on RHEL6.0.
When we run "lfs quotacheck -ug /fs1/" from a client, where fs1 is a Lustre file system, the command hangs without replying (see the logs in bugzilla#23707).
After investigation, we found that Lustre conflicts with new code introduced in the kernel's quota code (fs/quota/). The Lustre stack looks like this:
ll_quota_on (-> return sb->s_qcop->quota_on(sb, off, ver, name... (with ver=QFMT_VFS_V0), return code = -1)
fsfilt_ext3_quotactl
quota_onoff
fsfilt_ext3_quotacheck
Then, in the ldiskfs (ext4) stack, we have the following entries:
v2_read_file_info (called from the line "dqopt->ops[type]->read_file_info(sb, type)" in vfs_load_quota_inode), rc=-1
vfs_load_quota_inode
vfs_quota_on_path
ldiskfs_quota_on
The -1 value comes from the beginning of the v2_read_file_info function:
static int v2_read_file_info(struct super_block *sb, int type)
{
	struct v2_disk_dqinfo dinfo;
	struct v2_disk_dqheader dqhead;
	struct mem_dqinfo *info = sb_dqinfo(sb, type);
	struct qtree_mem_dqinfo *qinfo;
	ssize_t size;
	unsigned int version;

	if (!v2_read_header(sb, type, &dqhead))
		return -1;
	version = le32_to_cpu(dqhead.dqh_version);
	if ((info->dqi_fmt_id == QFMT_VFS_V0 && version != 0) ||
	    (info->dqi_fmt_id == QFMT_VFS_V1 && version != 1))
		return -1;
The first condition (!v2_read_header(sb, type, &dqhead)) is false and the second is true, so v2_read_file_info returns -1. The values are "info->dqi_fmt_id = QFMT_VFS_V0" and "version = 1", so there is a mismatch between the version in the header of the quota file and the format version stored on the ldiskfs super block (info->dqi_fmt_id).
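The mismatch can be reproduced outside the kernel with a minimal userspace sketch of the check. The function name check_quota_header_version is hypothetical and exists only for illustration; the QFMT_VFS_V0 and QFMT_VFS_V1 values are assumed to match include/linux/quota.h, and the condition mirrors the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Format IDs, assumed to match include/linux/quota.h. */
#define QFMT_VFS_V0 2
#define QFMT_VFS_V1 4

/*
 * Hypothetical userspace mirror of the version check in v2_read_file_info().
 * fmt_id is the format requested at quota-on time (info->dqi_fmt_id);
 * header_version is dqh_version already converted to host byte order.
 * Returns 0 when the two agree, -1 on a mismatch.
 */
static int check_quota_header_version(int fmt_id, uint32_t header_version)
{
	/* This sketch only models the two v2 formats discussed here. */
	assert(fmt_id == QFMT_VFS_V0 || fmt_id == QFMT_VFS_V1);

	if ((fmt_id == QFMT_VFS_V0 && header_version != 0) ||
	    (fmt_id == QFMT_VFS_V1 && header_version != 1))
		return -1;
	return 0;
}
```

With the values observed in this report, check_quota_header_version(QFMT_VFS_V0, 1) returns -1, while (QFMT_VFS_V0, 0) and (QFMT_VFS_V1, 1) both return 0.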
This new check was introduced in 2.6.33-rc (commit 869835dfad3eb6f7d90c3255a24b084fea82f30d "quota: Improve checking of quota file header") and was then included in RHEL6.0 GA, so Lustre will hit this problem on any new kernel that carries this commit.
I looked for the place where we initialize dqhead.dqh_version (this is the 'bad' value, since we use QFMT_VFS_V0 for quotas on Lustre, isn't it?) but I didn't find it. I also looked at how ext4 initializes this value, but I didn't find that either. This is the first time I'm looking at the quota code, so I'm asking for help from somebody who knows quotas better than I do:
- Do you know where Lustre or ldiskfs initializes the header value (&sb->s_dquot->files[0].dqh_version)? Adding traces in the Lustre code didn't help me.
- Do you think a workaround is possible? Using QFMT_VFS_V1 instead of QFMT_VFS_V0 doesn't seem like a proper workaround...
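One way to check which version the quota file header actually carries, independently of the Lustre code, is to read the first 8 bytes of the quota file from userspace. This is only a sketch under assumptions: parse_quota_header and dump_quota_header are hypothetical helper names, the path passed in is whatever quota file is in use on the backend file system, and the layout (little-endian dqh_magic followed by dqh_version) and the magic numbers are assumed to match struct v2_disk_dqheader and V2_INITQMAGICS in fs/quota/quotaio_v2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Magic numbers for v2 quota files, assumed from V2_INITQMAGICS. */
#define V2_MAGIC_USER  0xd9c101f9u
#define V2_MAGIC_GROUP 0xd9c101f8u

/* Host-order copy of the two header fields (hypothetical type). */
struct quota_header {
	uint32_t magic;
	uint32_t version;
};

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the first 8 bytes of a quota file.
 * Returns 0 if the buffer is long enough and the magic is a known
 * user or group magic, -1 otherwise.
 */
static int parse_quota_header(const unsigned char *buf, size_t len,
			      struct quota_header *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 8)
		return -1;
	out->magic = get_le32(buf);
	out->version = get_le32(buf + 4);
	if (out->magic != V2_MAGIC_USER && out->magic != V2_MAGIC_GROUP)
		return -1;
	return 0;
}

/* Read the header of the quota file at path and print its fields. */
static int dump_quota_header(const char *path)
{
	unsigned char buf[8];
	struct quota_header h;
	size_t n;
	FILE *f = fopen(path, "rb");

	if (f == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (parse_quota_header(buf, n, &h) != 0)
		return -1;
	printf("magic=0x%08x version=%u\n",
	       (unsigned int)h.magic, (unsigned int)h.version);
	return 0;
}
```

Calling dump_quota_header() on the user and group quota files would show whether the header on disk really says version 1 while quota_on is requesting QFMT_VFS_V0.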