Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version: Lustre 2.10.0
Environment:
CentOS 7 servers
kernel-3.10.0-514.21.1.el7_lustre.x86_64
lustre-2.10.0-1.el7.x86_64
lustre-dkms-2.10.0-1.el7.noarch
lustre-osd-zfs-mount-2.10.0-1.el7.x86_64
lustre-resource-agents-2.10.0-1.el7.x86_64
CentOS 6 clients
lustre-client-2.10.0-1.el6.x86_64
lustre-client-dkms-2.10.0-1.el6.noarch
ZFS for OSTs & MDT
libzfs2-0.7.3-1.el7_3.x86_64
libzfs2-devel-0.7.3-1.el7_3.x86_64
zfs-0.7.3-1.el7_3.x86_64
zfs-dkms-0.7.3-1.el7_3.noarch
zfs-release-1-4.el7_3.centos.noarch
DKMS kernel modules
Description
We have a Lustre 2.10.0 filesystem that was built with two OSSes containing 5 OSTs each. Last week I added a third OSS (identical hardware, slightly newer OS software except for the kernel and Lustre packages). When I created the new OSTs with mkfs.lustre, the filesystem seemed to grow correctly. We currently set and enforce only group quotas.
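For reference, the new OSTs were created with something along these lines (a sketch only; the pool and dataset names, the MGS NID, and the assumption that the new OSTs received indices 10-14 are placeholders rather than our exact values):

# on the new OSS, repeated once per new OST
mkfs.lustre --ost --backfstype=zfs --fsname=center1 --index=10 \
    --mgsnode=mgs@tcp ostpool/ost10
mount -t lustre ostpool/ost10 /mnt/lustre/ost10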
Later that day, we noticed the output of `lfs quota -g $GROUP /center1` was showing bad values and an error message. Here's an example.
chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
Disk quotas for grp penguin (gid 12738):
Filesystem kbytes quota limit grace files quota limit grace
/center1 [214] 1073741824 1181116006 - 13 0 0 -
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
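The verbose form of the same query breaks usage down per MDT/OST and shows which targets are reporting errors; a sketch of the command (standard lfs option, output omitted here):

sudo lfs quota -v -g penguin /center1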
We found a workaround. As soon as the group has data written to the new OSTs, `lfs quota` seems to work fine.
chinook02:PENGUIN$ lfs setstripe -i -1 -c -1 loforbes
chinook02:PENGUIN$ dd of=loforbes/testfile if=/dev/urandom bs=1M count=15
15+0 records in
15+0 records out
15728640 bytes (16 MB) copied, 1.80694 s, 8.7 MB/s
chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
Disk quotas for grp penguin (gid 12738):
Filesystem kbytes quota limit grace files quota limit grace
/center1 671997883 1073741824 1181116006 - 13 0 0 -
chinook02:PENGUIN$ lfs getstripe loforbes/testfile
loforbes/testfile
lmm_stripe_count: 15
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 12
obdidx objid objid group
12 31981 0x7ced 0
7 62233208 0x3b59a78 0
14 32068 0x7d44 0
8 72183233 0x44d6dc1 0
10 31854 0x7c6e 0
11 31849 0x7c69 0
2 68917015 0x41b9717 0
5 71171215 0x43dfc8f 0
1 69395583 0x422e47f 0
13 32088 0x7d58 0
9 68211489 0x410d321 0
6 70389457 0x4320ed1 0
4 70225352 0x42f8dc8 0
3 66783438 0x3fb08ce 0
0 65674625 0x3ea1d81 0
We figured out that it isn't actually necessary to have data on the 10 original OSTs; having data on just the 5 new ones is enough for this to work. I've implemented this workaround for all projects using our Lustre filesystem.
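In case it helps anyone else, the targeted version of the workaround looks roughly like this (a sketch; the 10-14 index range for the new OSTs and the quotafix.* file names are assumptions to adjust for your own layout):

# write one small file on each of the new OSTs
for i in 10 11 12 13 14; do
    lfs setstripe -i $i -c 1 loforbes/quotafix.$i
    dd if=/dev/zero of=loforbes/quotafix.$i bs=1M count=1
done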
Before implementing the workaround, we tried "deleting" a group's quota and recreating it. That didn't seem to affect the issue. We also tried unmounting and remounting the filesystem on a client. Again, no change. Removing all of a group's files that have data on the new OSTs makes `lfs quota` show the error again.
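The quota delete/recreate was roughly the following (a sketch; clearing all limits with zeros and then re-applying the block limits shown in the lfs quota output above):

sudo lfs setquota -g penguin -b 0 -B 0 -i 0 -I 0 /center1
sudo lfs setquota -g penguin -b 1073741824 -B 1181116006 /center1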
We are considering a Lustre 2.10.1 update sometime soon.
Regards,
-liam