Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version: Lustre 2.10.0
Environment:
CentOS 7 servers
kernel-3.10.0-514.21.1.el7_lustre.x86_64
lustre-2.10.0-1.el7.x86_64
lustre-dkms-2.10.0-1.el7.noarch
lustre-osd-zfs-mount-2.10.0-1.el7.x86_64
lustre-resource-agents-2.10.0-1.el7.x86_64
CentOS 6 clients
lustre-client-2.10.0-1.el6.x86_64
lustre-client-dkms-2.10.0-1.el6.noarch
ZFS for OSTs & MDT
libzfs2-0.7.3-1.el7_3.x86_64
libzfs2-devel-0.7.3-1.el7_3.x86_64
zfs-0.7.3-1.el7_3.x86_64
zfs-dkms-0.7.3-1.el7_3.noarch
zfs-release-1-4.el7_3.centos.noarch
DKMS kernel modules
Description
We have a Lustre 2.10.0 filesystem that was built with two OSSes containing 5 OSTs each. Last week I added a third OSS (identical hardware, slightly newer OS software except for the kernel and Lustre packages). When I created the new OSTs with mkfs.lustre, the filesystem seemed to grow correctly. We currently set and enforce only group quotas.
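For reference, the new OSTs were created with something along these lines (a sketch only; the pool and dataset names, the MGS NID, and the assumption that the new OSTs received indices 10-14 are placeholders rather than our exact values):

# on the new OSS, repeated once per new OST
mkfs.lustre --ost --backfstype=zfs --fsname=center1 --index=10 \
    --mgsnode=mgs@tcp ostpool/ost10
mount -t lustre ostpool/ost10 /mnt/lustre/ost10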
Later that day, we noticed the output of `lfs quota -g $GROUP /center1` was showing bad values and an error message. Here's an example.
chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
Disk quotas for grp penguin (gid 12738):
Filesystem kbytes quota limit grace files quota limit grace
/center1 [214] 1073741824 1181116006 - 13 0 0 -
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
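The verbose form of the same query breaks usage down per MDT/OST and shows which targets are reporting errors; a sketch of the command (standard lfs option, output omitted here):

sudo lfs quota -v -g penguin /center1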
We found a workaround. As soon as the group has data written to the new OSTs, `lfs quota` seems to work fine.
chinook02:PENGUIN$ lfs setstripe -i -1 -c -1 loforbes
chinook02:PENGUIN$ dd of=loforbes/testfile if=/dev/urandom bs=1M count=15
15+0 records in
15+0 records out
15728640 bytes (16 MB) copied, 1.80694 s, 8.7 MB/s
chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
Disk quotas for grp penguin (gid 12738):
Filesystem kbytes quota limit grace files quota limit grace
/center1 671997883 1073741824 1181116006 - 13 0 0 -
chinook02:PENGUIN$ lfs getstripe loforbes/testfile
loforbes/testfile
lmm_stripe_count: 15
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 12
obdidx objid objid group
12 31981 0x7ced 0
7 62233208 0x3b59a78 0
14 32068 0x7d44 0
8 72183233 0x44d6dc1 0
10 31854 0x7c6e 0
11 31849 0x7c69 0
2 68917015 0x41b9717 0
5 71171215 0x43dfc8f 0
1 69395583 0x422e47f 0
13 32088 0x7d58 0
9 68211489 0x410d321 0
6 70389457 0x4320ed1 0
4 70225352 0x42f8dc8 0
3 66783438 0x3fb08ce 0
0 65674625 0x3ea1d81 0
We figured out that it isn't actually necessary to have data on the 10 original OSTs; having data on just the 5 new ones is enough for this to work. I've implemented this workaround for all projects using our Lustre filesystem.
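In case it helps anyone else, the targeted version of the workaround looks roughly like this (a sketch; the 10-14 index range for the new OSTs and the quotafix.* file names are assumptions to adjust for your own layout):

# write one small file on each of the new OSTs
for i in 10 11 12 13 14; do
    lfs setstripe -i $i -c 1 loforbes/quotafix.$i
    dd if=/dev/zero of=loforbes/quotafix.$i bs=1M count=1
done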
Before implementing the workaround, we tried "deleting" a group's quota and recreating it. That didn't seem to affect the issue. We also tried unmounting and remounting the filesystem on a client. Again, no change. Removing all of a group's files that have data on the new OSTs makes `lfs quota` show the error again.
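The quota delete/recreate was roughly the following (a sketch; clearing all limits with zeros and then re-applying the block limits shown in the lfs quota output above):

sudo lfs setquota -g penguin -b 0 -B 0 -i 0 -I 0 /center1
sudo lfs setquota -g penguin -b 1073741824 -B 1181116006 /center1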
We are considering a Lustre 2.10.1 update sometime soon.
Regards,
-liam