[LU-10238] adding new OSTs causes quota reporting error Created: 13/Nov/17 Updated: 02/Feb/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Liam Forbes | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7 servers |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have a Lustre 2.10.0 filesystem which was built with two OSSes containing 5 OSTs each. Last week I added a third OSS (same exact hardware, slightly newer OS software except the kernel & lustre). When I created the OSTs with mkfs.lustre, the filesystem seemed to grow correctly. We currently only set and enforce group quotas.

Later that day, we noticed the output of `lfs quota -g $GROUP /center1` was showing bad values and an error message. Here's an example.

chinook02:PENGUIN$ sudo lfs quota -g penguin /center1

We found a workaround. As soon as the group has data written to the new OSTs, `lfs quota` seems to work fine.

chinook02:PENGUIN$ lfs setstripe -i -1 -c -1 loforbes

We figured out it's not really necessary to have data on the 10 original OSTs, just the 5 new ones for this to work. I've implemented this workaround for all projects using our lustre filesystem.

Before implementing the workaround, we tried "deleting" a group's quota and recreating. That didn't seem to impact the issue. We also tried unmounting and remounting the filesystem on a client. Again, no change. Removing all files owned by a group that have data on the new OSTs results in `lfs quota` showing the error again.

We are considering a Lustre 2.10.1 update sometime soon.

Regards, |
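For readers who want to apply the workaround to several groups, the following is a minimal sketch, not the reporter's actual script. It assumes each group has a writable directory at /center1/<group>, uses a hypothetical seed file name (.quota_seed), and assumes a 1 MiB stripe size so that 16 MiB of data reaches all 15 OSTs. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the workaround: give a group a small file striped across every OST
# so that the group owns data on the newly added OSTs.
# Assumptions (not from the ticket): directory layout $FS_ROOT/<group>,
# the seed file name, and the amount of data written.

FS_ROOT="${FS_ROOT:-/center1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

seed_group() {
    group="$1"
    seed="$FS_ROOT/$group/.quota_seed"   # hypothetical file name
    # Create an empty file striped over all OSTs (-c -1), letting the
    # MDS choose the starting OST (-i -1), as in the ticket.
    run lfs setstripe -i -1 -c -1 "$seed"
    # Set group ownership before writing so the data is charged to the group.
    run chgrp "$group" "$seed"
    # Write enough data to touch every stripe; notrunc keeps the existing file.
    run dd if=/dev/zero of="$seed" bs=1M count=16 conv=notrunc
}

# Example: show what would be run for the group from the ticket.
seed_group penguin
```

Running it with DRY_RUN=0 as a user allowed to chgrp to the target group would execute the commands; per the description, removing the seed file may bring the error back if the group has no other data on the new OSTs.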
| Comments |
| Comment by James Nunez (Inactive) [ 20/Dec/17 ] |
|
Hongchao - Would you please look into this issue? Thank you |
| Comment by Hongchao Zhang [ 29/Dec/17 ] |
|
Hi Liam, I can't reproduce the issue in my local VMs. Could you please attach the logs (syslog and debug log) from when the issue occurred? Also, please enable quota messages in the debug log with "lctl set_param debug=+quota". |
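One possible way to capture the requested debug log is sketched below. The sequence (enable the quota debug flag, clear the kernel debug buffer, reproduce the error, dump the buffer) is an assumption about what is being asked for, and the output path is hypothetical. It would need to run as root on the node being examined, and it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: capture a Lustre debug log around a failing `lfs quota` call.
# Assumptions (not from the ticket): the log path and the exact sequence.

DRY_RUN="${DRY_RUN:-1}"                               # 1 = print commands only
LOG_FILE="${LOG_FILE:-/tmp/lustre-quota-debug.log}"   # hypothetical path

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_quota_debug() {
    group="$1"
    fs="$2"
    run lctl set_param debug=+quota   # add quota messages to the debug mask
    run lctl clear                    # empty the kernel debug buffer first
    run lfs quota -g "$group" "$fs"   # reproduce the reporting error
    run lctl dk "$LOG_FILE"           # dump the debug buffer to a file
}

# Example with the group and mount point from the ticket.
collect_quota_debug penguin /center1
```

Setting the debug flag and dumping the buffer on the MDS and OSS nodes around the same time would give server-side logs to attach alongside the client log.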
| Comment by Liam Forbes [ 22/Jan/18 ] |
|
Hongchao, I'm attaching the syslog file from the two days when we added the new OSS (oss09) to the filesystem. Unfortunately, I can't say exactly what time that occurred. Also unfortunately, I don't seem to have the syslogs from that OSS on that day either.

Here are the system logs that occur when we get the error message in the `lfs quota` output. From a client:

No messages occur on the MDS or OSS. Could this be an LNET issue?

Regards, |
| Comment by Hongchao Zhang [ 02/Feb/18 ] |
|
Hi Liam, the issue is related to the OSS. Could you please get the quota usage of some non-existing group (say, 20000) on your site