[LU-13856] ost00 100% full Created: 05/Aug/20 Updated: 15/Mar/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ryan Seal | Assignee: | Peter Jones |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 1 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I am unable to write data to the Lustre file system because ost00 is 100% full. I am receiving the following error: "State of repository file is unknown due to error while truncating file: Error writing iobuffer for '<file>': No space left on device" |
| Comments |
| Comment by Andreas Dilger [ 05/Aug/20 ] |
|
Typically it is best to keep the filesystem below 90% full to avoid sudden large application IO causing the filesystem to run out of space, and to avoid performance loss as the slowest inner tracks of the disk are used and free space fragmentation results in poor allocations. Several things to do in this case:
|
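A routine first step when an OST fills is checking per-OST usage with "lfs df -h" from a client, as discussed later in this thread. Below is a rough sketch of scanning that output for nearly-full OSTs; the sample output is hypothetical (real output requires a Lustre client), and the 90% threshold is the one suggested in the comment above.

```shell
# Hypothetical sample of "lfs df -h <mountpoint>" output; the real command
# must be run on a Lustre client. Fields: name, size, used, avail, use%, mount.
sample_df='data-OST0000_UUID 7.1T 7.1T 0.0T 100% /data[OST:0]
data-OST0001_UUID 7.1T 4.9T 2.2T 69% /data[OST:1]
data-OST0002_UUID 7.1T 5.0T 2.1T 70% /data[OST:2]'

# Flag any OST at or above the suggested 90% usage threshold.
full_osts() {
    printf '%s\n' "$sample_df" | awk '$5+0 >= 90 {print $1, $5}'
}
full_osts
```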
| Comment by Ryan Seal [ 05/Aug/20 ] |
|
ost00 is the only one that is full; the others are around 70% or less. I noticed that "pcs cluster status" shows pcsd as offline for the 4 OSS servers I have, but crm_mon. When I run "lfs df -i" or "lfs df -h" I get no output. For "lfs getstripe", where can I get the lustre_mountpoint? |
| Comment by Andreas Dilger [ 05/Aug/20 ] |
|
The "<lustre_mountpoint>" is the directory where Lustre is mounted on the client. I don't know what that is, since there is exceedingly little in this ticket for me to work with. The "lfs df" and "lfs df -i" and "lfs getstripe" commands need to be run on a client node. When you write that these commands "get no output", does that mean "they return immediately without printing anything"? That probably means that they are being run on the server instead of a client. Or do you mean "they hang forever and do not print anything", which probably means that the servers are not working properly, which would match your comments that report there are OSS nodes offline. That said, if there are OSTs which are not working properly, that is useful information to have. Are there errors reported on the console of the client or OSS nodes, beyond the "no space left on device" error? |
| Comment by Ryan Seal [ 06/Aug/20 ] |
|
I was able to get pcsd back online this morning. Currently I have deactivated ost0000 on the primary MDS by running the following:

lctl --device 8 deactivate
lctl set_param osp.data-OST0000*.active=0

Despite this being set, it is still trying to write to ost00.
I was running lfs on the servers, which explains why the commands produced no output. "lfs df" reports that ost0000 is mounted on "/data[OST:0]". We have many clients. Does it matter which client the migrate is run on? I have 4 OSSs with 3 OSTs mounted on each. Could you help with how to migrate data off of ost00 to the others? Also, will the migration be done on the servers or the clients?
|
| Comment by Andreas Dilger [ 06/Aug/20 ] |
|
If you are using Lustre 2.10.7 or later on the servers then the preferred mechanism to stop file creation on the OST is "lctl set_param osp.data-OST0000*.max_create_count=0", which will stop file creation but not deactivate the OST completely. You can set "lctl set_param osp.data-OST0000*.active=1" to reactivate that OST.
The mechanism to migrate files off OST0000 was already listed in my previous comment.
"Also will the migration be done on the servers or clients?" On the clients. If the "lfs find" is run on different subdirectories then you could run a few of them in parallel on different clients.
"Does it matter which client the migrate is run on?" Not really. |
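A consolidated sketch of the drain-and-migrate sequence described in the comments above. The fsname "data" comes from this ticket, "<mnt>" is a placeholder mountpoint, and the commands are only printed here rather than executed, since they require a live Lustre MDS and client.

```shell
# Sketch only: prints the commands from the comments above instead of
# running them. "data" is the fsname seen in this ticket; <mnt> is a
# placeholder for the client mountpoint.
drain_plan() {
    # On the MDS: stop new object allocation on OST0000 (Lustre >= 2.10.7).
    echo 'lctl set_param osp.data-OST0000*.max_create_count=0'
    # On a client: move files off OST index 0 to other OSTs.
    echo 'lfs find <mnt> -ost 0 -mtime +10 -print0 | lfs_migrate -y -0'
    # On the MDS, after rebalancing: restore the default create count.
    echo 'lctl set_param osp.data-OST0000*.max_create_count=20000'
}
drain_plan
```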
| Comment by Peter Jones [ 08/Aug/20 ] |
|
Hi Ryan, I'm just checking in to see how things are progressing. Peter |
| Comment by Ryan Seal [ 13/Aug/21 ] |
|
Is there a way to use lctl get_param to show the max_create_count? |
| Comment by Andreas Dilger [ 13/Aug/21 ] |
|
Yes, "lctl get_param osp.*.max_create_count" on the MDS. This will normally default to 20000. |
| Comment by Ryan Seal [ 14/Sep/21 ] |
|
It looks like the root of this issue is the striping of the data across all the OSTs. How do I configure Lustre to stripe data evenly across all the OSTs? Is this done on the primary MDS? How do I see the current striping configuration? |
| Comment by Andreas Dilger [ 14/Sep/21 ] |
|
You can get the filesystem-wide default striping by running "lfs getstripe -d <root directory>" on a client. The default is:

stripe_count: 1 stripe_size: 1048576 pattern: 0 stripe_offset: -1

The MDS will normally balance new files across all OSTs pretty evenly, unless told otherwise. It can happen that OSTs become imbalanced if there are very large 1-stripe files created on one OST, or if the default striping for a directory incorrectly uses "--stripe-index=0" to force creation on OST0000 instead of "--stripe-index=-1", which allows the MDS to select any OST. Note that it is also possible to set a different default file layout on any subdirectory, or on a per-file basis, so the cause of this imbalance may be in a specific subdirectory. |
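For intuition on why "stripe_count: 1" concentrates a whole file on one OST while striped files spread round-robin, here is a small sketch of the RAID0-style offset-to-stripe mapping (just the arithmetic, not any Lustre API; the 4-stripe file is a hypothetical example).

```shell
stripe_size=1048576   # 1 MiB, the default stripe_size shown above
stripe_count=4        # hypothetical 4-stripe file

# RAID0-style mapping: which stripe (0..stripe_count-1) holds a byte offset.
stripe_of_offset() {
    echo $(( ($1 / stripe_size) % stripe_count ))
}

stripe_of_offset 0         # offset in the first MiB -> stripe 0
stripe_of_offset 5242880   # 5 MiB into the file -> stripe 1
```

With stripe_count=1 the modulus is 1, so every offset maps to the same single OST, which is how one huge file can fill one OST while the others stay empty.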
| Comment by Ryan Seal [ 29/Sep/21 ] |
|
How would I set the striping to the default setting? The output from getstripe returned:

stripe_count: -1 stripe_size: 4194304 pattern: raid0 stripe_offset: 0

Could this be the reason ost00 is filling up? Would I set the stripe on a client? |
| Comment by Andreas Dilger [ 30/Sep/21 ] |
|
Yes, setting "stripe_offset: 0" means "put all files onto OST0000". Also, "stripe_count: -1" means "stripe across all OSTs", which is probably also not what you want, since this adds significant overhead for small files and consumes a lot of objects needlessly on every OST. You can fix both of these issues by running the following command as the root user (or via sudo) on any client:

# lfs setstripe --stripe-size=4M --stripe-index=-1 --stripe-count=1 <root_directory>

If you have a significant number of large files, it would be much better to set a default layout that uses the PFL feature:

# lfs setstripe -E 256M -c 1 -E 16G -c 4 -E eof -S 4M -c 40 <root_directory>

In this example, files up to 256MB will use a single OST, files up to 16GB will stripe across 4 OSTs, and anything larger than 16GB will be striped across (up to) 40 OSTs (if your filesystem has fewer than 40 OSTs, it will use the number available). See https://wiki.lustre.org/Configuring_Lustre_File_Striping for details. |
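As a sanity check on the suggested PFL layout, this sketch computes which component's stripe count applies at a given file size under the -E 256M / -E 16G / -E eof boundaries from the command above (ignoring the cap at the real number of OSTs).

```shell
# Stripe count of the PFL component covering the end of a file of a given
# size in bytes, under: -E 256M -c 1 -E 16G -c 4 -E eof -c 40.
pfl_tail_stripes() {
    if   [ "$1" -le $((256 * 1024 * 1024)) ];       then echo 1
    elif [ "$1" -le $((16 * 1024 * 1024 * 1024)) ]; then echo 4
    else                                                 echo 40
    fi
}

pfl_tail_stripes $((100 * 1024 * 1024))    # 100 MB file -> 1
pfl_tail_stripes $((1024 * 1024 * 1024))   # 1 GB file -> 4
```

Note that a PFL file keeps all the components it has grown through: a 1GB file still stores its first 256MB in the 1-stripe component, with the rest in the 4-stripe component.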
| Comment by Ryan Seal [ 17/Nov/21 ] |
|
I set the striping to the recommended setting above:

# lfs setstripe -E 256M -c 1 -E 16G -c 4 -E eof -S 4M -c 40 <root_directory>

After doing this, when trying to migrate data from one OST to another with:

# lfs find /data/example/ -ost 13 -mtime +10 -size +128m -print0 | lfs_migrate -y -0 -i -1

I get the following error:

lfs migrate migrate: unrecognized option '-1'

Is this due to the new striping configuration? If so, how do I manually migrate data in the event one OST becomes almost full?
|
| Comment by Ryan Seal [ 07/Dec/21 ] |
|
Any updates? |
| Comment by Andreas Dilger [ 09/Dec/21 ] |
|
It might be that you need to specify "-i-1" (no space) to the lfs_migrate script, but in any case this should not be needed. However, if the files are large and striped across all OSTs, then migrating them will not actually reduce space usage, since the same amount of data will be on OST0000 after the migration. My recommendation would be to migrate the largest files off OST0000, but reduce the stripe count slightly below the actual number of OSTs so that the full OST can be skipped, like:

client# lfs find -ost 13 -size +16G -mtime +10 -print0 | xargs -0 lfs migrate -c <ost_count - 1>

The other option would be to migrate a lot of small files that only have data on OST0000, since that avoids moving a lot of extra data, like:

client# lfs find -ost 13 -size -256M -mtime +10 -print0 | xargs -0 lfs migrate -c 1 |
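The -print0 / xargs -0 pairing in the pipelines above matters because filenames may contain spaces or other whitespace. A local stand-in for the same pipeline shape, with plain find and echo replacing lfs find and lfs migrate (which need a Lustre client):

```shell
pipeline_demo() {
    # Temporary files stand in for files that "lfs find -ost 13" would match;
    # one deliberately has a space in its name.
    demo=$(mktemp -d)
    touch "$demo/small file.dat" "$demo/big.dat"

    # Same shape as: lfs find ... -print0 | xargs -0 lfs migrate -c 1
    find "$demo" -type f -print0 | xargs -0 -n1 echo would-migrate:
    rm -rf "$demo"
}
pipeline_demo
```

Without -print0/-0, the filename with the space would be split into two bogus arguments.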
| Comment by Ryan Seal [ 15/Mar/22 ] |
|
I ran

# lfs find -ost 13 -size -256M -mtime +10 -print0 | xargs -0 lfs migrate -c1

on one of the clients and I am getting the following warning:

lfs_migrate is currently NOT SAFE for moving in-use files. Use it only when you are sure migrated files are not in use. If emptying an OST that is active on the MDS, new files may use it. To stop allocating any new objects on OSTNNNN run: lctl set_param osp.<fsname>-OSTNNNN*.max_create_count=0 on each MDS using the OST(s) being emptied. Continue? (y/n)

I only have 1 active MDS with 1 MDT mounted, and am only trying to migrate objects off ost00 across the other 10 OSTs. Before running the migrate I set max_create_count=0 for ost00 and verified the change. After receiving this warning, I also set active=0 with the same result. I have confirmed that no new objects have been created since setting max_create_count=0 by checking the in-use size of ost00 over the last 24 hours. Is it safe to continue, or does the syntax of the lfs_migrate need to be adjusted? If so, could you provide that as well? |
| Comment by Andreas Dilger [ 15/Mar/22 ] |
|
You should not set active=0 on the MDS, since that will prevent it from destroying objects on OST0000. That was the old mechanism (before Lustre 2.4) and is no longer needed with max_create_count=0. The "NOT SAFE" warning is a bit old, but has not been removed yet for a couple of reasons ( |