[LU-5562] running out of inodes Created: 29/Aug/14  Updated: 10/Sep/14  Resolved: 10/Sep/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Frederik Ferner (Inactive) Assignee: Andreas Dilger
Resolution: Not a Bug Votes: 0
Labels: None

Rank (Obsolete): 15514

 Description   

We are currently facing the issue that one of our file systems is running out of inodes on the MDT while we still have disk space on the OSTs. (Basically, bad estimates of average file size...) We are working on identifying why the average file size has gone down recently and on removing small files.

We're also looking into other ways to increase the available inodes on the MDT. Now that we are running Lustre 2.5, adding additional MDTs is an option we are considering, but this won't happen in the short term. We have spare capacity on the MDT backend storage system that we could add to the MDT LVM LV.

The question is, would increasing the disk size and a resize of the ldiskfs file system increase the number of inodes available? Is this something that has been done/tested? Is this a recommended way to increase the number of available inodes?

Would it be a better idea to take the space in the LVM VG, create a separate logical volume, format that as an MDT and add that to the file system, moving a couple of directories there? It seems we would be fragmenting the file system by doing this and would lose flexibility in placing our files, so I'd probably want to avoid this if growing the underlying MDT file system is a viable option.



 Comments   
Comment by Andreas Dilger [ 29/Aug/14 ]

I've done this several times on my home Lustre filesystem. It is possible to increase the size of the existing MDT LV using lvresize -L new_size /dev/vgXX/lvYY and then using resize2fs to resize the filesystem inside to the new LV size. The unfortunate thing is that resize2fs cannot currently resize the MDT or OST filesystem while it is mounted because there is no connection between the Lustre pseudo mountpoint for the MDT (which can only print the statfs() statistics for "df") and the underlying ldiskfs filesystem (LU-137 relates to that limitation). You will need to unmount the MDT, run e2fsck on it (a requirement of resize2fs), and then do the resize before mounting it again. You may also want to consider creating an LVM snapshot of the existing filesystem before resizing it, and/or creating a device-level backup in case of problems.
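
The offline sequence described above can be sketched as a small Python helper that only builds and prints the commands, without executing anything. This is an illustrative sketch, not a tested procedure: the target size "1900G" and the mount point "/mnt/mdt" are hypothetical placeholders, /dev/vgXX/lvYY is the placeholder device name used above, and the snapshot/backup step is deliberately left out.

```python
import shlex


def mdt_resize_plan(device, new_size, mount_point):
    """Return the offline resize steps as argv lists (nothing is executed)."""
    return [
        # The MDT must be offline: resize2fs cannot grow it while mounted.
        ["umount", mount_point],
        # Grow the logical volume to the new absolute size.
        ["lvresize", "-L", new_size, device],
        # Forced filesystem check, which resize2fs requires beforehand.
        ["e2fsck", "-f", device],
        # With no size argument, resize2fs grows the filesystem to fill the LV.
        ["resize2fs", device],
        # Bring the MDT back into service.
        ["mount", "-t", "lustre", device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for argv in mdt_resize_plan("/dev/vgXX/lvYY", "1900G", "/mnt/mdt"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review the order of the steps before running them by hand during the scheduled downtime.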

The number of inodes added will be proportional to the increase in space of the MDT. It isn't possible to change the inode ratio for the MDT filesystem (i.e. pack more inodes into the same space) except at format time.
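
As a rough worked example of that proportionality, the sketch below scales an inode count by the size ratio. It uses the round figures quoted later in this thread (a 1.5TB MDT with ~400M inodes, grown by 400GB); the function name is made up for illustration, and the real result depends on how the filesystem lays out its block groups.

```python
def inodes_after_resize(current_inodes, old_size_gb, added_gb):
    """Scale the inode count by the size ratio (integer arithmetic, rounds down)."""
    new_size_gb = old_size_gb + added_gb
    return current_inodes * new_size_gb // old_size_gb


# Round figures from this thread: ~400M inodes on 1.5TB, plus 400GB.
estimate = inodes_after_resize(400_000_000, 1500, 400)
print(f"estimated total inodes:  {estimate:,}")
print(f"estimated inodes gained: {estimate - 400_000_000:,}")
```

The estimate comes out at roughly 507 million inodes, in line with the ~500M reported after the resize at the end of this thread.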

Another option would be to format a new MDT filesystem (possibly larger, possibly with more inodes per unit space using the "-i" option) in a separate LV, do a file-level copy of the whole MDT over to the new LV, and then use that to replace the existing MDT. The advantage of this is that you have a second copy of the filesystem (the old one can be kept for some time as a backup in case of problems), and you can potentially increase the inode packing (depending on how many inodes are in use in the existing MDT and how many free blocks it has). The drawback is that this may take a significant amount of time to do the copy since it needs to do a file-level copy and not a device-level copy. This process is described in the Lustre Manual under backup/restore procedures.
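
To see how much the "-i" (bytes-per-inode) ratio matters for a reformat, the sketch below estimates the inode count as device size divided by the ratio. The 1900 GiB size and the two ratios are illustrative assumptions, not values taken from this ticket, and the actual count will differ somewhat because inodes are allocated per block group.

```python
GIB = 1024 ** 3


def inodes_for_format(size_bytes, bytes_per_inode):
    """Approximate inode count for a fresh format at a given bytes-per-inode ratio."""
    return size_bytes // bytes_per_inode


# Hypothetical 1900 GiB MDT, formatted at two different ratios.
for ratio in (4096, 2048):
    print(f"-i {ratio}: ~{inodes_for_format(1900 * GIB, ratio):,} inodes")
```

Halving the ratio doubles the inode count for the same space, but it also leaves fewer blocks per inode for directories and other MDT data, so the ratio should be chosen against the inode and block usage of the existing MDT.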

Adding a second MDT in DNE is an option, but this hasn't been used by (m)any users in production at this point. It does fragment the namespace, and there is a performance cost for cross-MDT operations. You would probably want to do this with a new filesystem where you can split up users or projects between the MDTs, since in 2.5 the granularity is only whole directory trees and there is no automatic space balancing yet.

Comment by Frederik Ferner (Inactive) [ 01/Sep/14 ]

Thanks for the details. Good to know we can increase the number of inodes. We have ~400GB free; even after leaving some room for LVM snapshots, that would still give us enough benefit over the current 1.5TB MDT to make it worth scheduling downtime. Currently our pain isn't big enough (yet) to go down the route of formatting a new MDT and migrating the file system (we've done this on a previous system to migrate to faster hardware, and at least then it was very slow).

Currently we're freeing up inodes by applying purging policies and identifying all those small/very small files that users have sneakily put on this file system.

For the second MDT, is moving a whole directory tree between MDTs supported?

Comment by Andreas Dilger [ 02/Sep/14 ]

In 2.4 and 2.5 the only way to move files to a second MDT is to create a new directory on the newly added MDT and then copy all of the files and directories over to this directory. The "mv" utility would do this, but it may result in a lot of data copying. In 2.6 there is prototype code for being able to migrate the directory entries over to the new MDT without copying the data, but this isn't fully supported yet.

Comment by Frederik Ferner (Inactive) [ 10/Sep/14 ]

Andreas,

thanks for looking into this. We have now gone down the file system resize route, increasing the number of inodes from ~400M to ~500M with an additional 400GB from the VG. It's good to know what the other alternatives are in case this is not sufficient...

In the meantime I'd say this issue can be closed.

Thanks,
Frederik

Comment by Peter Jones [ 10/Sep/14 ]

Thanks Frederik
