Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: Lustre 2.1.3, Lustre 2.1.4
- 3
- 6116
Description
We have a lot of nodes with a large amount of unreclaimable memory (over 4 GB). Whatever we try (manually shrinking the caches, clearing the LRU locks, ...), the memory cannot be recovered. The only way to get the memory back is to unmount the Lustre filesystem.
After some troubleshooting, I was able to write a small reproducer that simply open(2)s then close(2)s files in O_RDWR (the reproducer opens thousands of files to emphasize the issue).
Attached 2 programs:
- gentree.c (cc -o gentree gentree.c -lpthread) to generate a tree of known files (so there is no need to use readdir in reproducer.c)
- reproducer.c (cc -o reproducer reproducer.c -lpthread) to reproduce the issue.
The macro BASE_DIR has to be adjusted to the local cluster configuration (it should name a directory located on a Lustre filesystem).
There is no link between the 2 phases, as rebooting the client between gentree and reproducer doesn't avoid the problem. Running gentree alone (which opens as many files as reproducer) doesn't show the issue.
Mike, I think there is no way to achieve this without server-side changes. I can think of two ways so far:
1. The server treats open/close as committed transactions and returns to the client both the last committed transno and the last real (on-disk) transno; the client then drops committed open and close requests immediately after close. That's what I did in my patch.
2. The server assigns no transno for open/close, and the client open-replay mechanism must be adapted to this change (as Siyao mentioned in the review comment: track the open handle in the fs layer, rebuild the request when replaying the open; some other changes to the open, close, and open-lock code could be required).
The second solution looks cleaner to me, but it requires more code changes, and it will be a little tricky to handle open-create and plain open differently on the client side.