Details
Bug
Resolution: Fixed
Major
Lustre 2.1.3, Lustre 2.1.4
3
6116
Description
We have a lot of nodes with a large amount of unreclaimable memory (over 4GB). Whatever we try (manually shrinking the cache, clearing LRU locks, ...), the memory can't be recovered. The only way to get the memory back is to unmount the Lustre filesystem.
After some troubleshooting, I was able to write a small reproducer that just calls open(2) and then close(2) on files in O_RDWR mode (the reproducer opens thousands of files to emphasize the issue).
Two programs are attached:
- gentree.c (cc -o gentree gentree.c -lpthread) to generate a tree of known files (so there is no need to use readdir in reproducer.c)
- reproducer.c (cc -o reproducer reproducer.c -lpthread) to reproduce the issue.
The macro BASE_DIR has to be adjusted according to the local cluster configuration (it should be the name of a directory located on a Lustre filesystem).
There is no link between the two phases: rebooting the client between gentree and reproducer doesn't avoid the problem. Running gentree (which opens as many files as reproducer) doesn't show the issue.
Niu, in fact we don't need to wait for commit in the case of an open that did not create the file and has since been closed, and that is exactly the case that causes this bug with unreclaimable memory. I also don't see why server help is needed here: the client knows there was a close and knows this was a non-create open, and that is enough to decide to drop the request from the replay queue. I am not sure, though, how easy it is to distinguish the non-create case from OPEN-CREATE; at first glance we need to check the disposition flags for the DISP_OPEN_CREATE bit. So a possible solution would be:
1) after the open reply, check the disposition for the DISP_OPEN_CREATE bit and save that information in md_open_data, OR just take the disposition from the already saved mod_open_req during mdc_close()
2) mdc_close() already sets mod->mod_open_req->rq_replay to 0; for a non-create open we would also set mod_open_req->rq_commit_nowait (or any other new flag).
3) in ptlrpc_free_committed(), check that rq_commit_nowait flag and free such a request immediately, no matter what transno it has.
Would that work? Am I missing something?
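The three steps above can be illustrated with a small user-space toy model. This is not Lustre code: struct toy_req, toy_close(), toy_free_committed() and the TOY_DISP_OPEN_CREATE bit value are all hypothetical stand-ins for the saved open request, mdc_close(), ptlrpc_free_committed() and DISP_OPEN_CREATE, and the model ignores locking and the real list handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit for illustration only, not Lustre's actual value. */
#define TOY_DISP_OPEN_CREATE 0x1u

/* Toy stand-in for a saved open request on the replay queue. */
struct toy_req {
    uint64_t transno;       /* transaction number of the open */
    uint32_t disposition;   /* disposition bits from the open reply */
    bool     replay;        /* still needs to be kept for replay */
    bool     commit_nowait; /* proposed flag: free without waiting for commit */
    bool     freed;         /* marks the request as dropped from the queue */
};

/* Steps 1 and 2: on close, stop replaying the open and, if the open
 * did not create the file, mark it as not needing to wait for commit. */
static void toy_close(struct toy_req *open_req)
{
    open_req->replay = false;
    if (!(open_req->disposition & TOY_DISP_OPEN_CREATE))
        open_req->commit_nowait = true;
}

/* Step 3: free requests that are committed, or that carry the
 * commit_nowait flag regardless of their transno. Requests still
 * marked for replay are kept. Returns the number of requests freed. */
static size_t toy_free_committed(struct toy_req *reqs, size_t n,
                                 uint64_t last_committed)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_req *r = &reqs[i];

        if (r->freed || r->replay)
            continue;
        if (r->commit_nowait || r->transno <= last_committed) {
            r->freed = true;
            freed++;
        }
    }
    return freed;
}
```

In this model a closed non-create open is dropped on the next pass even if its transno is above the last committed one, while a closed OPEN-CREATE request still waits for its transno to be committed.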