Details
Type: Bug
Resolution: Fixed
Priority: Minor
Upstream
None
3
Description
Many racer runs time out; I bisected the problem to this patch:
https://review.whamcloud.com/32158/
Subject: LU-10948 llite: Introduce inode open heat counter
Project: fs/lustre-release
Branch: master
Commit: 41d99c4902836b7265db946dfa49cf99381f0db4
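For context, the bisected patch introduces an inode open heat counter (LU-10948: "client cache open lock after N opens"), so the client asks for an open lock only after an inode has been opened often enough. The sketch below is a hypothetical model of such a counter, not the patch's code. `struct open_heat`, `open_heat_hit()`, the threshold of 5 opens and the 60-second window are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-inode open statistics; not the Lustre data structure. */
struct open_heat {
	unsigned int count;   /* opens seen in the current window */
	time_t window_start;  /* when the current window began */
};

/* Assumed tunables, for illustration only. */
#define OPEN_HEAT_THRESHOLD 5   /* opens needed before asking for an open lock */
#define OPEN_HEAT_WINDOW    60  /* seconds after which the count restarts */

/*
 * Record one open at time "now" and report whether the client should
 * request an open lock for this inode on this open.
 */
static bool open_heat_hit(struct open_heat *h, time_t now)
{
	assert(h != NULL);

	/* Start a fresh window if the previous one has expired. */
	if (now - h->window_start >= OPEN_HEAT_WINDOW) {
		h->window_start = now;
		h->count = 0;
	}

	h->count++;
	return h->count >= OPEN_HEAT_THRESHOLD;
}
```

With these assumed values, the fifth open within one window is the first that asks for a lock, and an open after the window expires starts counting again from one.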
Issue Links
is related to: LU-10948 client cache open lock after N opens (Open)
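Basically, the problem is this:
1) the open lock is taken on directories;
2) the client doesn't cancel its locks on the server at umount (with obd_force set, the cancel is local only):
/* obd_force == local only */
ldlm_cli_cancel_unused(obd->obd_namespace, NULL, obd->obd_force ? LCF_LOCAL : 0, NULL);
3) thus the MDT has to clean up those "lost" locks, which means closing the directories;
4) a close may result in removal of the directory (e.g. if it was unlinked while still open);
5) the directory can be striped, so removing it needs all involved MDTs to be healthy;
6) MDTs are stopped one by one, so a close on one MDT can end up waiting for an MDT that has already been stopped.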
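The chain above can be reduced to a toy model that shows why the order in which MDTs are stopped matters. This is a hypothetical simulation, not Lustre code: `struct cluster`, `struct striped_dir`, `close_can_complete()` and `stop_mdts()` are invented names, and the model assumes two MDTs, a directory striped over both, and one orphaned open on MDT0 that must be closed before MDT0 can finish stopping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MDTS 2

/* Toy model: which MDTs are still up. */
struct cluster {
	bool mdt_up[NR_MDTS];
};

/* Toy model of an open-but-unlinked striped directory. */
struct striped_dir {
	const int *stripe_mdt;   /* MDT index of each stripe */
	size_t nr_stripes;
};

/*
 * A close that drops the last reference destroys the directory, which
 * touches every stripe, so it can only complete if all stripe MDTs are up.
 */
static bool close_can_complete(const struct cluster *c,
			       const struct striped_dir *d)
{
	for (size_t i = 0; i < d->nr_stripes; i++)
		if (!c->mdt_up[d->stripe_mdt[i]])
			return false;
	return true;
}

/*
 * Stop MDTs one by one in the given order.  Before orphan_mdt finishes
 * stopping it must close the directory whose open lock was dropped only
 * on the client.  Returns the index of the MDT whose umount would block,
 * or -1 if every umount completes.
 */
static int stop_mdts(struct cluster *c, const int *order, size_t n,
		     const struct striped_dir *orphan, int orphan_mdt)
{
	for (size_t i = 0; i < n; i++) {
		int mdt = order[i];

		if (mdt == orphan_mdt && !close_can_complete(c, orphan))
			return mdt;   /* the real umount would wait here indefinitely */
		c->mdt_up[mdt] = false;
	}
	return -1;
}
```

In this model, stopping MDT1 before MDT0 returns 0 (MDT0's umount would hang), while stopping MDT0 first returns -1.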
Locally, I "solved" the problem by disabling the open cache for directories:
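The local workaround amounts to a mode check before asking for an open lock. The helper below is a hypothetical sketch of that idea, not the actual change: `want_open_lock()` and its `heat_reached` parameter are invented, with `heat_reached` standing in for whatever the open heat logic decides; only `S_ISDIR()` and `mode_t` come from POSIX.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFDIR/S_IFREG on strict compilers */
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: decide whether an open should ask the MDT for an
 * open lock.  Directories never get one, so no directory open lock can be
 * left behind for the MDT to clean up at umount.
 */
static bool want_open_lock(mode_t mode, bool heat_reached)
{
	if (S_ISDIR(mode))
		return false;   /* workaround: no open cache for directories */
	return heat_reached;
}
```

Regular files keep the open-cache behaviour; only directories opt out.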
But it looks like we have a few real problems to solve here. The most serious one, IMO, is to handle such a close on the MDT better, so that umount doesn't get stuck indefinitely.