Details
Bug | Resolution: Fixed | Minor | Lustre 2.6.0 | 3 | 12170
Description
In ll_md_blocking_ast() we try to avoid calling ll_md_real_close() by looking for another OPEN lock of the same mode on the file.
    case LDLM_CB_CANCELING: {
        struct inode *inode = ll_inode_from_resource_lock(lock);
        __u64 bits = lock->l_policy_data.l_inodebits.bits;
        ...
        if (bits & MDS_INODELOCK_XATTR)
            ll_xattr_cache_destroy(inode);

        /* For OPEN locks we differentiate between lock modes
         * LCK_CR, LCK_CW, LCK_PR - bug 22891 */
        if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
                    MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
            ll_have_md_lock(inode, &bits, LCK_MINMODE);

        if (bits & MDS_INODELOCK_OPEN)
            ll_have_md_lock(inode, &bits, mode);
        ...
        if (bits & MDS_INODELOCK_OPEN) {
            int flags = 0;
            switch (lock->l_req_mode) {
            case LCK_CW:
                flags = FMODE_WRITE;
                break;
            case LCK_PR:
                flags = FMODE_EXEC;
                break;
            case LCK_CR:
                flags = FMODE_READ;
                break;
            ...
            ll_md_real_close(inode, flags);
        }
However, the ll_have_md_lock(inode, &bits, LCK_MINMODE) call may match a lock which happens to include MDS_INODELOCK_OPEN but has an inappropriate mode. Since ll_have_md_lock() clears the bits of any lock it matches from bits, the later (bits & MDS_INODELOCK_OPEN) test then fails. This prevents ll_md_real_close() from being called when it should be, and leaves a stale obd_client_handle in the lli.
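The interaction can be reproduced outside the kernel with a small model. The following is only a sketch under stated assumptions: every name in it (sim_lock, sim_have_md_lock(), the BIT_* and MODE_* constants, and both sim_close_runs_*() helpers) is hypothetical and merely mimics the behaviour described above, and the "split" variant is one possible way to keep the OPEN bit out of the any-mode match, not necessarily the fix that was landed.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Lustre inodebits and lock modes. */
enum { BIT_LOOKUP = 0x1, BIT_UPDATE = 0x2, BIT_OPEN = 0x4 };
enum sim_mode { MODE_ANY = 0, MODE_CR, MODE_CW, MODE_PR };

struct sim_lock {
    uint64_t bits;          /* inodebits covered by this held lock */
    enum sim_mode mode;     /* mode the lock was granted in */
};

/*
 * Model of the matching step: for every held lock whose mode is
 * acceptable (MODE_ANY accepts all modes) and which overlaps *bits,
 * clear all of that lock's bits from *bits.
 */
static void sim_have_md_lock(const struct sim_lock *held, int n,
                             uint64_t *bits, enum sim_mode mode)
{
    for (int i = 0; i < n; i++) {
        if (mode != MODE_ANY && held[i].mode != mode)
            continue;
        if (held[i].bits & *bits)
            *bits &= ~held[i].bits;
    }
}

/* Same ordering as the snippet above; returns 1 if the close would run. */
static int sim_close_runs_buggy(const struct sim_lock *held, int n,
                                uint64_t bits, enum sim_mode mode)
{
    if (bits & (BIT_LOOKUP | BIT_UPDATE))
        sim_have_md_lock(held, n, &bits, MODE_ANY);
    if (bits & BIT_OPEN)
        sim_have_md_lock(held, n, &bits, mode);
    return (bits & BIT_OPEN) != 0;
}

/* Assumed variant: match the OPEN bit only against same-mode locks. */
static int sim_close_runs_split(const struct sim_lock *held, int n,
                                uint64_t bits, enum sim_mode mode)
{
    uint64_t open = bits & BIT_OPEN;
    uint64_t other = bits & ~(uint64_t)BIT_OPEN;

    if (other)
        sim_have_md_lock(held, n, &other, MODE_ANY);
    if (open)
        sim_have_md_lock(held, n, &open, mode);
    return open != 0;
}
```

With a single held lock covering BIT_LOOKUP | BIT_OPEN in MODE_CR, cancelling a MODE_CW lock with the same bits makes sim_close_runs_buggy() return 0 (the close is skipped and the handle would leak), while sim_close_runs_split() returns 1.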
That handles really are being leaked is easy to confirm using the patch http://review.whamcloud.com/#/c/6386/ from LU-946. Then run:
# llmount.sh
...
# DURATION=10 sh ./lustre/tests/racer.sh
...
# lsof /mnt/lustre
# lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
# lctl get_param ldlm.namespaces.*mdc*.lru_size
# lctl dk > 1.dk
# cat /proc/fs/lustre/mdt/lustre-MDT0000/exports/0\@lo/open_files
[0x200000400:0x9d1:0x0] 04240000001 0xbd7c9c99fdb17cfc
[0x200000401:0x893:0x0] 04240000001 0xbd7c9c99fdad82ad
Note that I have modified the patch to also print the flags and cookie of the MFD.