Details
Bug
Resolution: Fixed
Minor
Lustre 2.6.0
3
12170
Description
In ll_md_blocking_ast() we try to avoid calling ll_md_real_close() by looking for a same mode OPEN lock on the file.
        case LDLM_CB_CANCELING: {
                struct inode *inode = ll_inode_from_resource_lock(lock);
                __u64 bits = lock->l_policy_data.l_inodebits.bits;
                ...
                if (bits & MDS_INODELOCK_XATTR)
                        ll_xattr_cache_destroy(inode);

                /* For OPEN locks we differentiate between lock modes
                 * LCK_CR, LCK_CW, LCK_PR - bug 22891 */
                if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
                            MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
                        ll_have_md_lock(inode, &bits, LCK_MINMODE);

                if (bits & MDS_INODELOCK_OPEN)
                        ll_have_md_lock(inode, &bits, mode);
                ...
                if (bits & MDS_INODELOCK_OPEN) {
                        int flags = 0;

                        switch (lock->l_req_mode) {
                        case LCK_CW:
                                flags = FMODE_WRITE;
                                break;
                        case LCK_PR:
                                flags = FMODE_EXEC;
                                break;
                        case LCK_CR:
                                flags = FMODE_READ;
                                break;
                        ...
                        ll_md_real_close(inode, flags);
                }
However, the ll_have_md_lock(inode, &bits, LCK_MINMODE) call may match a lock that happens to include MDS_INODELOCK_OPEN but has an inappropriate mode. Since a match clears the matched lock's bits from bits, the later MDS_INODELOCK_OPEN checks then see the bit as already cleared. This prevents ll_md_real_close() from being called when it should be and leaves a stale obd_client_handle in the lli.
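The effect can be reproduced outside the kernel with a small model. The following is only a sketch under stated assumptions, not the llite code: every toy_* name and every constant value is hypothetical, and toy_have_md_lock() assumes that a successful match clears all of the matched lock's bits from the caller's mask and that the minimum mode matches a lock of any mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for lock modes and inodebits; the values are
 * illustrative only and are not the Lustre definitions. */
enum toy_mode { TOY_MINMODE = 0, TOY_CR = 1, TOY_CW = 2, TOY_PR = 4 };

#define TOY_BIT_LOOKUP 0x1u
#define TOY_BIT_UPDATE 0x2u
#define TOY_BIT_OPEN   0x4u

struct toy_lock {
        uint64_t      bits;
        enum toy_mode mode;
};

/* Assumed behaviour of ll_have_md_lock(): any cached lock that covers one
 * of the wanted bits in an acceptable mode clears ALL of its own bits from
 * *bits. TOY_MINMODE is assumed to accept every mode. */
static void toy_have_md_lock(const struct toy_lock *cached, size_t n,
                             uint64_t *bits, enum toy_mode mode)
{
        for (size_t i = 0; i < n; i++) {
                bool mode_ok = (mode == TOY_MINMODE) ||
                               (cached[i].mode == mode);

                if (mode_ok && (cached[i].bits & *bits) != 0)
                        *bits &= ~cached[i].bits;
        }
}

/* Mirrors the order of checks in the LDLM_CB_CANCELING case quoted above.
 * Returns true if the model would reach the real close for the lock
 * being canceled. */
static bool toy_would_close(const struct toy_lock *cached, size_t n,
                            struct toy_lock canceled)
{
        uint64_t bits = canceled.bits;

        /* Any-mode match for the non-OPEN bits; this may also clear OPEN. */
        if (bits & (TOY_BIT_LOOKUP | TOY_BIT_UPDATE))
                toy_have_md_lock(cached, n, &bits, TOY_MINMODE);

        /* Same-mode match for OPEN; skipped if OPEN was already cleared. */
        if (bits & TOY_BIT_OPEN)
                toy_have_md_lock(cached, n, &bits, canceled.mode);

        return (bits & TOY_BIT_OPEN) != 0;
}
```

In this model, canceling a LOOKUP|OPEN lock in CR mode while a LOOKUP|OPEN lock in CW mode is still cached makes toy_would_close() return false, although no other CR OPEN lock exists. Canceling an OPEN-only CR lock against the same cached lock returns true, because the any-mode match is never attempted.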
That handles really are being leaked is easy to see with the patch http://review.whamcloud.com/#/c/6386/ from LU-946 applied. Then run:
# llmount.sh
...
# DURATION=10 sh ./lustre/tests/racer.sh
...
# lsof /mnt/lustre
# lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
# lctl get_param ldlm.namespaces.*mdc*.lru_size
# lctl dk > 1.dk
# cat /proc/fs/lustre/mdt/lustre-MDT0000/exports/0\@lo/open_files
[0x200000400:0x9d1:0x0] 04240000001 0xbd7c9c99fdb17cfc
[0x200000401:0x893:0x0] 04240000001 0xbd7c9c99fdad82ad
Note that I have modified the patch to also print the flags and cookie of the MFD.