[LU-4429] clients leaking open handles/bad lock matching in ll_md_blocking_ast Created: 03/Jan/14  Updated: 10/Feb/15  Resolved: 14/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug
Priority: Minor
Reporter: John Hammond
Assignee: John Hammond
Resolution: Fixed
Votes: 0
Labels: ldlm, llite

Issue Links:
Related
is related to LU-4053 client leaking objects/locks during IO Resolved
is related to LU-946 add lprocfs file on MDT to list open ... Resolved
is related to LU-4520 Text file busy error -- mainline 3.12... Resolved
is related to LU-6232 Text file busy error -- lustre 2.6.0 ... Resolved
Severity: 3
Rank (Obsolete): 12170

 Description   

In ll_md_blocking_ast() we try to avoid calling ll_md_real_close() by looking for another OPEN lock of the same mode on the file.

case LDLM_CB_CANCELING: {
    struct inode *inode = ll_inode_from_resource_lock(lock);
    __u64 bits = lock->l_policy_data.l_inodebits.bits;
    ...

    if (bits & MDS_INODELOCK_XATTR)
        ll_xattr_cache_destroy(inode);

    /* For OPEN locks we differentiate between lock modes
     * LCK_CR, LCK_CW, LCK_PR - bug 22891 */
    if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
                MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
        ll_have_md_lock(inode, &bits, LCK_MINMODE);

    if (bits & MDS_INODELOCK_OPEN)
        ll_have_md_lock(inode, &bits, mode);

    ...
    if (bits & MDS_INODELOCK_OPEN) {
        int flags = 0;
        switch (lock->l_req_mode) {
        case LCK_CW:
            flags = FMODE_WRITE;
            break;
        case LCK_PR:
            flags = FMODE_EXEC;
            break;
        case LCK_CR:
            flags = FMODE_READ;
            break;
        ...
        ll_md_real_close(inode, flags);
}

However, the ll_have_md_lock(inode, &bits, LCK_MINMODE) call may match a lock that happens to include MDS_INODELOCK_OPEN but has an inappropriate mode. Because bits is updated in place, such a match clears MDS_INODELOCK_OPEN before the mode-specific check runs. As a result ll_md_real_close() is not called when it should be, and a stale obd_client_handle is left in the lli.
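The effect can be reproduced in a small user-space model. The sketch below is not Lustre code: the toy_* names, the BIT_* and MODE_* values, and the simplified matching loop are all assumptions made for illustration. It only models the property that matters here, namely that a match clears every bit of the matched lock from the caller's mask. The "mode-aware" variant is one possible way to avoid the problem and is not necessarily what the landed patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the Lustre headers. */
enum { BIT_LOOKUP = 1 << 0, BIT_UPDATE = 1 << 1, BIT_OPEN = 1 << 2 };
enum { MODE_CR = 1 << 0, MODE_CW = 1 << 1, MODE_PR = 1 << 2,
       MODE_ANY = MODE_CR | MODE_CW | MODE_PR };

/* Hypothetical stand-in for a granted inodebits lock. */
struct toy_lock {
    uint64_t bits;   /* inodebits covered by this lock */
    unsigned mode;   /* exactly one MODE_* value */
};

/* Toy model of ll_have_md_lock(): any held lock that overlaps *bits and has
 * an acceptable mode clears ALL of its own bits from *bits. */
void toy_have_md_lock(const struct toy_lock *held, size_t n,
                      uint64_t *bits, unsigned modes)
{
    for (size_t i = 0; i < n; i++)
        if ((held[i].bits & *bits) && (held[i].mode & modes))
            *bits &= ~held[i].bits;
}

/* Order described in the ticket: any-mode match first, then the OPEN match. */
bool toy_close_needed_buggy(const struct toy_lock *held, size_t n,
                            uint64_t bits, unsigned mode)
{
    if (bits & (BIT_LOOKUP | BIT_UPDATE))
        toy_have_md_lock(held, n, &bits, MODE_ANY);
    if (bits & BIT_OPEN)
        toy_have_md_lock(held, n, &bits, mode);
    return (bits & BIT_OPEN) != 0;
}

/* Assumed remedy: match the OPEN bit separately so that only a lock of the
 * same mode can clear it. */
bool toy_close_needed_mode_aware(const struct toy_lock *held, size_t n,
                                 uint64_t bits, unsigned mode)
{
    uint64_t open = bits & BIT_OPEN;
    uint64_t other = bits & ~(uint64_t)BIT_OPEN;

    if (other)
        toy_have_md_lock(held, n, &other, MODE_ANY);
    if (open)
        toy_have_md_lock(held, n, &open, mode);
    return open != 0;
}

/* Scenario: a CW lock with LOOKUP|OPEN is being canceled while a CR lock
 * with LOOKUP|OPEN is still held. The CW open handle must be closed. */
void toy_self_check(void)
{
    const struct toy_lock held[] = { { BIT_LOOKUP | BIT_OPEN, MODE_CR } };
    uint64_t canceled = BIT_LOOKUP | BIT_OPEN;

    assert(!toy_close_needed_buggy(held, 1, canceled, MODE_CW));     /* leak */
    assert(toy_close_needed_mode_aware(held, 1, canceled, MODE_CW)); /* close */
}
```

In the scenario checked by toy_self_check(), the any-mode pass matches the CR lock through its LOOKUP bit and strips OPEN along with it, so the buggy ordering never reaches the close.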

The handle leak is easy to confirm using the patch http://review.whamcloud.com/#/c/6386/ from LU-946. With that patch applied, run:

# llmount.sh
...
# DURATION=10 sh ./lustre/tests/racer.sh
...
# lsof /mnt/lustre
# lctl set_param ldlm.namespaces.*mdc*.lru_size=clear
# lctl get_param ldlm.namespaces.*mdc*.lru_size
# lctl dk > 1.dk
# cat /proc/fs/lustre/mdt/lustre-MDT0000/exports/0\@lo/open_files
[0x200000400:0x9d1:0x0] 04240000001 0xbd7c9c99fdb17cfc
[0x200000401:0x893:0x0] 04240000001 0xbd7c9c99fdad82ad

Note that I have modified the patch to also print the flags and cookie of the MFD.
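
For scripted checks of the open_files output, a line can be split into its fields with sscanf(). This is a minimal sketch under assumptions: the layout (bracketed FID, flags, cookie) is inferred from the two sample lines above, the flags column is assumed to be octal because of its leading 0, and struct open_file_entry and parse_open_file_line() are hypothetical names, not part of Lustre.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for one line of the modified open_files output. */
struct open_file_entry {
    uint64_t seq;     /* FID sequence */
    uint32_t oid;     /* FID object id */
    uint32_t ver;     /* FID version */
    uint64_t flags;   /* open flags, assumed to be printed in octal */
    uint64_t cookie;  /* MFD cookie, printed in hex */
};

/* Parse "[seq:oid:ver] flags cookie". Returns 0 on success, -1 if the line
 * does not have all five fields. */
int parse_open_file_line(const char *line, struct open_file_entry *e)
{
    int n = sscanf(line,
                   "[%" SCNx64 ":%" SCNx32 ":%" SCNx32 "] %" SCNo64 " %" SCNx64,
                   &e->seq, &e->oid, &e->ver, &e->flags, &e->cookie);
    return n == 5 ? 0 : -1;
}
```

After clearing the LRU, any line that still parses successfully corresponds to a handle the MDT believes is open, which is what the two entries above show.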



 Comments   
Comment by John Hammond [ 03/Jan/14 ]

Please see http://review.whamcloud.com/8718.

Comment by John Hammond [ 11/Jan/14 ]

Patch landed to master.

Comment by Bob Glossman (Inactive) [ 13/Feb/14 ]

Backport to b2_5:
http://review.whamcloud.com/9260
