[LU-1777] open-by-fid: deadlock in lock_rename() Created: 21/Aug/12  Updated: 20/Jul/17

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: open-by-fid
Environment:
  1. uname -r
    2.6.32-279.5.1.el6.x86_64
  2. cat /proc/fs/lustre/version
    lustre: 2.2.93
    kernel: patchless_client
    build: 2.2.93-gbaaf628-PRISTINE-2.6.32-279.5.1.el6.x86_64

Issue Links:
Related
is related to LU-1518 Missing/bad operations in mdd_{obf,do... Resolved
Severity: 3
Rank (Obsolete): 10464

 Description   
[root]# /usr/src/lustre-release/lustre/tests/llmount.sh
[root]# cd /mnt/lustre/
[root]# mkdir sanity
[root]# chown sanity: sanity
[root]# su sanity
[sanity]$ pwd
/mnt/lustre
[sanity]$ sys_path2fid .
[0x61ab:0xef3d87c8:0x0]
[sanity]$ sys_rename sanity .lustre/fid/[0x61ab:0xef3d87c8:0x0]/sanity

rename() wedges in lock_rename().

INFO: task sys_rename:2960 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sys_rename    D 0000000000000000     0  2960   2933 0x00000080
 ffff88005cc37cf8 0000000000000082 ffff88005cc37d08 ffffffff81189a05
 0000001000000000 ffff88007b01cb70 ffffffff8100bc0e ffff88005cc37cf8
 ffff880062a67098 ffff88005cc37fd8 000000000000fb88 ffff880062a67098
Call Trace:
 [<ffffffff81189a05>] ? __link_path_walk+0x155/0x1030
 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8104f18b>] ? mutex_spin_on_owner+0x9b/0xc0
 [<ffffffff814ff2fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ff19b>] mutex_lock+0x2b/0x50
 [<ffffffff811878e3>] lock_rename+0x73/0xe0
 [<ffffffff8118af83>] sys_renameat+0x113/0x260
 [<ffffffff8119a470>] ? mntput_no_expire+0x30/0x110
 [<ffffffff8117cb11>] ? __fput+0x1a1/0x210
 [<ffffffff81142c7e>] ? remove_vma+0x6e/0x90
 [<ffffffff810d6b12>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff815036de>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8118b0eb>] sys_rename+0x1b/0x20
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[root]# pidof sys_rename
2960
[root]# cat /proc/2960/stack
[<ffffffff811878e3>] lock_rename+0x73/0xe0
[<ffffffff8118af83>] sys_renameat+0x113/0x260
[<ffffffff8118b0eb>] sys_rename+0x1b/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


 Comments   
Comment by Peter Jones [ 27/Aug/12 ]

Niu

Could you please look at this one? It is similar to the work you just did for LU-1518.

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 27/Aug/12 ]

This should be fixed along with LU-1518.

Comment by Peter Jones [ 27/Aug/12 ]

OK, then let's close this ticket as a duplicate and just ensure that the LU-1518 fix covers this case as well.

Comment by Peter Jones [ 01/Sep/12 ]

As per John, this was not fixed by the LU-1518 patch after all, so reopening.

Comment by Niu Yawei (Inactive) [ 03/Sep/12 ]

I don't think there is a quick fix for this deadlock. We need some way on the server side to detect the recursive rename, and that check must take the 'fid' directory into account as well.

Given that renaming files in the 'fid' directory isn't legal usage, I suggest we lower the priority of this ticket and fix it in a later version.

Comment by Andreas Dilger [ 03/Sep/12 ]

While it would be good to get this fixed for 2.3, since this only affects the client and not the MDS, I'm removing this as a blocker for 2.3 and moving it to 2.4. This isn't a problem that can be hit accidentally.

Comment by Niu Yawei (Inactive) [ 10/Oct/12 ]

The client deadlocks in lock_rename():

struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
{
        struct dentry *p;

        if (p1 == p2) {
                mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
                return NULL;
        }

        mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);

        p = d_ancestor(p2, p1);
        if (p) {
                mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_PARENT);
                mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_CHILD);
                return p;
        }

        p = d_ancestor(p1, p2);
        if (p) {
                mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
                mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
                return p;
        }

        mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
        mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
        return NULL;
}

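The root cause is that with the 'fid' directory, the client can have two directory dentries pointing to the same inode. lock_rename() only short-circuits when the two dentries are identical (p1 == p2); here they differ, so it falls through and tries to take the i_mutex of that single inode twice, once through each dentry, and the second lock never returns. Without patching the kernel, I'm not sure there is any good way to solve it.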

Anyway, I don't think it should be a blocker for 2.4. Andreas, any comments? Thanks.

Comment by Andreas Dilger [ 19/Oct/12 ]

Is it possible to block renames that involve the .lustre directory?

Comment by Niu Yawei (Inactive) [ 19/Oct/12 ]

No, I don't think so. It should block renames that involve the 'fid' directory.

Comment by parinay v kondekar (Inactive) [ 03/Aug/15 ]

@Niu Yawei,
I am hitting the deadlock in lock_rename() while running sanity/test_154a, the open-by-fid test.

Jul 31 16:10:01 localhost kernel: mrename       D 0000000000000000     0 19394  19337 0x00000080
Jul 31 16:10:01 localhost kernel: ffff8800054f1cf8 0000000000000082 0000000000000000 ffffffff811850f5
Jul 31 16:10:01 localhost kernel: 0000000000000000 ffffea00001267b8 ffffffff8100bc0e ffff8800054f1cf8
Jul 31 16:10:01 localhost kernel: ffff88000dffc678 ffff8800054f1fd8 000000000000f4e8 ffff88000dffc678
Jul 31 16:10:01 localhost kernel: Call Trace:
Jul 31 16:10:01 localhost kernel: [<ffffffff811850f5>] ? __link_path_walk+0x155/0x1030
Jul 31 16:10:01 localhost kernel: [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
Jul 31 16:10:01 localhost kernel: [<ffffffff8104d92d>] ? mutex_spin_on_owner+0x8d/0xc0
Jul 31 16:10:01 localhost kernel: [<ffffffff814eebbe>] __mutex_lock_slowpath+0x13e/0x180
Jul 31 16:10:01 localhost kernel: [<ffffffff81183b01>] ? path_put+0x31/0x40
Jul 31 16:10:01 localhost kernel: [<ffffffff814eea5b>] mutex_lock+0x2b/0x50
Jul 31 16:10:01 localhost kernel: [<ffffffff81182f83>] lock_rename+0x73/0xe0
Jul 31 16:10:01 localhost kernel: [<ffffffff81186673>] sys_renameat+0x113/0x260
Jul 31 16:10:01 localhost kernel: [<ffffffff81195b70>] ? mntput_no_expire+0x30/0x110
Jul 31 16:10:01 localhost kernel: [<ffffffff81178271>] ? __fput+0x1a1/0x210
Jul 31 16:10:01 localhost kernel: [<ffffffff8113f43e>] ? remove_vma+0x6e/0x90
Jul 31 16:10:01 localhost kernel: [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
Jul 31 16:10:01 localhost kernel: [<ffffffff814f2fce>] ? do_page_fault+0x3e/0xa0
Jul 31 16:10:01 localhost kernel: [<ffffffff811867db>] sys_rename+0x1b/0x20
Jul 31 16:10:01 localhost kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

This reproduces every time I run sanity/test_154a. This is Lustre 2.1.5 (specifically with backports of LU-4279 and LU-3245).
Can you provide some pointers so that I can work on the fix?

Thanks

Comment by Vinayak (Inactive) [ 18/Sep/15 ]

Hello Andreas, Niu Yawei,

I have also hit this deadlock while renaming .lustre to itself via its .lustre/fid path, i.e.:

echo "rename .lustre to itself"
fid=$($LFS path2fid $DIR)
mrename $DIR/.lustre $DIR/.lustre/fid/$fid/.lustre &&
error "rename .lustre to itself should fail."

Call trace:

 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 mrename       D 0000000000000000     0 25974  25917 0x00000080
 ffff880014401cf8 0000000000000086 ffff880014401d08 ffffffff811850f5
 0000000000000000 ffffea00004fe840 ffffffff8100bc0e ffff880014401cf8
 ffff8800054afa78 ffff880014401fd8 000000000000f4e8 ffff8800054afa78
 Call Trace:
 [<ffffffff811850f5>] ? __link_path_walk+0x155/0x1030
 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8104d92d>] ? mutex_spin_on_owner+0x8d/0xc0
 [<ffffffff814eebbe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff81183b01>] ? path_put+0x31/0x40
 [<ffffffff814eea5b>] mutex_lock+0x2b/0x50
 [<ffffffff81182f83>] lock_rename+0x73/0xe0
 [<ffffffff81186673>] sys_renameat+0x113/0x260
 [<ffffffff81195b70>] ? mntput_no_expire+0x30/0x110
 [<ffffffff81178271>] ? __fput+0x1a1/0x210
 [<ffffffff8113f43e>] ? remove_vma+0x6e/0x90
 [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff814f2fce>] ? do_page_fault+0x3e/0xa0
 [<ffffffff811867db>] sys_rename+0x1b/0x20
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

I have tried to catch this case in the llite layer and return -EPERM from there, but without success. Is this case currently unsupported by Lustre, or am I doing something wrong here?

Generated at Sat Feb 10 01:19:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.