[LU-1777] open-by-fid: deadlock in lock_rename()

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.4.0
    • # uname -r
      2.6.32-279.5.1.el6.x86_64
      # cat /proc/fs/lustre/version
      lustre: 2.2.93
      kernel: patchless_client
      build: 2.2.93-gbaaf628-PRISTINE-2.6.32-279.5.1.el6.x86_64
    • 3
    • 10464

    Description

      [root]# /usr/src/lustre-release/lustre/tests/llmount.sh
      [root]# cd /mnt/lustre/
      [root]# mkdir sanity
      [root]# chown sanity: sanity
      [root]# su sanity
      [sanity]$ pwd
      /mnt/lustre
      [sanity]$ sys_path2fid .
      [0x61ab:0xef3d87c8:0x0]
      [sanity]$ sys_rename sanity .lustre/fid/[0x61ab:0xef3d87c8:0x0]/sanity
      

      rename() wedges in lock_rename().
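
      The source of the sys_rename helper is not shown in this ticket. A minimal sketch, assuming it is just a thin wrapper around rename(2), could look like the following; the name try_rename is hypothetical. On an affected client the call never returns for the paths above, so the error path is only reached in the non-deadlocking cases.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the sys_rename helper used above: call
 * rename(2) and report the result as 0 or a negative errno value.
 */
int try_rename(const char *oldpath, const char *newpath)
{
        if (rename(oldpath, newpath) == 0)
                return 0;
        return -errno;
}
```

      Called as try_rename("sanity", ".lustre/fid/[0x61ab:0xef3d87c8:0x0]/sanity") from /mnt/lustre, this would issue the same system call as the reproducer.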

      INFO: task sys_rename:2960 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      sys_rename    D 0000000000000000     0  2960   2933 0x00000080
       ffff88005cc37cf8 0000000000000082 ffff88005cc37d08 ffffffff81189a05
       0000001000000000 ffff88007b01cb70 ffffffff8100bc0e ffff88005cc37cf8
       ffff880062a67098 ffff88005cc37fd8 000000000000fb88 ffff880062a67098
      Call Trace:
       [<ffffffff81189a05>] ? __link_path_walk+0x155/0x1030
       [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
       [<ffffffff8104f18b>] ? mutex_spin_on_owner+0x9b/0xc0
       [<ffffffff814ff2fe>] __mutex_lock_slowpath+0x13e/0x180
       [<ffffffff814ff19b>] mutex_lock+0x2b/0x50
       [<ffffffff811878e3>] lock_rename+0x73/0xe0
       [<ffffffff8118af83>] sys_renameat+0x113/0x260
       [<ffffffff8119a470>] ? mntput_no_expire+0x30/0x110
       [<ffffffff8117cb11>] ? __fput+0x1a1/0x210
       [<ffffffff81142c7e>] ? remove_vma+0x6e/0x90
       [<ffffffff810d6b12>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff815036de>] ? do_page_fault+0x3e/0xa0
       [<ffffffff8118b0eb>] sys_rename+0x1b/0x20
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      
      [root]# pidof sys_rename
      2960
      [root]# cat /proc/2960/stack
      [<ffffffff811878e3>] lock_rename+0x73/0xe0
      [<ffffffff8118af83>] sys_renameat+0x113/0x260
      [<ffffffff8118b0eb>] sys_rename+0x1b/0x20
      [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

          Activity

            Hello Andreas, Niu Yawei,

            I have also hit this deadlock while renaming .lustre onto itself through the fid path, i.e.:

            echo "rename .lustre to itself"
            fid=$($LFS path2fid $DIR)
            mrename $DIR/.lustre $DIR/.lustre/fid/$fid/.lustre &&
            error "rename .lustre to itself should fail."

            Call trace:

             "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
             mrename       D 0000000000000000     0 25974  25917 0x00000080
             ffff880014401cf8 0000000000000086 ffff880014401d08 ffffffff811850f5
             0000000000000000 ffffea00004fe840 ffffffff8100bc0e ffff880014401cf8
             ffff8800054afa78 ffff880014401fd8 000000000000f4e8 ffff8800054afa78
             Call Trace:
             [<ffffffff811850f5>] ? __link_path_walk+0x155/0x1030
             [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
             [<ffffffff8104d92d>] ? mutex_spin_on_owner+0x8d/0xc0
             [<ffffffff814eebbe>] __mutex_lock_slowpath+0x13e/0x180
             [<ffffffff81183b01>] ? path_put+0x31/0x40
             [<ffffffff814eea5b>] mutex_lock+0x2b/0x50
             [<ffffffff81182f83>] lock_rename+0x73/0xe0
             [<ffffffff81186673>] sys_renameat+0x113/0x260
             [<ffffffff81195b70>] ? mntput_no_expire+0x30/0x110
             [<ffffffff81178271>] ? __fput+0x1a1/0x210
             [<ffffffff8113f43e>] ? remove_vma+0x6e/0x90
             [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
             [<ffffffff814f2fce>] ? do_page_fault+0x3e/0xa0
             [<ffffffff811867db>] sys_rename+0x1b/0x20
             [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
            

            I tried to catch this case in the llite layer and return -EPERM from there, but without success. Is this case currently not supported by Lustre, or am I doing something wrong here?

            vinayakh Vinayak (Inactive) added a comment

            @Niu Yawei,
            I am hitting the deadlock in lock_rename() while running sanity test_154a, the open_by_fid test.

            Jul 31 16:10:01 localhost kernel: mrename       D 0000000000000000     0 19394  19337 0x00000080
            Jul 31 16:10:01 localhost kernel: ffff8800054f1cf8 0000000000000082 0000000000000000 ffffffff811850f5
            Jul 31 16:10:01 localhost kernel: 0000000000000000 ffffea00001267b8 ffffffff8100bc0e ffff8800054f1cf8
            Jul 31 16:10:01 localhost kernel: ffff88000dffc678 ffff8800054f1fd8 000000000000f4e8 ffff88000dffc678
            Jul 31 16:10:01 localhost kernel: Call Trace:
            Jul 31 16:10:01 localhost kernel: [<ffffffff811850f5>] ? __link_path_walk+0x155/0x1030
            Jul 31 16:10:01 localhost kernel: [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
            Jul 31 16:10:01 localhost kernel: [<ffffffff8104d92d>] ? mutex_spin_on_owner+0x8d/0xc0
            Jul 31 16:10:01 localhost kernel: [<ffffffff814eebbe>] __mutex_lock_slowpath+0x13e/0x180
            Jul 31 16:10:01 localhost kernel: [<ffffffff81183b01>] ? path_put+0x31/0x40
            Jul 31 16:10:01 localhost kernel: [<ffffffff814eea5b>] mutex_lock+0x2b/0x50
            Jul 31 16:10:01 localhost kernel: [<ffffffff81182f83>] lock_rename+0x73/0xe0
            Jul 31 16:10:01 localhost kernel: [<ffffffff81186673>] sys_renameat+0x113/0x260
            Jul 31 16:10:01 localhost kernel: [<ffffffff81195b70>] ? mntput_no_expire+0x30/0x110
            Jul 31 16:10:01 localhost kernel: [<ffffffff81178271>] ? __fput+0x1a1/0x210
            Jul 31 16:10:01 localhost kernel: [<ffffffff8113f43e>] ? remove_vma+0x6e/0x90
            Jul 31 16:10:01 localhost kernel: [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
            Jul 31 16:10:01 localhost kernel: [<ffffffff814f2fce>] ? do_page_fault+0x3e/0xa0
            Jul 31 16:10:01 localhost kernel: [<ffffffff811867db>] sys_rename+0x1b/0x20
            Jul 31 16:10:01 localhost kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
            

            This is reproducible every time I run sanity/154a. This is Lustre 2.1.5 (notably with backports of LU-4279 and LU-3245).
            Can you provide some pointers so that I can work on the fix?

            Thanks

            parinay parinay v kondekar (Inactive) added a comment

            No, I don't think so. It blocks renames that involve the 'fid' directory.

            niu Niu Yawei (Inactive) added a comment

            Is it possible to block renames that involve the .lustre directory?

            adilger Andreas Dilger added a comment

            The client deadlocks in lock_rename():

            struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
            {
                    struct dentry *p;
            
                    if (p1 == p2) {
                            mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
                            return NULL;
                    }
            
                    mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
            
                    p = d_ancestor(p2, p1);
                    if (p) {
                            mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_PARENT);
                            mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_CHILD);
                            return p;
                    }
            
                    p = d_ancestor(p1, p2);
                    if (p) {
                            mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
                            mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
                            return p;
                    }
            
                    mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
                    mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
                    return NULL;
            }
            

            The root cause is that with the 'fid' directory, we can have two directory dentries pointing to the same inode on the client, so lock_rename() will try to lock the same inode twice, once through each dentry. Without patching the kernel, I'm not sure if there is any good way to solve it.
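
            The aliasing described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: model_inode, model_dentry, model_lock_rename() and model_same_inode_alias() are hypothetical stand-ins, not kernel code, and a pthread mutex plays the role of i_mutex. The second lock uses trylock so that the model reports the self-deadlock instead of hanging. Every branch of lock_rename() other than p1 == p2 takes both i_mutexes, so the order in which the model takes them does not change the outcome.

```c
#include <errno.h>
#include <pthread.h>

/* User-space stand-ins for the kernel structures (hypothetical names). */
struct model_inode {
        pthread_mutex_t i_mutex;
};

struct model_dentry {
        struct model_inode *d_inode;
};

/* True when two distinct dentries refer to the same inode. */
int model_same_inode_alias(const struct model_dentry *p1,
                           const struct model_dentry *p2)
{
        return p1 != p2 && p1->d_inode == p2->d_inode;
}

/*
 * Model of lock_rename(): only the dentry pointers are compared, so two
 * aliases of one inode fall through to the "lock both" path.
 * Returns 0 with the lock(s) held, or EBUSY (nothing held) when the
 * second mutex is already locked, which in this single-threaded model
 * means it was locked by this very call.
 */
int model_lock_rename(struct model_dentry *p1, struct model_dentry *p2)
{
        int rc;

        if (p1 == p2) {
                pthread_mutex_lock(&p1->d_inode->i_mutex);
                return 0;
        }

        pthread_mutex_lock(&p1->d_inode->i_mutex);
        rc = pthread_mutex_trylock(&p2->d_inode->i_mutex);
        if (rc != 0)
                pthread_mutex_unlock(&p1->d_inode->i_mutex);
        return rc;
}

/* Release whatever model_lock_rename() acquired on success. */
void model_unlock_rename(struct model_dentry *p1, struct model_dentry *p2)
{
        pthread_mutex_unlock(&p1->d_inode->i_mutex);
        if (p1 != p2)
                pthread_mutex_unlock(&p2->d_inode->i_mutex);
}
```

            With one model_inode shared by two model_dentry objects (one standing for the directory reached by path, the other for the same directory reached through .lustre/fid), model_lock_rename() returns EBUSY where the kernel's blocking mutex_lock() would sleep forever. A check along the lines of model_same_inode_alias() is the kind of test the pointer comparison in lock_rename() does not perform.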

            Anyway, I don't think it should be a blocker for 2.4. Andreas, any comments? Thanks.

            niu Niu Yawei (Inactive) added a comment

            While it would be good to get this fixed for 2.3, since this only affects the client and not the MDS, I'm removing this as a blocker for 2.3 and moving it to 2.4. This isn't a problem that can be hit accidentally.

            adilger Andreas Dilger added a comment

            I think there isn't a quick fix for this deadlock. We need some way on the server side to detect the recursive rename, which should check the 'fid' directory as well.

            Given that renaming files in the 'fid' directory isn't a legal usage, I suggest we lower the priority of this ticket and fix it in a later version.

            niu Niu Yawei (Inactive) added a comment
            pjones Peter Jones added a comment -

            As per John, this was not fixed by LU-1518 after all, so reopening.

            pjones Peter Jones added a comment -

            OK, then let's close this ticket as a duplicate and just ensure that the LU-1518 fix covers this case as well.


            This should be fixed along with LU-1518.

            niu Niu Yawei (Inactive) added a comment

            People

              wc-triage WC Triage
              jhammond John Hammond