Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      PID: 350193  TASK: ffff9bd65af446c0  CPU: 0   COMMAND: "getfattr"
       #0 [ffff9bd63ffb7950] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd63ffb79d8] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd63ffb79e8] rwsem_down_write_slowpath at ffffffffba0f41a7
          /tmp/kernel/./arch/x86/include/asm/current.h: 15
       #3 [ffff9bd63ffb7a88] down_write at ffffffffba5a691a
          /tmp/kernel/./include/linux/err.h: 36
       #4 [ffff9bd63ffb7ac0] vvp_inode_ops at ffffffffc116d57f [lustre]
          /home/lustre/linux-4.18.0-305.25.1.el8_4/./arch/x86/include/asm/current.h: 15
       #5 [ffff9bd63ffb7ae0] cl_object_inode_ops at ffffffffc0454a50 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 442
       #6 [ffff9bd63ffb7b18] lov_conf_set at ffffffffc0aa36c4 [lov]
          /home/lustre/master-mine/lustre/lov/lov_object.c: 1465
       #7 [ffff9bd63ffb7b88] cl_conf_set at ffffffffc04542d8 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 299
       #8 [ffff9bd63ffb7bb8] ll_layout_conf at ffffffffc111d110 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 5995
       #9 [ffff9bd63ffb7c28] ll_layout_refresh at ffffffffc111dad3 [lustre]
          /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155
      #10 [ffff9bd63ffb7cf0] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
      #11 [ffff9bd63ffb7d20] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
      #12 [ffff9bd63ffb7d58] cl_glimpse_size0 at ffffffffc11642ca [lustre]
          /home/lustre/master-mine/lustre/llite/glimpse.c: 204
      #13 [ffff9bd63ffb7da0] ll_getattr_dentry at ffffffffc111c65d [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1677
      #14 [ffff9bd63ffb7e50] vfs_statx at ffffffffba1d4be9
          /tmp/kernel/fs/stat.c: 204
      

      Checking the stack of the process above, the inode was found at 0xffff9bd60367d350:

      crash> p *(struct ll_inode_info *)(0xffff9bd60367d350-0x150)
        lli_inode_magic = 287116773,
      ...
        lli_inode_lock_owner = 0xffff9bd68f51d380
      

      Now check task 0xffff9bd68f51d380:

      crash> p *(struct task_struct *)0xffff9bd68f51d380|more
      ...
        pid = 348428,
      ...
      PID: 348428  TASK: ffff9bd68f51d380  CPU: 1   COMMAND: "lfs"
       #0 [ffff9bd613c37968] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd613c379f0] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd613c37a00] schedule_preempt_disabled at ffffffffba5a2a6c
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 79
       #3 [ffff9bd613c37a08] __mutex_lock at ffffffffba5a3a40
          /tmp/kernel/kernel/locking/mutex.c: 1038
       #4 [ffff9bd613c37ac8] ll_layout_refresh at ffffffffc111d577 [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1536
       #5 [ffff9bd613c37b88] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
       #6 [ffff9bd613c37bb8] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
       #7 [ffff9bd613c37bf0] ll_ioc_data_version at ffffffffc110c665 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3193
       #8 [ffff9bd613c37c28] ll_migrate at ffffffffc111b244 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3227
       #9 [ffff9bd613c37ca8] ll_dir_ioctl at ffffffffc1105563 [lustre]
          /home/lustre/master-mine/lustre/llite/dir.c: 2277
      #10 [ffff9bd613c37e88] do_vfs_ioctl at ffffffffba1e3199
          /tmp/kernel/fs/ioctl.c: 48
      

      This appears to be a lock-ordering issue:
      ll_migrate() takes the inode lock and then lli_layout_mutex (in ll_layout_refresh()), while other operations (such as getfattr) take the two locks in the reverse order.

          Activity

            [LU-16958] migrate vs regular ops deadlock

            "Vladimir Saveliev <vladimir.saveliev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58280
            Subject: LU-16958 tests: race between setxattr and layout refresh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 01d051deee047c1f150fdddb4d98f6e275c92f99

            gerrit Gerrit Updater added a comment

            "LU-16958 llite: migrate deadlock on not responding lock cancel" makes
            the following race possible, which causes eviction of a client because it
            does not cancel a layout ldlm lock in time (noticed in racer runs):

            P1:
            ll_file_write
              ..
              ll_file_io_generic
                cl_io_rw_init
                  cl_io_init
                    __cl_io_init
                      vvp_io_init
                        ll_layout_refresh
                          ll_take_md_lock  <<< ldlm lock is being held from here
                            ll_layout_lock_set
                              ll_layout_conf
                                cl_conf_set
                                  lov_conf_set
                                    cl_object_inode_ops
                                      vvp_inode_ops(COIO_INODE_LOCK)
                                        ll_inode_lock
                                          inode_lock  <<< stuck as P2 does not
                                                                      release the inode lock
                                 ldlm_lock_decref
            

            P2:

            vfs_removexattr
              inode_lock(inode)    <<< inode lock is taken
              ..
              lmv_setxattr
                mdc_setxattr
                  mdc_xattr_common
                    ptlrpc_queue_wait <<< stuck on server as P1 does not cancel
                                                            ldlm layout lock
            

            This results in the client eviction:

            [102925.161943] Lustre: mdt00_005: service thread pid 40421 was inactive for 40.055 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
            [102925.161948] Pid: 40421, comm: mdt00_005 3.10.0-my #10 SMP Mon Feb 24 14:14:43 MSK 2025
            [102925.161950] Call Trace:
            [102925.162835] [<0>] ldlm_completion_ast+0x903/0xd40 [ptlrpc]
            ..
            [102985.329244] LustreError: 38239:0:(ldlm_lockd.c:252:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 192.168.100.181@tcp  ns: mdt-lustre-MDT0000_UUID lock: ffff9a2a52626a00/0x837949271f786277 lrc: 3/0,0 mode: CR/CR res: [0x200001b72:0x4:0x0].0x0 bits 0xa/0x0 rrc: 4 type: IBT gid 0 flags: 0x60200400000020 nid: 192.168.100.181@tcp remote: 0x27ac170200a62996 expref: 13 pid: 40421 timeout: 102982 lvb_type: 0
            
            vsaveliev Vladimir Saveliev added a comment

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57654
            Subject: LU-16958 llite: migrate vs regular ops deadlock
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 98c0076ac47ee2a5bc97d9abebc4778fa0153351

            gerrit Gerrit Updater added a comment
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52388/
            Subject: LU-16958 llite: migrate deadlock on not responding lock cancel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 37646c74bf884c535149d530af840d728814792b

            gerrit Gerrit Updater added a comment

            I thought deadlock due to this patch , but I reverted the essential part of this patch at https://review.whamcloud.com/52388, and the racer still hang at the server, looks more like LU-15491

            There could definitely be multiple different issues affecting racer testing, so that doesn't mean the above patch is not fixing a problem.

            adilger Andreas Dilger added a comment

            adilger Andreas Dilger added a comment -

            Another patch was pushed under this ticket.
            qian_wc Qian Yingjin added a comment -

            Found another deadlock for parallel DIO:

            T1: writer
            Obtain DLM extent lock: L1=PW[0, EOF]
            
            T2: DIO reader: 50M data, iosize=64M, max_pages_per_rpc=1024 (4M) max_rpcs_in_flight=8
            ll_direct_IO_impl()
            uses all available RPC slots: the number of read RPCs in flight is 9
            on the server side:
            ->tgt_brw_read()
            ->tgt_brw_lock() # server side locking
            -> Try to cancel the conflict locks on client: L1=PW[0, EOF]
            
            T3: reader
            take DLM lock ref on L1=PW[0, EOF]
            Read-ahead pages (prepare pages);
            wait for RPC slots to send the read RPCs to OST
            
            deadlock: T2->T3: T2 is waiting for T3 to release DLM extent lock L1;
                      T3->T2: T3 is waiting for T2 to finish and free RPC slots...

            A possible solution: when all RPC slots are found to be used by srvlock DIO and there is urgent I/O, force the urgent I/O RPCs to be sent to the OST?

            bobijam Zhenyu Xu added a comment - - edited

            I thought deadlock due to this patch , but I reverted the essential part of this patch at https://review.whamcloud.com/52388, and the racer still hang at the server, looks more like LU-15491

            bobijam Zhenyu Xu added a comment -

            Another deadlock found:

            T1:
            vvp_io_init()
              ->ll_layout_refresh() <= take lli_layout_mutex
              ->ll_layout_intent()
              ->ll_take_md_lock()  <= take the CR layout lock ref
              ->ll_layout_conf()
                ->vvp_prune()
                ->vvp_inode_ops() <= release lli_layout_mutex
                ->vvp_inode_ops() <= try to acquire lli_layout_mutex
                -> racer wait here
            T2:
            ->ll_file_write_iter()
              ->vvp_io_init()
                ->ll_layout_refresh() <= take lli_layout_mutex
                ->ll_layout_intent() <= Request layout from MDT
                -> racer wait ...
            
            T3: occurs in PCC-RO attach; it can also happen in the normal case without PCC-RO.
            ->pcc_readonly_attach()
              ->ll_layout_intent_write()
              ->ll_intent_lock()
                 -> on MDT, it will try to obtain EX layout lock to change layout.
                    but client T1 holds the CR layout lock, and T2's lock request is in the lock waiting list waiting for T3 to finish, thus causing deadlock...
            

            People

              bobijam Zhenyu Xu
              bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 10