Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0

    Description

      PID: 350193  TASK: ffff9bd65af446c0  CPU: 0   COMMAND: "getfattr"
       #0 [ffff9bd63ffb7950] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd63ffb79d8] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd63ffb79e8] rwsem_down_write_slowpath at ffffffffba0f41a7
          /tmp/kernel/./arch/x86/include/asm/current.h: 15
       #3 [ffff9bd63ffb7a88] down_write at ffffffffba5a691a
          /tmp/kernel/./include/linux/err.h: 36
       #4 [ffff9bd63ffb7ac0] vvp_inode_ops at ffffffffc116d57f [lustre]
          /home/lustre/linux-4.18.0-305.25.1.el8_4/./arch/x86/include/asm/current.h: 15
       #5 [ffff9bd63ffb7ae0] cl_object_inode_ops at ffffffffc0454a50 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 442
       #6 [ffff9bd63ffb7b18] lov_conf_set at ffffffffc0aa36c4 [lov]
          /home/lustre/master-mine/lustre/lov/lov_object.c: 1465
       #7 [ffff9bd63ffb7b88] cl_conf_set at ffffffffc04542d8 [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_object.c: 299
       #8 [ffff9bd63ffb7bb8] ll_layout_conf at ffffffffc111d110 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 5995
       #9 [ffff9bd63ffb7c28] ll_layout_refresh at ffffffffc111dad3 [lustre]
          /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155
      #10 [ffff9bd63ffb7cf0] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
      #11 [ffff9bd63ffb7d20] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
      #12 [ffff9bd63ffb7d58] cl_glimpse_size0 at ffffffffc11642ca [lustre]
          /home/lustre/master-mine/lustre/llite/glimpse.c: 204
      #13 [ffff9bd63ffb7da0] ll_getattr_dentry at ffffffffc111c65d [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1677
      #14 [ffff9bd63ffb7e50] vfs_statx at ffffffffba1d4be9
          /tmp/kernel/fs/stat.c: 204
      

      Checking the stack of the process above, the inode was found at 0xffff9bd60367d350:

      crash> p *(struct ll_inode_info *)(0xffff9bd60367d350-0x150)
        lli_inode_magic = 287116773,
      ...
        lli_inode_lock_owner = 0xffff9bd68f51d380
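      The -0x150 above is the offset of the embedded VFS inode (lli_vfs_inode) within struct ll_inode_info in this build, i.e. the crash expression is a container_of() walk done by hand. A self-contained sketch of that arithmetic, using a dummy structure rather than Lustre's real layout:

      /* container_of_demo.c - the arithmetic behind
       * "(struct ll_inode_info *)(<inode address> - 0x150)": given a pointer
       * to an embedded member, subtract its offset to reach the enclosing
       * structure.  The struct below is a dummy whose padding merely happens
       * to put the member at 0x150 on LP64, like the build in the dump. */
      #include <stddef.h>
      #include <stdio.h>

      struct outer {
              long pad[42];           /* stand-in for the leading lli_* fields */
              int  member;            /* stand-in for the embedded inode */
      };

      #define container_of(ptr, type, field) \
              ((type *)((char *)(ptr) - offsetof(type, field)))

      int main(void)
      {
              struct outer o;
              int *m = &o.member;

              printf("offset of member: %#zx\n", offsetof(struct outer, member));
              printf("recovered outer:  %p (expected %p)\n",
                     (void *)container_of(m, struct outer, member), (void *)&o);
              return 0;
      }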
      

      Now check task 0xffff9bd68f51d380:

      crash> p *(struct task_struct *)0xffff9bd68f51d380|more
      ...
        pid = 348428,
      ...
      PID: 348428  TASK: ffff9bd68f51d380  CPU: 1   COMMAND: "lfs"
       #0 [ffff9bd613c37968] __schedule at ffffffffba5a232d
          /tmp/kernel/kernel/sched/core.c: 3109
       #1 [ffff9bd613c379f0] schedule at ffffffffba5a2748
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 84
       #2 [ffff9bd613c37a00] schedule_preempt_disabled at ffffffffba5a2a6c
          /tmp/kernel/./arch/x86/include/asm/preempt.h: 79
       #3 [ffff9bd613c37a08] __mutex_lock at ffffffffba5a3a40
          /tmp/kernel/kernel/locking/mutex.c: 1038
       #4 [ffff9bd613c37ac8] ll_layout_refresh at ffffffffc111d577 [lustre]
          /home/lustre/master-mine/lustre/llite/llite_internal.h: 1536
       #5 [ffff9bd613c37b88] vvp_io_init at ffffffffc116d019 [lustre]
          /home/lustre/master-mine/lustre/llite/vvp_io.c: 1870
       #6 [ffff9bd613c37bb8] __cl_io_init at ffffffffc045e66f [obdclass]
          /home/lustre/master-mine/lustre/obdclass/cl_io.c: 134
       #7 [ffff9bd613c37bf0] ll_ioc_data_version at ffffffffc110c665 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3193
       #8 [ffff9bd613c37c28] ll_migrate at ffffffffc111b244 [lustre]
          /home/lustre/master-mine/lustre/llite/file.c: 3227
       #9 [ffff9bd613c37ca8] ll_dir_ioctl at ffffffffc1105563 [lustre]
          /home/lustre/master-mine/lustre/llite/dir.c: 2277
      #10 [ffff9bd613c37e88] do_vfs_ioctl at ffffffffba1e3199
          /tmp/kernel/fs/ioctl.c: 48
      

      It seems this is a lock-ordering issue:
      ll_migrate() takes the inode lock and then lli_layout_mutex (in ll_layout_refresh()), while other operations (like getfattr) take the same two locks in the reverse order.
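
      This is the classic ABBA pattern. Below is a minimal, self-contained sketch of it using plain pthread mutexes; the two mutexes only stand in for the inode lock and lli_layout_mutex, and the thread bodies are hypothetical, not actual Lustre code:

      /* abba.c - minimal illustration of the ABBA lock-ordering deadlock.
       * Build: gcc -pthread abba.c -o abba (the program intentionally hangs). */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      static pthread_mutex_t inode_lock   = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t layout_mutex = PTHREAD_MUTEX_INITIALIZER;

      /* "lfs migrate" path: inode lock first, then layout mutex */
      static void *migrate_thread(void *arg)
      {
              (void)arg;
              pthread_mutex_lock(&inode_lock);
              sleep(1);                       /* widen the race window */
              pthread_mutex_lock(&layout_mutex);
              printf("migrate: got both locks\n");
              pthread_mutex_unlock(&layout_mutex);
              pthread_mutex_unlock(&inode_lock);
              return NULL;
      }

      /* "getfattr" path: layout mutex first, then inode lock */
      static void *getattr_thread(void *arg)
      {
              (void)arg;
              pthread_mutex_lock(&layout_mutex);
              sleep(1);
              pthread_mutex_lock(&inode_lock);
              printf("getattr: got both locks\n");
              pthread_mutex_unlock(&inode_lock);
              pthread_mutex_unlock(&layout_mutex);
              return NULL;
      }

      int main(void)
      {
              pthread_t t1, t2;

              pthread_create(&t1, NULL, migrate_thread, NULL);
              pthread_create(&t2, NULL, getattr_thread, NULL);
              pthread_join(t1, NULL);         /* never returns: ABBA deadlock */
              pthread_join(t2, NULL);
              return 0;
      }

      In the crash dump above the same cycle shows up as getfattr blocked in rwsem_down_write_slowpath() while lfs (ll_migrate) is blocked in __mutex_lock() on lli_layout_mutex.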

          Activity

            [LU-16958] migrate vs regular ops deadlock

            adilger Andreas Dilger added a comment -

            I thought the deadlock was due to this patch, but I reverted the essential part of this patch at https://review.whamcloud.com/52388, and the racer still hangs at the server; it looks more like LU-15491.

            There could definitely be multiple different issues affecting racer testing, so that doesn't mean the above patch is not fixing a problem.

            adilger Andreas Dilger added a comment -

            Another patch was pushed under this ticket.
            qian_wc Qian Yingjin added a comment -

            Found another deadlock for parallel DIO:

            T1: writer
            Obtain DLM extent lock: L1=PW[0, EOF]
            
            T2: DIO reader: 50M data, iosize=64M, max_pages_per_rpc=1024 (4M) max_rpcs_in_flight=8
            ll_direct_IO_impl()
            uses all available RPC slots: number of read RPCs in flight is 9
            on the server side:
            ->tgt_brw_read()
            ->tgt_brw_lock() # server side locking
            -> Try to cancel the conflicting locks on the client: L1=PW[0, EOF]
            
            T3: reader
            take DLM lock ref on L1=PW[0, EOF]
            Read-ahead pages (prepare pages);
            wait for RPC slots to send the read RPCs to OST
            
            deadlock: T2->T3: T2 is waiting for T3 to release DLM extent lock L1;
                      T3->T2: T3 is waiting for T2 to finish and free RPC slots...

            A possible solution: when all RPC slots are found to be held by srvlock DIO and there is urgent I/O pending, force the urgent I/O RPCs to be sent to the OST?
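
            The same wait cycle can be reproduced in miniature with a counting semaphore standing in for the RPC slot pool and a mutex standing in for the PW extent lock L1. This is only a schematic analogue of the scenario above; the names, the slot count and the sleeps are illustrative, not Lustre code:

            /* rpc_slot_deadlock.c - schematic analogue of the DIO deadlock above.
             * A counting semaphore stands in for the client's RPC slot pool
             * (max_rpcs_in_flight) and a mutex stands in for the PW extent lock L1.
             * Build: gcc -pthread rpc_slot_deadlock.c -o rpc_slot_deadlock
             * (the program intentionally hangs). */
            #include <pthread.h>
            #include <semaphore.h>
            #include <stdio.h>
            #include <unistd.h>

            #define RPC_SLOTS 8                     /* max_rpcs_in_flight */

            static sem_t slots;                     /* RPC slot pool */
            static pthread_mutex_t extent_lock = PTHREAD_MUTEX_INITIALIZER;  /* L1 */

            /* T2 analogue: DIO reader that consumes every RPC slot and then needs
             * L1 to be released (server-side lock cancellation) before it finishes. */
            static void *dio_reader(void *arg)
            {
                    (void)arg;
                    for (int i = 0; i < RPC_SLOTS; i++)
                            sem_wait(&slots);       /* all slots now in flight */
                    printf("dio: all %d slots used, waiting for extent lock\n", RPC_SLOTS);
                    pthread_mutex_lock(&extent_lock);   /* blocks forever */
                    pthread_mutex_unlock(&extent_lock);
                    for (int i = 0; i < RPC_SLOTS; i++)
                            sem_post(&slots);
                    return NULL;
            }

            /* T3 analogue: buffered reader holding a reference on L1 while it
             * waits for a free RPC slot to send its read RPCs. */
            static void *buffered_reader(void *arg)
            {
                    (void)arg;
                    pthread_mutex_lock(&extent_lock);
                    sleep(2);                       /* let the DIO reader drain the pool */
                    printf("readahead: holding extent lock, waiting for a slot\n");
                    sem_wait(&slots);               /* blocks forever */
                    sem_post(&slots);
                    pthread_mutex_unlock(&extent_lock);
                    return NULL;
            }

            int main(void)
            {
                    pthread_t t2, t3;

                    sem_init(&slots, 0, RPC_SLOTS);
                    pthread_create(&t3, NULL, buffered_reader, NULL);
                    sleep(1);                       /* T3 takes the lock first */
                    pthread_create(&t2, NULL, dio_reader, NULL);
                    pthread_join(t2, NULL);         /* never returns: slot/lock cycle */
                    pthread_join(t3, NULL);
                    return 0;
            }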

            bobijam Zhenyu Xu added a comment - - edited

            I thought the deadlock was due to this patch, but I reverted the essential part of this patch at https://review.whamcloud.com/52388, and the racer still hangs at the server; it looks more like LU-15491.

            bobijam Zhenyu Xu added a comment -

            Another deadlock found:

            T1:
            vvp_io_init()
              ->ll_layout_refresh() <= take lli_layout_mutex
              ->ll_layout_intent()
              ->ll_take_md_lock()  <= take the CR layout lock ref
              ->ll_layout_conf()
                ->vvp_prune()
                ->vvp_inode_ops() <= release lli_layout_mutex
                ->vvp_inode_ops() <= try to acquire lli_layout_mutex
                -> racer waits here
            T2:
            ->ll_file_write_iter()
              ->vvp_io_init()
                ->ll_layout_refresh() <= take lli_layout_mutex
                ->ll_layout_intent() <= Request layout from MDT
                -> racer waits ...
            
            T3: occurs during PCC-RO attach; it can also happen in the normal case without PCC-RO.
            ->pcc_readonly_attach()
              ->ll_layout_intent_write()
              ->ll_intent_lock()
                 -> on MDT, it will try to obtain EX layout lock to change layout.
                    but the client T1 holds the CR layout lock, and T2's lock request is in the lock waiting list waiting for T3 to finish, thus causing a deadlock...
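
            Stripped of the DLM/MDT specifics, T1/T2/T3 above form a simple three-way wait cycle. A purely local analogue is sketched below; the three mutexes are only stand-ins for lli_layout_mutex, the MDT layout grant and the CR layout lock, and none of this is Lustre code:

            /* three_way_cycle.c - local analogue of the T1/T2/T3 wait cycle above.
             * Build: gcc -pthread three_way_cycle.c -o three_way_cycle
             * (the program intentionally hangs). */
            #include <pthread.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t layout_mutex = PTHREAD_MUTEX_INITIALIZER;
            static pthread_mutex_t mdt_grant    = PTHREAD_MUTEX_INITIALIZER;
            static pthread_mutex_t cr_lock      = PTHREAD_MUTEX_INITIALIZER;

            struct step { pthread_mutex_t *held, *wanted; const char *name; };

            static void *worker(void *arg)
            {
                    struct step *s = arg;

                    pthread_mutex_lock(s->held);    /* resource this thread owns */
                    sleep(1);                       /* let the cycle close */
                    printf("%s: waiting\n", s->name);
                    pthread_mutex_lock(s->wanted);  /* blocks forever */
                    pthread_mutex_unlock(s->wanted);
                    pthread_mutex_unlock(s->held);
                    return NULL;
            }

            int main(void)
            {
                    /* T1 holds the CR layout lock and wants lli_layout_mutex back,
                     * T2 holds lli_layout_mutex and wants an MDT layout grant,
                     * T3 holds the grant queue and wants the CR lock cancelled. */
                    struct step t1 = { &cr_lock, &layout_mutex, "T1" };
                    struct step t2 = { &layout_mutex, &mdt_grant, "T2" };
                    struct step t3 = { &mdt_grant, &cr_lock, "T3" };
                    pthread_t th[3];

                    pthread_create(&th[0], NULL, worker, &t1);
                    pthread_create(&th[1], NULL, worker, &t2);
                    pthread_create(&th[2], NULL, worker, &t3);
                    for (int i = 0; i < 3; i++)
                            pthread_join(th[i], NULL);  /* never completes */
                    return 0;
            }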
            
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51641/
            Subject: LU-16958 llite: migrate vs regular ops deadlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8f2c1592c3bbd0351ab3984a88a3eed7075690c8

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51641/ Subject: LU-16958 llite: migrate vs regular ops deadlock Project: fs/lustre-release Branch: master Current Patch Set: Commit: 8f2c1592c3bbd0351ab3984a88a3eed7075690c8

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51641
            Subject: LU-16958 llite: migrate vs regular ops deadlock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8de4a374979a37d09a057cbcdfd9914775cfc59b

            gerrit Gerrit Updater added a comment - "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51641 Subject: LU-16958 llite: migrate vs regular ops deadlock Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8de4a374979a37d09a057cbcdfd9914775cfc59b

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 10
