Deadlock on transaction with iget()/clear_inode()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.8.0
    • None
    • 3
    • 9223372036854775807

    Description

      Thread 1:

      schedule
      start_this_handle
      jbd2_journal_start
      ldiskfs_journal_start_sb
      ldiskfs_dquot_drop
      vfs_dq_drop
      clear_inode
      dispose_list
      shrink_icache_memory
      shrink_slab
      zone_reclaim
      get_page_from_freelist
      __alloc_pages_nodemask
      alloc_pages_vma
      do_huge_pmd_anonymous_page
      handle_mm_fault
      __do_page_fault
      do_page_fault
      page_fault
      

      Thread 2:

      __wait_on_freeing_inode
      find_inode_fast
      ifind_fast
      iget_locked
      ldiskfs_iget
      osd_iget
      osd_index_ea_delete
      out_obj_index_delete
      out_tx_index_delete_exec
      out_tx_end
      out_handle
      tgt_request_handle
      ptlrpc_main
      kthread
      kernel_thread
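
The two traces can be read as a circular wait. The sketch below models that reading as a small wait-for graph. It is a userspace illustration only, not kernel code: the resource and thread labels, `struct wait_model`, and all `model_*` functions are hypothetical names introduced here, and the ownership assignments are an assumption inferred from the traces (Thread 2 is inside an open transaction in out_tx_end while Thread 1 is freeing the inode that Thread 2 looks up).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two things the threads appear to wait on. */
enum resource { RES_JOURNAL_TXN, RES_INODE_FREEING, RES_COUNT };
enum thread_id { T_RECLAIM, T_OUT_HANDLER, T_COUNT };

struct wait_model {
    int owner[RES_COUNT];    /* thread holding each resource, -1 if none */
    int waits_for[T_COUNT];  /* resource each thread blocks on, -1 if none */
};

void model_init(struct wait_model *m)
{
    for (int r = 0; r < RES_COUNT; r++)
        m->owner[r] = -1;
    for (int t = 0; t < T_COUNT; t++)
        m->waits_for[t] = -1;
}

void model_hold(struct wait_model *m, enum thread_id t, enum resource r)
{
    assert(t < T_COUNT && r < RES_COUNT);
    m->owner[r] = (int)t;
}

void model_wait(struct wait_model *m, enum thread_id t, enum resource r)
{
    assert(t < T_COUNT && r < RES_COUNT);
    m->waits_for[t] = (int)r;
}

/* Follow thread -> awaited resource -> owning thread. If the chain
 * leads back to the starting thread, that thread can never progress. */
bool model_deadlocked(const struct wait_model *m, enum thread_id start)
{
    int t = (int)start;

    for (int step = 0; step < T_COUNT; step++) {
        int r = m->waits_for[t];
        if (r < 0)
            return false;      /* thread is runnable */
        t = m->owner[r];
        if (t < 0)
            return false;      /* resource is free */
        if (t == (int)start)
            return true;       /* cycle found */
    }
    return false;
}

/* Assumed reading of the two traces in this ticket. */
void model_lu6918(struct wait_model *m)
{
    model_init(m);
    /* Thread 2 runs out_tx_end inside an already started transaction. */
    model_hold(m, T_OUT_HANDLER, RES_JOURNAL_TXN);
    /* Thread 1 (memory reclaim) is disposing of the inode in clear_inode. */
    model_hold(m, T_RECLAIM, RES_INODE_FREEING);
    /* Thread 1: ldiskfs_dquot_drop -> jbd2_journal_start sleeps. */
    model_wait(m, T_RECLAIM, RES_JOURNAL_TXN);
    /* Thread 2: iget_locked -> __wait_on_freeing_inode sleeps. */
    model_wait(m, T_OUT_HANDLER, RES_INODE_FREEING);
}
```

Under these assumptions `model_deadlocked()` reports a cycle for both threads. Dropping either edge (for example, not calling iget inside the transaction) breaks the cycle.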
      

          Activity

            [LU-6918] Deadlock on transaction with iget()/clear_inode()
            pjones Peter Jones made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment - Duplicate of LU-6969
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-6969 [ LU-6969 ]
            bzzz Alex Zhuravlev added a comment - http://review.whamcloud.com/#/c/15924/

            askulysh Andriy Skulysh added a comment - We have several reports from different sites. It isn't easily reproducible.

            adilger Andreas Dilger added a comment - Andriy, how easily can this deadlock be hit, and what is the workload to trigger it?
            adilger Andreas Dilger made changes -
            Priority Original: Minor [ 4 ] New: Major [ 3 ]
            adilger Andreas Dilger made changes -
            Affects Version/s New: Lustre 2.8.0 [ 11113 ]
            adilger Andreas Dilger made changes -
            Description: both stack traces wrapped in {noformat} blocks; trace contents unchanged.

            bzzz Alex Zhuravlev added a comment - No, we shouldn't do this in the target code: the target has no idea of agent inodes, and it can't address such an inode because an agent inode has no FID assigned. I think the only solution is to postpone the inode destroy. This can be done in different ways. The most trivial is to keep a list of inode numbers in memory. This can lead to an orphan, but the number of agent inodes is very small, they don't occupy much space, and at some point they will be discovered by LFSCK, so it is probably good enough. If not, we can do something similar to ext4_orphan_add().
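
The "list of inode numbers in memory" idea from the comment above could look roughly like the following. This is a minimal userspace sketch, not the patch under review: `struct deferred_list`, `deferred_add()`, `deferred_drain()`, and `count_destroyed()` are hypothetical names, and real kernel code would also need locking and a real destroy routine. The assumption is that the inode number is recorded while the transaction is open and the actual destroy runs only after the transaction has stopped, so no iget happens under a journal handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One postponed destroy, identified only by inode number. */
struct deferred_ino {
    uint64_t ino;
    struct deferred_ino *next;
};

/* In-memory only: entries are lost on a crash, which is how an
 * orphan can appear (left for LFSCK to find, per the comment). */
struct deferred_list {
    struct deferred_ino *head;
    size_t count;
};

/* Called inside the transaction instead of destroying the inode.
 * Returns 0 on success, -1 if memory could not be allocated. */
int deferred_add(struct deferred_list *l, uint64_t ino)
{
    struct deferred_ino *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->ino = ino;
    e->next = l->head;
    l->head = e;
    l->count++;
    return 0;
}

/* Called after the transaction has stopped. Hands every recorded
 * inode number to destroy() and empties the list. Returns the
 * number of entries processed. */
size_t deferred_drain(struct deferred_list *l,
                      void (*destroy)(uint64_t ino, void *arg), void *arg)
{
    size_t done = 0;

    while (l->head != NULL) {
        struct deferred_ino *e = l->head;

        l->head = e->next;
        destroy(e->ino, arg);
        free(e);
        done++;
    }
    l->count = 0;
    return done;
}

/* Stand-in for the real destroy step: just counts calls. */
void count_destroyed(uint64_t ino, void *arg)
{
    (void)ino;
    (*(size_t *)arg)++;
}
```

A caller would run `deferred_add()` where the destroy used to happen and `deferred_drain()` once the handle is released. The list costs one small allocation per agent inode, which fits the comment's point that such inodes are few.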

            People

              Assignee: wc-triage WC Triage
              Reporter: askulysh Andriy Skulysh