Lustre / LU-10859

Deadlock with heavy memory pressure

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0, Lustre 2.10.4
    • None
    • RHEL7 Server
      Lustre 2.7.x series

    Description

      On one customer site, we hit the following deadlock:

          Thread 1:
          ofd_object_punch
           osd_punch
            ldiskfs_truncate
             ldiskfs_inode_attach_jinode
               ...
               do_try_to_free_pages
                lu_cache_shrink
                 mutex_lock --> tries to take @lu_sites_guard
      
          
      
          Thread 2 (kswapd):
          kthread
           shrink_slab
            lu_cache_shrink
              mutex_lock --> @lu_sites_guard already held by this thread
               ...
               dqget
                ldiskfs_acquire_dquot
                 jbd2__journal_start --> blocked waiting for more credits
      
          
      
          Thread 3:
          kthread
           kjournald2
            jbd2_journal_commit_transaction --> blocked waiting for Thread 1 to finish,
                                       since Thread 1 added a handle to the transaction
      
          
      
          So the deadlock is a cycle: Thread 1 waits for Thread 2 (which holds
          @lu_sites_guard), Thread 2 waits for Thread 3 (the journal commit), and
          Thread 3 waits for Thread 1 (whose handle is still open).
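The circular wait can be written down as a small wait-for graph. The following is only a userspace sketch for illustration, not Lustre or kernel code: the enum labels, the `waits_for` table, and `in_wait_cycle()` are hypothetical names, and each task is assumed to block on at most one other task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three tasks in the report. */
enum task {
    THREAD1_PUNCH,
    THREAD2_KSWAPD,
    THREAD3_KJOURNALD,
    NR_TASKS,
    NOT_BLOCKED = NR_TASKS /* sentinel: the task is not waiting on anyone */
};

/* waits_for[t] is the task that t is blocked on in the reported hang. */
const enum task waits_for[NR_TASKS] = {
    [THREAD1_PUNCH]     = THREAD2_KSWAPD,    /* needs @lu_sites_guard */
    [THREAD2_KSWAPD]    = THREAD3_KJOURNALD, /* needs journal credits */
    [THREAD3_KJOURNALD] = THREAD1_PUNCH,     /* needs Thread 1's handle to stop */
};

/*
 * Follow the wait-for edges starting at 'start'. Return true if the chain
 * leads back to 'start' within NR_TASKS steps, i.e. 'start' is part of a
 * circular wait.
 */
bool in_wait_cycle(const enum task *edges, enum task start)
{
    enum task cur = start;

    assert(start < NR_TASKS);
    for (size_t step = 0; step < NR_TASKS; step++) {
        cur = edges[cur];
        if (cur == NOT_BLOCKED)
            return false; /* chain ends at a runnable task */
        if (cur == start)
            return true;  /* came back to where we started */
    }
    return false;
}
```

Calling `in_wait_cycle(waits_for, THREAD1_PUNCH)` reports a cycle. If Thread 1's entry is changed to `NOT_BLOCKED` (Thread 1 no longer enters the shrinker), the chain from every task ends at a runnable task and no cycle remains.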
      
          
      
          This problem still exists even though @lu_sites_guard has been switched
          to a read/write lock, since lu_cache_shrink() takes the write lock.
      
          
      
          Fixed the problem by making ldiskfs_inode_attach_jinode() allocate
          with GFP_NOFS.
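Why changing the allocation flag helps can be sketched with a userspace model. This is not the actual patch: the `MODEL_GFP_*` bit values and both function names are hypothetical. The sketch assumes the shrinker checks the allocation mask for the FS bit before taking its lock, which is the usual convention for filesystem shrinkers. The only kernel fact relied on is that GFP_NOFS is GFP_KERNEL without the FS bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel gfp bits; the values are illustrative. */
#define MODEL_GFP_RECLAIM 0x1u /* allocation may enter direct reclaim */
#define MODEL_GFP_IO      0x2u /* reclaim may start I/O */
#define MODEL_GFP_FS      0x4u /* reclaim may call back into the filesystem */

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/*
 * Model of a filesystem shrinker's entry check: it only goes on to take
 * its own lock when the allocation context allows filesystem re-entry.
 */
bool shrinker_may_take_fs_locks(unsigned int gfp_mask)
{
    return (gfp_mask & MODEL_GFP_FS) != 0;
}

/*
 * Model of an allocation made while a journal handle is open: returns true
 * if direct reclaim from this allocation could end up blocking on the
 * shrinker's lock.
 */
bool alloc_may_block_on_shrinker_lock(unsigned int gfp_mask)
{
    /* This sketch only covers allocations that can enter direct reclaim. */
    assert(gfp_mask & MODEL_GFP_RECLAIM);
    return shrinker_may_take_fs_locks(gfp_mask);
}
```

In this model, `alloc_may_block_on_shrinker_lock(MODEL_GFP_KERNEL)` is true and `alloc_may_block_on_shrinker_lock(MODEL_GFP_NOFS)` is false, which is the edge from Thread 1 to Thread 2 that the fix removes.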
      
       
      
      

          People

            wangshilong Wang Shilong (Inactive)
            wangshilong Wang Shilong (Inactive)