Lustre / LU-14713

Process hung with waiting for mmap_sem

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.12.7, Lustre 2.12.8
    • Labels: None
    • Severity: 3

    Description

      Write and truncate I/O are serialized on ll_trunc_sem::ll_trunc_{readers|waiters}. If one process quits abruptly (e.g. is killed), the other keeps waiting on the semaphore (with its task state set to TASK_INTERRUPTIBLE):

       INFO: task a.out:109684 blocked for more than 120 seconds.
            Tainted: G          IOE    --------- -  - 4.18.0-240.15.1.el8_3.x86_64 #1
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       Call Trace:
       __schedule+0x2a6/0x700
       schedule+0x38/0xa0
       trunc_sem_down_read+0xa6/0xb0 [lustre]
       vvp_io_write_start+0x107/0xb80 [lustre]
       cl_io_start+0x59/0x110 [obdclass]
       cl_io_loop+0x9a/0x1e0 [obdclass]
       ll_file_io_generic+0x380/0xb10 [lustre]
       ll_file_write_iter+0x136/0x5a0 [lustre]
       new_sync_write+0x124/0x170
       vfs_write+0xa5/0x1a0
       ksys_write+0x4f/0xb0
       do_syscall_64+0x5b/0x1a0
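The serialization above can be illustrated with a minimal single-threaded sketch. This is a hypothetical userspace model, not the real implementation: the actual ll_trunc_sem uses atomic counters plus wait queues and blocks rather than returning 0, and all names below are invented for illustration.

```c
/*
 * Toy model of a trunc_sem-style readers/writer scheme.
 * readers > 0: write I/O in progress (read side held);
 * readers == -1: a truncate holds the write side.
 */
#include <assert.h>

struct trunc_sem_model {
	int readers;	/* >0 readers, -1 writer, 0 free */
	int waiters;	/* truncate threads waiting for readers to drain */
};

/* Returns 1 if the read side was taken, 0 if the caller would sleep. */
static int try_down_read(struct trunc_sem_model *s)
{
	if (s->readers < 0 || s->waiters > 0)
		return 0;	/* truncate active or pending */
	s->readers++;
	return 1;
}

static void up_read_model(struct trunc_sem_model *s)
{
	s->readers--;	/* the real code also wakes waiting truncates */
}

/* Returns 1 if the write side was taken, 0 if the caller would sleep. */
static int try_down_write(struct trunc_sem_model *s)
{
	if (s->readers != 0) {
		s->waiters++;
		return 0;
	}
	s->readers = -1;
	return 1;
}

static void up_write_model(struct trunc_sem_model *s)
{
	s->readers = 0;	/* must also wake any sleeping readers */
}
```

If the process holding the read side is killed before its release ever runs, every truncate waiter sleeps indefinitely, which matches the hung-task trace above.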
      

      Attachments

        Issue Links

          Activity

            [LU-14713] Process hung with waiting for mmap_sem
            eaujames Etienne Aujames added a comment (edited)

            We likely hit this issue at the CEA in production.
            After upgrading from 2.12.6 to 2.12.7, we regularly observed hangs on the clients when cancelling a job.
            The hang occurs on mmap_sem when closing the /dev/infiniband/uverb* device:

            void ib_umem_release(struct ib_umem *umem)
            {
                    if (!umem)
                            return;
            .....
            
            #ifdef HAVE_ATOMIC_PINNED_VM
                    atomic64_sub(ib_umem_num_pages(umem), &umem->owning_mm->pinned_vm);
            #else
                    down_write(&umem->owning_mm->mmap_sem);                 <----- the thread hang here, rw_semaphore owner is garbage (already free)
            

            MOFED: 5.4
            kernel: 3.10
            lustre: client 2.12.7

            So I am not 100% sure, but I think we hit a regression from 21dc165991 "LU-13182 llite: Avoid eternel retry loops with MAP_POPULATE" (landed in 2.12.7).
            That patch can cause mmap_sem to be released (up_read) twice (only on kernels < 5.1).
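A toy counter model makes one possible consequence of an unbalanced up_read() concrete. This is a hypothetical sketch, not the kernel's real rw_semaphore (which tracks owner and waiter state): all names are invented, and the point is only that the extra release lets the count reach zero while a reader is still active, so a later writer can take the sem and tear down state underneath it.

```c
/* Toy model: count > 0 means active readers, count == -1 a writer. */
#include <assert.h>

struct rwsem_model { int count; };

static void down_read_m(struct rwsem_model *s)
{
	assert(s->count >= 0);	/* no writer active */
	s->count++;
}

static void up_read_m(struct rwsem_model *s)
{
	s->count--;
}

static int try_down_write_m(struct rwsem_model *s)
{
	if (s->count != 0)
		return 0;
	s->count = -1;
	return 1;
}

/*
 * Two readers, but one path releases twice (the suspected regression):
 * the count drains to 0 while reader A is still inside its critical
 * section, so a writer succeeds and may free what A still uses.
 */
static int double_up_read_lets_writer_in(void)
{
	struct rwsem_model s = { 0 };

	down_read_m(&s);	/* reader A: page-fault path */
	down_read_m(&s);	/* reader B */
	up_read_m(&s);		/* reader B releases */
	up_read_m(&s);		/* BUG: second release for the same hold */
	return try_down_write_m(&s);	/* writer gets in despite reader A */
}
```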


            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47404
            Subject: LU-14713 llite: tighten condition for fault not drop mmap_sem
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: cdf66f48b5100b0b209be3f1741e6163000299e2

            pjones Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43844/
            Subject: LU-14713 llite: mend the trunc_sem_up_write()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 39745c8b5493159bbca62add54ca9be7cac6564f


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44715/
            Subject: LU-14713 llite: tighten condition for fault not drop mmap_sem
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 81aec05103558f57adb10fd319847304cdf44aa7

            bobijam Zhenyu Xu added a comment -

            As ll_fault0() shows, if ll_filemap_fault() returns VM_FAULT_RETRY, it presumes mmap_sem has not been dropped and falls through to the normal fault path under the DLM lock, which will eventually drop mmap_sem.

            ll_fault0()
                    if (ll_sbi_has_fast_read(ll_i2sbi(file_inode(vma->vm_file)))) {
                            /* do fast fault */
                            bool has_retry = vmf->flags & FAULT_FLAG_RETRY_NOWAIT;

                            /* To avoid loops, instruct downstream to not drop mmap_sem */
                            vmf->flags |= FAULT_FLAG_RETRY_NOWAIT;
                            ll_cl_add(vma->vm_file, env, NULL, LCC_MMAP);
                            fault_ret = ll_filemap_fault(vma, vmf);
                            ll_cl_remove(vma->vm_file, env);
                            if (!has_retry)
                                    vmf->flags &= ~FAULT_FLAG_RETRY_NOWAIT;

                            /* - If there is no error, then the page was found in cache and
                             *   uptodate;
                             * - If VM_FAULT_RETRY is set, the page existed but failed to
                             *   lock. We will try slow path to avoid loops.
                             * - Otherwise, it should try normal fault under DLM lock. */
                            if (!(fault_ret & VM_FAULT_RETRY) &&
                                !(fault_ret & VM_FAULT_ERROR))
                                    GOTO(out, result = 0);

                            fault_ret = 0;
                    }

            But filemap_fault() shows that returning VM_FAULT_RETRY without dropping mmap_sem requires vmf->flags to contain both FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT; otherwise it can return VM_FAULT_RETRY with mmap_sem already released.

            filemap_fault()
                    if (!lock_page_or_retry(page, vma->vm_mm, vmf->flags)) {
                            page_cache_release(page);
                            return ret | VM_FAULT_RETRY;
                    }

            static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
                                                 unsigned int flags)
            {
                    might_sleep();
                    return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
            }

            int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
                                     unsigned int flags)
            {
                    if (flags & FAULT_FLAG_ALLOW_RETRY) {
                            /*
                             * CAUTION! In this case, mmap_sem is not released
                             * even though return 0.
                             */
                            if (flags & FAULT_FLAG_RETRY_NOWAIT)
                                    return 0;

                            up_read(&mm->mmap_sem);
                            if (flags & FAULT_FLAG_KILLABLE)
                                    wait_on_page_locked_killable(page);
                            else
                                    wait_on_page_locked(page);
                            return 0;
                    } else {
                            if (flags & FAULT_FLAG_KILLABLE) {
                                    int ret;

                                    ret = __lock_page_killable(page);
                                    if (ret) {
                                            up_read(&mm->mmap_sem);
                                            return 0;
                                    }
                            } else
                                    __lock_page(page);
                            return 1;
                    }
            }
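The branches of __lock_page_or_retry() quoted above can be condensed into a small userspace model (hypothetical names, not kernel code) that tracks only two outcomes: whether mmap_sem gets dropped, and whether the caller ends up returning VM_FAULT_RETRY. The 'killed' parameter models __lock_page_killable() failing on a fatal signal.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are arbitrary for this model. */
#define FAULT_FLAG_ALLOW_RETRY	0x01
#define FAULT_FLAG_RETRY_NOWAIT	0x02
#define FAULT_FLAG_KILLABLE	0x04

struct outcome {
	bool retry;		/* caller returns VM_FAULT_RETRY */
	bool sem_dropped;	/* mmap_sem released via up_read() */
};

static struct outcome lock_page_or_retry_model(unsigned int flags, bool killed)
{
	struct outcome o = { false, false };

	if (flags & FAULT_FLAG_ALLOW_RETRY) {
		if (flags & FAULT_FLAG_RETRY_NOWAIT) {
			o.retry = true;		/* return 0, sem kept */
		} else {
			o.sem_dropped = true;	/* up_read() before waiting */
			o.retry = true;
		}
	} else if ((flags & FAULT_FLAG_KILLABLE) && killed) {
		o.sem_dropped = true;		/* up_read() on fatal signal */
		o.retry = true;			/* returns 0 -> VM_FAULT_RETRY */
	}
	/* else: __lock_page() blocks and returns 1 -- no retry, sem kept */
	return o;
}
```

Only the FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT combination guarantees VM_FAULT_RETRY with mmap_sem still held; setting FAULT_FLAG_RETRY_NOWAIT alone leaves the killable path free to drop the sem, which is the condition the "tighten condition for fault not drop mmap_sem" patch addresses.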
             

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: bobijam Zhenyu Xu
              Votes: 0
              Watchers: 11