[LU-12508] (llite_mmap.c:71:our_vma()) ASSERTION( !down_write_trylock(&mm->mmap_sem) ) failed when writing in multiple threads

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version/s: Lustre 2.10.1
    • Environment: Linux nanny1926 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3

    Description

      2019-07-02T01:45:11-05:00 nanny1926 kernel: LustreError: 251884:0:(llite_mmap.c:71:our_vma()) ASSERTION( !down_write_trylock(&mm->mmap_sem) ) failed:
      2019-07-02T01:45:11-05:00 nanny1926 kernel: LustreError: 251884:0:(llite_mmap.c:71:our_vma()) LBUG
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Pid: 251884, comm: java
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Call Trace:
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc03d67ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc03d683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc116e66b>] our_vma+0x16b/0x170 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc11857f9>] vvp_io_rw_lock+0x409/0x6e0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc0fbb312>] ? lov_io_iter_init+0x302/0x8b0 [lov]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1185b29>] vvp_io_write_lock+0x59/0xf0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc063ebec>] cl_io_lock+0x5c/0x3d0 [obdclass]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc063f1db>] cl_io_loop+0x11b/0xc90 [obdclass]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133258>] ll_file_io_generic+0x498/0xc40 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133cdd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133e6e>] ll_file_write+0xce/0x1e0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff81200cad>] vfs_write+0xbd/0x1e0
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff8111f394>] ? __audit_syscall_entry+0xb4/0x110
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff81201abf>] SyS_write+0x7f/0xe0
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff816b5292>] tracesys+0xdd/0xe2
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Kernel panic - not syncing: LBUG
      

      The workload reads in up to 256 threads and writes 16 files in up to 16 threads.
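      For reference, a minimal pthread sketch of this kind of multi-writer load (illustrative only: the actual workload is a Java application, and /mnt/lustre, the file sizes and the thread counts here are placeholders):

      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      #define NWRITERS 16             /* "writing 16 files in up to 16 threads" */
      #define CHUNK    (1 << 20)      /* 1 MiB per write() call */
      #define NCHUNKS  256

      /* Each thread streams NCHUNKS buffered writes into its own file. */
      static void *writer(void *arg)
      {
              long id = (long)arg;
              char path[64];
              char *buf = malloc(CHUNK);
              int fd, i;

              memset(buf, 'x', CHUNK);
              /* /mnt/lustre is a placeholder for the real client mount point. */
              snprintf(path, sizeof(path), "/mnt/lustre/stress.%ld", id);
              fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
              if (fd < 0) {
                      perror("open");
                      free(buf);
                      return NULL;
              }
              for (i = 0; i < NCHUNKS; i++)
                      if (write(fd, buf, CHUNK) < 0) {
                              perror("write");
                              break;
                      }
              close(fd);
              free(buf);
              return NULL;
      }

      int main(void)
      {
              pthread_t tid[NWRITERS];
              long i;

              for (i = 0; i < NWRITERS; i++)
                      pthread_create(&tid[i], NULL, writer, (void *)i);
              for (i = 0; i < NWRITERS; i++)
                      pthread_join(tid[i], NULL);
              return 0;
      }

      Build with cc -pthread; the real workload also has up to 256 concurrent reader threads, which this sketch omits.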
       
      It is reproducible (though it does not fail every time) on this particular machine, which might just come down to particular network timing.
      I will try to reproduce it on another machine and will get back to you if successful.
       
      Any ideas why this assertion on the lock would have failed?
      A quick analysis shows that the only place where our_vma is called is lustre/llite/vvp_io.c:453, and it only takes the read lock:
      vvp_mmap_locks:

      452                 down_read(&mm->mmap_sem);
      453                 while((vma = our_vma(mm, addr, count)) != NULL) {
      454                         struct dentry *de = file_dentry(vma->vm_file);
      455                         struct inode *inode = de->d_inode;
      456                         int flags = CEF_MUST;
       

      whereas our_vma has this:

      70         /* mmap_sem must have been held by caller. */
      71         LASSERT(!down_write_trylock(&mm->mmap_sem));
       

      So I guess that if there are multiple threads in vvp_mmap_locks and more than one of them happens to hold the read lock, or one of them acquires the write lock, then the assertion in another thread would fail, no?
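      For reference, the LASSERT only fires when down_write_trylock() unexpectedly succeeds, i.e. when the rwsem looks completely free even though this thread took the read lock a few lines earlier; concurrent readers by themselves keep the trylock failing. A minimal userspace analogue of that invariant, sketched with POSIX rwlocks rather than the kernel rwsem:

      #include <assert.h>
      #include <pthread.h>
      #include <stdio.h>

      int main(void)
      {
              pthread_rwlock_t sem = PTHREAD_RWLOCK_INITIALIZER;
              int rc;

              pthread_rwlock_rdlock(&sem);      /* like down_read(&mm->mmap_sem) */

              /* Like the LASSERT in our_vma(): while any reader (including this
               * thread) holds the lock, a try-write-lock must fail (EBUSY here). */
              rc = pthread_rwlock_trywrlock(&sem);
              assert(rc != 0);

              printf("trywrlock refused as expected, rc=%d\n", rc);
              pthread_rwlock_unlock(&sem);      /* like up_read(&mm->mmap_sem) */
              return 0;
      }

      Build with cc -pthread; the assert mirrors the check at line 71 of llite_mmap.c.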

    Attachments

    Issue Links

    Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-15156 [ LU-15156 ]
            adilger Andreas Dilger made changes -
            Link Original: This issue is duplicated by DDN-519 [ DDN-519 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is duplicated by DDN-519 [ DDN-519 ]
            Tomaka Jacek Tomaka (Inactive) made changes -
            Resolution New: Not a Bug [ 6 ]
            Status Original: Open [ 1 ] New: Closed [ 6 ]

            Tomaka Jacek Tomaka (Inactive) added a comment -

            Hi YangSheng,
            I would like to confirm that applying the patch you referenced to the kernel makes the otherwise reliable reproducer no longer hit this issue. Thank you for your help!
            Regards,
            Jacek Tomaka

            Tomaka Jacek Tomaka (Inactive) added a comment -

            That looks reasonable. Thanks!
            ys Yang Sheng added a comment -

            Hi, Jacek,

            This is the patch's commit message:

            From a9e9bcb45b1525ba7aea26ed9441e8632aeeda58 Mon Sep 17 00:00:00 2001
            From: Waiman Long <longman@redhat.com>
            Date: Sun, 28 Apr 2019 17:25:38 -0400
            Subject: [PATCH] locking/rwsem: Prevent decrement of reader count before
             increment
            
            During my rwsem testing, it was found that after a down_read(), the
            reader count may occasionally become 0 or even negative. Consequently,
            a writer may steal the lock at that time and execute with the reader
            in parallel thus breaking the mutual exclusion guarantee of the write
            lock. In other words, both readers and writer can become rwsem owners
            simultaneously.
            
            The current reader wakeup code does it in one pass to clear waiter->task
            and put them into wake_q before fully incrementing the reader count.
            Once waiter->task is cleared, the corresponding reader may see it,
            finish the critical section and do unlock to decrement the count before
            the count is incremented. This is not a problem if there is only one
            reader to wake up as the count has been pre-incremented by 1.  It is
            a problem if there are more than one readers to be woken up and writer
            can steal the lock.
            
            The wakeup was actually done in 2 passes before the following v4.9 commit:
            
              70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
            
            To fix this problem, the wakeup is now done in two passes
            again. In the first pass, we collect the readers and count them.
            The reader count is then fully incremented. In the second pass, the
            waiter->task is then cleared and they are put into wake_q to be woken
            up later.

            Thanks,
            YangSheng

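            For illustration only, a toy single-threaded model (plain C, not the kernel rwsem code) of the window that commit message describes: with two readers woken in a single pass, the count is pre-incremented for only one of them, so an early up_read() can drop it back to zero and a try-write-lock can then succeed while the second reader is still inside, which is exactly the state in which the LASSERT in our_vma() can fire.

            #include <assert.h>
            #include <stdbool.h>
            #include <stdio.h>

            /* Toy model of the rwsem state, not the real implementation:
             * a writer's trylock succeeds iff no readers are accounted for. */
            static int  reader_count;      /* readers the lock knows about           */
            static int  readers_inside;    /* readers actually in the critical section */
            static bool writer_owned;

            static bool try_write_lock(void)
            {
                    if (reader_count == 0 && !writer_owned) {
                            writer_owned = true;
                            return true;
                    }
                    return false;
            }

            int main(void)
            {
                    /* Buggy one-pass wakeup: the waker pre-increments the count by 1,
                     * then wakes BOTH readers (clears waiter->task) before adding the
                     * second increment. */
                    reader_count   = 1;
                    readers_inside = 2;

                    /* Reader A finishes quickly; its up_read() decrements the count. */
                    readers_inside--;
                    reader_count--;                 /* count hits 0 too early */

                    /* A writer's down_write_trylock() now succeeds even though reader B
                     * is still inside, so mutual exclusion is broken. */
                    bool stolen = try_write_lock();
                    printf("writer stole the lock: %s, readers still inside: %d\n",
                           stolen ? "yes" : "no", readers_inside);
                    assert(stolen && readers_inside == 1);

                    /* Only now would the waker add the increment for reader B. */
                    reader_count++;
                    return 0;
            }

            The two-pass fix in the patch increments the reader count for all woken readers before any of them is allowed to run, which closes this window.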

            People

              Assignee: Yang Sheng
              Reporter: Jacek Tomaka (Inactive)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: