Lustre / LU-12508

(llite_mmap.c:71:our_vma()) ASSERTION( !down_write_trylock(&mm->mmap_sem) ) failed when writing in multiple threads


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version/s: Lustre 2.10.1
    • Environment: Linux nanny1926 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3

    Description

      2019-07-02T01:45:11-05:00 nanny1926 kernel: LustreError: 251884:0:(llite_mmap.c:71:our_vma()) ASSERTION( !down_write_trylock(&mm->mmap_sem) ) failed:
      2019-07-02T01:45:11-05:00 nanny1926 kernel: LustreError: 251884:0:(llite_mmap.c:71:our_vma()) LBUG
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Pid: 251884, comm: java
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Call Trace:
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc03d67ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc03d683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc116e66b>] our_vma+0x16b/0x170 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc11857f9>] vvp_io_rw_lock+0x409/0x6e0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc0fbb312>] ? lov_io_iter_init+0x302/0x8b0 [lov]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1185b29>] vvp_io_write_lock+0x59/0xf0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc063ebec>] cl_io_lock+0x5c/0x3d0 [obdclass]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc063f1db>] cl_io_loop+0x11b/0xc90 [obdclass]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133258>] ll_file_io_generic+0x498/0xc40 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133cdd>] ll_file_aio_write+0x12d/0x1f0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffffc1133e6e>] ll_file_write+0xce/0x1e0 [lustre]
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff81200cad>] vfs_write+0xbd/0x1e0
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff8111f394>] ? __audit_syscall_entry+0xb4/0x110
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff81201abf>] SyS_write+0x7f/0xe0
      2019-07-02T01:45:11-05:00 nanny1926 kernel: [<ffffffff816b5292>] tracesys+0xdd/0xe2
      2019-07-02T01:45:11-05:00 nanny1926 kernel:
      2019-07-02T01:45:11-05:00 nanny1926 kernel: Kernel panic - not syncing: LBUG
      

      The workload reads in up to 256 threads and writes 16 files in up to 16 threads.
       
      It is reproducible on this particular machine (though it does not fail every time), which might just come down to particular network timing.
      I will try to reproduce it on another machine and get back to you if successful.
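      For reference, here is a rough sketch of the write side of that workload as a small C reproducer. This is only my own approximation, not the actual Java application; the mount point, file count, loop count, and I/O size below are all made up:

      /* Hypothetical reproducer sketch -- NOT the original Java workload.
       * Spawns 16 writer threads, each writing its own file under an
       * assumed Lustre mount point, to mimic "16 files in 16 threads". */
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      #define NWRITERS 16
      #define NLOOPS   1024
      #define IOSIZE   (1 << 20)                              /* 1 MiB per write() */

      static const char *testdir = "/mnt/lustre/lu12508";     /* assumed path */

      static void *writer(void *arg)
      {
              long id = (long)arg;
              char path[256];
              char *buf = malloc(IOSIZE);
              int fd, i;

              if (buf == NULL)
                      return NULL;
              memset(buf, 'x', IOSIZE);
              snprintf(path, sizeof(path), "%s/file.%ld", testdir, id);
              fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
              if (fd < 0) {
                      perror("open");
                      free(buf);
                      return NULL;
              }
              for (i = 0; i < NLOOPS; i++)
                      if (write(fd, buf, IOSIZE) < 0) {
                              perror("write");
                              break;
                      }
              close(fd);
              free(buf);
              return NULL;
      }

      int main(void)
      {
              pthread_t tid[NWRITERS];
              long i;

              for (i = 0; i < NWRITERS; i++)
                      pthread_create(&tid[i], NULL, writer, (void *)i);
              for (i = 0; i < NWRITERS; i++)
                      pthread_join(tid[i], NULL);
              return 0;
      }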
       
      Any ideas why this assertion would have failed?
      A quick analysis shows that the only place where our_vma() is called is vvp_mmap_locks() at lustre/llite/vvp_io.c:453, and it only takes the read lock:

      452                 down_read(&mm->mmap_sem);
      453                 while((vma = our_vma(mm, addr, count)) != NULL) {
      454                         struct dentry *de = file_dentry(vma->vm_file);
      455                         struct inode *inode = de->d_inode;
      456                         int flags = CEF_MUST;

      whereas our_vma has this:

      70         /* mmap_sem must have been held by caller. */
      71         LASSERT(!down_write_trylock(&mm->mmap_sem));
       

      So I guess that if there are multiple threads in vvp_mmap_locks() and more than one of them happens to hold the read lock, or one of them acquires the write lock, then the assertion in the other thread would fail, no?
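      To make the semantics of that check concrete, below is a minimal userspace sketch using POSIX rwlocks. It is only an analogy and all names in it are invented; the kernel rw_semaphore may well behave differently, which is really what this ticket is asking about. Note also that the return conventions are inverted: down_write_trylock() returns nonzero on success, while pthread_rwlock_trywrlock() returns 0 on success and EBUSY when the lock is already held for reading or writing.

      /* Userspace analogy to LASSERT(!down_write_trylock(&mm->mmap_sem)):
       * the check passes when the write trylock fails (someone holds the
       * lock) and fires when the trylock succeeds (nobody held it).
       * POSIX rwlocks only -- this is not kernel code. */
      #include <assert.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

      /* Analogue of the check in our_vma(): the write trylock must NOT succeed. */
      static void assert_mmap_sem_held(void)
      {
              int rc = pthread_rwlock_trywrlock(&mmap_sem);

              if (rc == 0) {
                      /* Nobody held the lock: this is the LBUG case. */
                      pthread_rwlock_unlock(&mmap_sem);
                      fprintf(stderr, "ASSERTION failed: mmap_sem not held\n");
                      assert(0);
              }
              /* rc == EBUSY: some thread holds it for read or write; check passes. */
      }

      /* Analogue of vvp_mmap_locks(): take the read lock, then run the check,
       * as several threads may do at the same time. */
      static void *reader(void *arg)
      {
              (void)arg;
              pthread_rwlock_rdlock(&mmap_sem);
              assert_mmap_sem_held();         /* passes: a reader holds the lock */
              usleep(1000);                   /* keep the readers overlapping    */
              pthread_rwlock_unlock(&mmap_sem);
              return NULL;
      }

      int main(void)
      {
              pthread_t tid[4];
              int i;

              /* Several "vvp_mmap_locks" threads holding the read lock at once. */
              for (i = 0; i < 4; i++)
                      pthread_create(&tid[i], NULL, reader, NULL);
              for (i = 0; i < 4; i++)
                      pthread_join(tid[i], NULL);

              printf("all checks passed while readers held the lock\n");
              return 0;
      }

      In this userspace analogue, concurrent readers alone do not make the check fire, because the trylock keeps failing as long as anyone holds the lock; whether the kernel rw_semaphore behaves the same way under this workload is exactly the open question.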

    People

      Assignee: Yang Sheng
      Reporter: Jacek Tomaka (Inactive)
      Votes: 0
      Watchers: 7
