
[LU-4650] contention on ll_inode_size_lock with mmap'ed file

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.1.6, Lustre 2.4.2
    • Environment: RHEL 6.4, kernel 2.6.32-431

    Description

      Our customer (CEA) is suffering from heavy contention when using a debugging tool (Distributed Debugging Tool) with a binary file located on a Lustre filesystem. The binary file is quite large (~300 MB). The debugging tool launches one gdb instance per core on the client, which reveals high contention on large SMP nodes (32 cores).

      The global launch time is about 3 minutes when the binary file is on Lustre, compared to only 20 seconds on NFS.

      After analyzing the operations performed by gdb, I have created a test program that reproduces the issue (mmaptest.c). It does the following (a minimal sketch in C is shown after the list):

      • opens a file with O_RDONLY
      • mmaps it entirely with PROT_READ, MAP_PRIVATE
      • accesses each page of the memory region, from the first page to the last
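
      The original mmaptest.c is not reproduced in this ticket text; the sketch below only illustrates the three steps above (file handling, names and error paths are assumptions, not the attached source):

      /* Minimal sketch of a reproducer along the lines of mmaptest.c described
       * above: open read-only, map the whole file privately, touch every page. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
              struct stat st;
              long page = sysconf(_SC_PAGESIZE);
              volatile char sum = 0;
              size_t off;
              char *map;
              int fd;

              if (argc != 2) {
                      fprintf(stderr, "usage: %s <file>\n", argv[0]);
                      return 1;
              }

              /* open the file O_RDONLY */
              fd = open(argv[1], O_RDONLY);
              if (fd < 0 || fstat(fd, &st) < 0) {
                      perror(argv[1]);
                      return 1;
              }

              /* mmap it entirely PROT_READ, MAP_PRIVATE */
              map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
              if (map == MAP_FAILED) {
                      perror("mmap");
                      return 1;
              }

              /* access each page of the region, from first to last, so that
               * every access goes through the page-fault path */
              for (off = 0; off < (size_t)st.st_size; off += page)
                      sum += map[off];

              munmap(map, st.st_size);
              close(fd);
              return 0;
      }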

      Launch command is:

      \cp file1G /dev/null; time ./launch_mmaptest.sh 16 file1G
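
      launch_mmaptest.sh itself is not shown in this ticket; judging from the command line above, it runs N concurrent instances of the reproducer against the same file. A hypothetical C equivalent of such a driver (the script's actual contents are unknown) could look like:

      /* Hypothetical stand-in for launch_mmaptest.sh: fork <instances> copies
       * of ./mmaptest on the same file and wait for all of them. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
              int i, n;

              if (argc != 3) {
                      fprintf(stderr, "usage: %s <instances> <file>\n", argv[0]);
                      return 1;
              }
              n = atoi(argv[1]);

              for (i = 0; i < n; i++) {
                      if (fork() == 0) {
                              /* child: run one copy of the mmap reproducer */
                              execl("./mmaptest", "mmaptest", argv[2], (char *)NULL);
                              perror("execl");
                              _exit(1);
                      }
              }

              /* parent: wait for every instance to finish */
              for (i = 0; i < n; i++)
                      wait(NULL);
              return 0;
      }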

      Run time with Lustre 2.1.6:

                          ext4       Lustre
      1 instance          0.339s     2.951s
      32 instances        0.558s     9m20.669s

      Run time with Lustre 2.4.2:

                          ext4       Lustre
      1 instance          0.349s     6.542s
      16 instances        0.373s     45.588s

      With several instances, the processes spend their time waiting on the inode size lock. Here is the stack of most of the instances during the test:

      [<ffffffff810a0371>] down+0x41/0x50
      [<ffffffffa0b5daa2>] ll_inode_size_lock+0x52/0x110 [lustre]
      [<ffffffffa0b97a06>] ccc_prep_size+0x86/0x270 [lustre]
      [<ffffffffa0b9f4a1>] vvp_io_fault_start+0xf1/0xb00 [lustre]
      [<ffffffffa060061a>] cl_io_start+0x6a/0x140 [obdclass]
      [<ffffffffa0604d54>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [<ffffffffa0b827a2>] ll_fault+0x2c2/0x4d0 [lustre]
      [<ffffffff8114a4c4>] __do_fault+0x54/0x540
      [<ffffffff8114aa4d>] handle_pte_fault+0x9d/0xbd0
      [<ffffffff8114b7aa>] handle_mm_fault+0x22a/0x300
      [<ffffffff8104aa68>] __do_page_fault+0x138/0x480
      [<ffffffff8152e2fe>] do_page_fault+0x3e/0xa0
      [<ffffffff8152b6b5>] page_fault+0x25/0x30
      [<ffffffffffffffff>] 0xffffffffffffffff
      

    Attachments

    Issue Links

    Activity


            jay Jinshan Xiong (Inactive) added a comment - duplication of LU-4257

            dmiter Dmitry Eremin (Inactive) added a comment - Jinshan, could you answer this question please?

            pichong Gregoire Pichon added a comment - Could you explain what the new design does? Is there an HLD document available?

            dmiter Dmitry Eremin (Inactive) added a comment - This patch is a temporary solution that should improve the situation right now. We are working on a redesign of this code that avoids this lock entirely. The results are promising, but that patch will come later.

            pichong Gregoire Pichon added a comment - Thanks. The patch might improve performance because it improves lock management, but I think there is still a design/implementation issue. Why does the inode size lock need to be taken, since the file size does not change and accesses are read-only (the file is opened with O_RDONLY and the mmap is done with PROT_READ)?

            dmiter Dmitry Eremin (Inactive) added a comment - Patch http://review.whamcloud.com/9095/ should help with this.

            dmiter Dmitry Eremin (Inactive) added a comment - The root cause is the same as in LU-4257.

    People

      Assignee: dmiter Dmitry Eremin (Inactive)
      Reporter: pichong Gregoire Pichon
      Votes: 0
      Watchers: 6
