[LU-4650] contention on ll_inode_size_lock with mmap'ed file Created: 19/Feb/14  Updated: 19/May/16  Resolved: 19/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6, Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Gregoire Pichon Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

rhel 6.4
kernel 2.6.32-431


Attachments: launch_mmaptest.sh, mmaptest.c
Issue Links:
Related
is related to LU-4257 parallel dds are slower than serial dds Resolved
Severity: 3
Epic: contention, mmap
Rank (Obsolete): 12719

 Description   

Our customer (CEA) is suffering from heavy contention when using a debugging tool (Distributed Debugging Tool) on a binary file located on a Lustre filesystem. The binary file is quite large (~300 MB). The debugging tool launches one gdb instance per core on the client, which reveals high contention on large SMP nodes (32 cores).

The global launch time is about 3 minutes when the binary file is on Lustre, compared to only 20 seconds on NFS.

After analyzing the operations performed by gdb, I have created a test program that reproduces the issue (mmaptest.c; a sketch is shown after the launch command below). It does:

  • open a file O_RDONLY
  • mmap it entirely PROT_READ, MAP_PRIVATE
  • access each page of the memory region (from first to last page)

Launch command is:

  1. \cp file1G /dev/null; time ./launch_mmaptest.sh 16 file1G
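
For illustration, here is a minimal sketch of a reproducer along the lines described above; the attached mmaptest.c may differ in details such as error handling and argument parsing, and launch_mmaptest.sh presumably just starts the requested number of such processes in parallel on the same file.

/* Hypothetical sketch of the reproducer described above; the attached
 * mmaptest.c may differ in details. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    volatile char c;
    char *addr;
    size_t off;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* open the file read-only */
    fd = open(argv[1], O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }

    /* map the whole file PROT_READ, MAP_PRIVATE */
    addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* touch each page from first to last; on Lustre every fault goes
     * through ll_fault() and takes the inode size lock */
    for (off = 0; off < (size_t)st.st_size; off += page)
        c = addr[off];
    (void)c;

    munmap(addr, st.st_size);
    close(fd);
    return 0;
}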

Run time with Lustre 2.1.6:

              ext4      lustre
1 instance    0.339s    2.951s
32 instances  0.558s    9m20.669s

Run time with Lustre 2.4.2:

              ext4      lustre
1 instance    0.349s    6.542s
16 instances  0.373s    45.588s

With several instances, the processes are waiting on the inode size lock. Here is the stack of most of the instances during the test:

[<ffffffff810a0371>] down+0x41/0x50
[<ffffffffa0b5daa2>] ll_inode_size_lock+0x52/0x110 [lustre]
[<ffffffffa0b97a06>] ccc_prep_size+0x86/0x270 [lustre]
[<ffffffffa0b9f4a1>] vvp_io_fault_start+0xf1/0xb00 [lustre]
[<ffffffffa060061a>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa0604d54>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0b827a2>] ll_fault+0x2c2/0x4d0 [lustre]
[<ffffffff8114a4c4>] __do_fault+0x54/0x540
[<ffffffff8114aa4d>] handle_pte_fault+0x9d/0xbd0
[<ffffffff8114b7aa>] handle_mm_fault+0x22a/0x300
[<ffffffff8104aa68>] __do_page_fault+0x138/0x480
[<ffffffff8152e2fe>] do_page_fault+0x3e/0xa0
[<ffffffff8152b6b5>] page_fault+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff


 Comments   
Comment by Dmitry Eremin (Inactive) [ 19/Feb/14 ]

The root cause is the same as in LU-4257.

Comment by Dmitry Eremin (Inactive) [ 19/Feb/14 ]

Patch http://review.whamcloud.com/9095/ should help with this.

Comment by Gregoire Pichon [ 19/Feb/14 ]

Thanks. The patch might improve performance because it improves lock management. But I think there is still a design/implementation issue.

Why does the inode size lock need to be taken, since the file size does not change and accesses are read-only (the file is opened with O_RDONLY and the mmap is done with PROT_READ)?

Comment by Dmitry Eremin (Inactive) [ 19/Feb/14 ]

This patch is a temporary solution that should improve the situation right now. We are working on a redesign of this code that avoids this lock entirely. The results are promising, but that patch will come later.

Comment by Gregoire Pichon [ 20/Feb/14 ]

Could you explain what the new design does? Is there an HLD document available?

Comment by Dmitry Eremin (Inactive) [ 21/Feb/14 ]

Jinshan,
Could you answer this question please?

Comment by Jinshan Xiong (Inactive) [ 19/May/16 ]

Duplicate of LU-4257.
