Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version/s: Lustre 2.1.6, Lustre 2.4.2
Environment: RHEL 6.4, kernel 2.6.32-431
Description
Our customer (CEA) is experiencing severe contention when using a debugger (Distributed Debugging Tool) on a binary file located on a Lustre filesystem. The binary is quite large (~300 MB). The debugger launches one gdb instance per core on the client, which exposes heavy contention on large SMP nodes (32 cores).
The overall launch time is about 3 minutes when the binary is on Lustre, compared with only 20 seconds when it is on NFS.
After analyzing the operations performed by gdb, I wrote a test program (mmaptest.c) that reproduces the issue. It performs the following steps:
- open the file with O_RDONLY
- mmap the whole file with PROT_READ, MAP_PRIVATE
- access each page of the mapped region, from the first page to the last
The launch command is:
- \cp file1G /dev/null; time ./launch_mmaptest.sh 16 file1G
Run time with Lustre 2.1.6:

| | ext4 | Lustre |
|---|---|---|
| 1 instance | 0.339s | 2.951s |
| 32 instances | 0.558s | 9m20.669s |
Run time with Lustre 2.4.2:

| | ext4 | Lustre |
|---|---|---|
| 1 instance | 0.349s | 6.542s |
| 16 instances | 0.373s | 45.588s |
With several instances, the processes wait on the inode size lock. Here is the stack trace shown by most of the instances during the test:
[<ffffffff810a0371>] down+0x41/0x50
[<ffffffffa0b5daa2>] ll_inode_size_lock+0x52/0x110 [lustre]
[<ffffffffa0b97a06>] ccc_prep_size+0x86/0x270 [lustre]
[<ffffffffa0b9f4a1>] vvp_io_fault_start+0xf1/0xb00 [lustre]
[<ffffffffa060061a>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa0604d54>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0b827a2>] ll_fault+0x2c2/0x4d0 [lustre]
[<ffffffff8114a4c4>] __do_fault+0x54/0x540
[<ffffffff8114aa4d>] handle_pte_fault+0x9d/0xbd0
[<ffffffff8114b7aa>] handle_mm_fault+0x22a/0x300
[<ffffffff8104aa68>] __do_page_fault+0x138/0x480
[<ffffffff8152e2fe>] do_page_fault+0x3e/0xa0
[<ffffffff8152b6b5>] page_fault+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
Issue Links
- is related to: LU-4257 parallel dds are slower than serial dds (Resolved)