Details
- Type: Bug
- Resolution: Duplicate
- Priority: Major
- Lustre 2.4.0
- 3
- 6133
Description
During testing on Sequoia, we've seen specific IO nodes heavily contend on the cl_envs_guard lock while memory reclaim is happening, which severely degrades performance on the affected client.
An example stack trace of a thread spinning on the lock is below:
CPU56: Call Trace:
[c00000000fe3bb30] [c000000000008d1c] .show_stack+0x7c/0x184 (unreliable)
[c00000000fe3bbe0] [c00000000027604c] .showacpu+0x64/0x94
[c00000000fe3bc70] [c000000000068b30] .generic_smp_call_function_interrupt+0x10c/0x230
[c00000000fe3bd40] [c00000000001d11c] .smp_message_recv+0x34/0x78
[c00000000fe3bdc0] [c00000000002526c] .bgq_ipi_dispatch+0x118/0x18c
[c00000000fe3be50] [c00000000007b20c] .handle_IRQ_event+0x88/0x18c
[c00000000fe3bf00] [c00000000007dc90] .handle_percpu_irq+0x8c/0x100
[c00000000fe3bf90] [c00000000001b808] .call_handle_irq+0x1c/0x2c
[c0000003e1c4a4c0] [c0000000000059f0] .do_IRQ+0x154/0x1e0
[c0000003e1c4a570] [c0000000000144dc] exc_external_input_book3e+0x110/0x114
--- Exception: 501 at ._raw_spin_lock+0xd8/0x1a8
    LR = ._raw_spin_lock+0x104/0x1a8
[c0000003e1c4a860] [8000000000b04f38] libcfs_nidstrings+0x2acc/0xfffffffffffe5824 [libcfs] (unreliable)
[c0000003e1c4a910] [c00000000042d4cc] ._spin_lock+0x10/0x24
[c0000003e1c4a980] [80000000024c2f4c] .cl_env_get+0xec/0x480 [obdclass]
[c0000003e1c4aa60] [80000000024c336c] .cl_env_nested_get+0x8c/0xf0 [obdclass]
[c0000003e1c4aaf0] [800000000692070c] .ll_releasepage+0xbc/0x200 [lustre]
[c0000003e1c4aba0] [c000000000094110] .try_to_release_page+0x68/0x8c
[c0000003e1c4ac10] [c0000000000a4190] .shrink_page_list.clone.0+0x3d8/0x63c
[c0000003e1c4adc0] [c0000000000a47d8] .shrink_inactive_list+0x3e4/0x690
[c0000003e1c4af90] [c0000000000a4f54] .shrink_zone+0x4d0/0x4d4
[c0000003e1c4b0c0] [c0000000000a5a68] .try_to_free_pages+0x204/0x3d0
[c0000003e1c4b220] [c00000000009d044] .__alloc_pages_nodemask+0x460/0x738
[c0000003e1c4b3a0] [c000000000095af4] .grab_cache_page_write_begin+0x7c/0xec
[c0000003e1c4b450] [8000000006920964] .ll_write_begin+0x94/0x270 [lustre]
[c0000003e1c4b520] [c0000000000968c8] .generic_file_buffered_write+0x148/0x374
[c0000003e1c4b660] [c000000000097050] .__generic_file_aio_write+0x374/0x3d8
[c0000003e1c4b760] [c00000000009712c] .generic_file_aio_write+0x78/0xe8
[c0000003e1c4b810] [800000000693ed4c] .vvp_io_write_start+0xfc/0x3e0 [lustre]
[c0000003e1c4b8e0] [80000000024d9c6c] .cl_io_start+0xcc/0x220 [obdclass]
[c0000003e1c4b980] [80000000024e1a84] .cl_io_loop+0x194/0x2c0 [obdclass]
[c0000003e1c4ba30] [80000000068ba1d8] .ll_file_io_generic+0x498/0x670 [lustre]
[c0000003e1c4bb30] [80000000068ba834] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
[c0000003e1c4bc00] [80000000068bab50] .ll_file_write+0x150/0x320 [lustre]
[c0000003e1c4bce0] [c0000000000d1ba8] .vfs_write+0xd0/0x1c4
[c0000003e1c4bd80] [c0000000000d1d98] .SyS_write+0x54/0x98
[c0000003e1c4be30] [c000000000000580] syscall_exit+0x0/0x2c
I also see the same contention from calls to cl_env_put:
CPU63: Call Trace:
[c00000000fe03b30] [c000000000008d1c] .show_stack+0x7c/0x184 (unreliable)
[c00000000fe03be0] [c00000000027604c] .showacpu+0x64/0x94
[c00000000fe03c70] [c000000000068b30] .generic_smp_call_function_interrupt+0x10c/0x230
[c00000000fe03d40] [c00000000001d11c] .smp_message_recv+0x34/0x78
[c00000000fe03dc0] [c00000000002526c] .bgq_ipi_dispatch+0x118/0x18c
[c00000000fe03e50] [c00000000007b20c] .handle_IRQ_event+0x88/0x18c
[c00000000fe03f00] [c00000000007dc90] .handle_percpu_irq+0x8c/0x100
[c00000000fe03f90] [c00000000001b808] .call_handle_irq+0x1c/0x2c
[c0000003c4f0a510] [c0000000000059f0] .do_IRQ+0x154/0x1e0
[c0000003c4f0a5c0] [c0000000000144dc] exc_external_input_book3e+0x110/0x114
--- Exception: 501 at ._raw_spin_lock+0xdc/0x1a8
    LR = ._raw_spin_lock+0x104/0x1a8
[c0000003c4f0a8b0] [800000000697a578] msgdata.87439+0x20/0xfffffffffffccf88 [lustre] (unreliable)
[c0000003c4f0a960] [c00000000042d4cc] ._spin_lock+0x10/0x24
[c0000003c4f0a9d0] [80000000024c17e8] .cl_env_put+0x178/0x420 [obdclass]
[c0000003c4f0aa70] [80000000024c1ab0] .cl_env_nested_put+0x20/0x40 [obdclass]
[c0000003c4f0aaf0] [8000000006920794] .ll_releasepage+0x144/0x200 [lustre]
[c0000003c4f0aba0] [c000000000094110] .try_to_release_page+0x68/0x8c
[c0000003c4f0ac10] [c0000000000a4190] .shrink_page_list.clone.0+0x3d8/0x63c
[c0000003c4f0adc0] [c0000000000a47d8] .shrink_inactive_list+0x3e4/0x690
[c0000003c4f0af90] [c0000000000a4f54] .shrink_zone+0x4d0/0x4d4
[c0000003c4f0b0c0] [c0000000000a5a68] .try_to_free_pages+0x204/0x3d0
[c0000003c4f0b220] [c00000000009d044] .__alloc_pages_nodemask+0x460/0x738
[c0000003c4f0b3a0] [c000000000095af4] .grab_cache_page_write_begin+0x7c/0xec
[c0000003c4f0b450] [8000000006920964] .ll_write_begin+0x94/0x270 [lustre]
[c0000003c4f0b520] [c0000000000968c8] .generic_file_buffered_write+0x148/0x374
[c0000003c4f0b660] [c000000000097050] .__generic_file_aio_write+0x374/0x3d8
[c0000003c4f0b760] [c00000000009712c] .generic_file_aio_write+0x78/0xe8
[c0000003c4f0b810] [800000000693ed4c] .vvp_io_write_start+0xfc/0x3e0 [lustre]
[c0000003c4f0b8e0] [80000000024d9c6c] .cl_io_start+0xcc/0x220 [obdclass]
[c0000003c4f0b980] [80000000024e1a84] .cl_io_loop+0x194/0x2c0 [obdclass]
[c0000003c4f0ba30] [80000000068ba1d8] .ll_file_io_generic+0x498/0x670 [lustre]
[c0000003c4f0bb30] [80000000068ba834] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
[c0000003c4f0bc00] [80000000068bab50] .ll_file_write+0x150/0x320 [lustre]
[c0000003c4f0bce0] [c0000000000d1ba8] .vfs_write+0xd0/0x1c4
[c0000003c4f0bd80] [c0000000000d1d98] .SyS_write+0x54/0x98
[c0000003c4f0be30] [c000000000000580] syscall_exit+0x0/0x2c
Listing the cl_env_* addresses:
cl_env_get+0xec
(gdb) l *cl_env_get+0xec
0xa2f4c is in cl_env_get (/builddir/build/BUILD/lustre-2.3.58/lustre/obdclass/cl_object.c:804).
803             ENTRY;
804             spin_lock(&cl_envs_guard);
805             LASSERT(equi(cl_envs_cached_nr == 0, cfs_list_empty(&cl_envs)));
cl_env_put+0x178
(gdb) l *cl_env_put+0x178
0xa17e8 is in cl_env_put (/builddir/build/BUILD/lustre-2.3.58/lustre/obdclass/cl_object.c:979).
978                 (env->le_ses->lc_tags & ~LCT_HAS_EXIT) == LCT_SESSION) {
979                     spin_lock(&cl_envs_guard);
980                     cfs_list_add(&cle->ce_linkage, &cl_envs);
Dumping the stacks for all active CPUs shows nearly every one of them contending on this lock.
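For context on why every thread funnels through the same place: the gdb listings above show that both cl_env_get and cl_env_put take one global spinlock (cl_envs_guard) guarding a single shared list of cached environments (cl_envs). Below is a rough sketch of that pattern; it is a simplified illustration, not the actual Lustre code, and the env_cache_* names are made up.

/* Simplified illustration of the contention pattern; not the Lustre
 * implementation, just a sketch of a single-lock LIFO cache. */
#include <linux/list.h>
#include <linux/spinlock.h>

struct env_entry {
        struct list_head linkage;
        /* ... per-thread environment state ... */
};

static LIST_HEAD(env_cache);               /* analogous to cl_envs       */
static DEFINE_SPINLOCK(env_cache_lock);    /* analogous to cl_envs_guard */

/* Every caller, on every CPU, takes the same global lock ... */
static struct env_entry *env_cache_get(void)
{
        struct env_entry *env = NULL;

        spin_lock(&env_cache_lock);
        if (!list_empty(&env_cache)) {
                env = list_first_entry(&env_cache, struct env_entry, linkage);
                list_del(&env->linkage);
        }
        spin_unlock(&env_cache_lock);

        return env;     /* NULL means the caller must allocate a fresh one */
}

/* ... and again on the release side, so get/put pairs issued by
 * concurrent reclaim threads all pound the same cache line. */
static void env_cache_put(struct env_entry *env)
{
        spin_lock(&env_cache_lock);
        list_add(&env->linkage, &env_cache);
        spin_unlock(&env_cache_lock);
}

Because direct reclaim can push many threads into ll_releasepage at once, and each call hits this lock on both the get and the put side, the lock becomes a global serialization point, which matches the stack dumps above.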
Issue Links
- is related to: LU-3321 "2.x single thread/process throughput degraded from 1.8" (Resolved)
Jinshan, I briefly looked into making the memory allocations atomic under ll_releasepage, but it doesn't look as simple as I had hoped. From what I can tell, there will potentially be three or more memory allocations when creating a new lu_env structure: 1. the lu_env itself, 2. the lu_env's keys, 3. the lu_global_key, 4. any allocations for other keys. So I don't think making just steps 1 and 2 atomic will provide any benefit; we would also need to ensure steps 3 and beyond are done atomically, right? With that said, I'm not sure making these allocations atomic buys us anything. Shouldn't we just pass in the correct gfp flags to ensure we don't loop back on ourselves (e.g. GFP_NOIO)?
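To make the gfp-flag idea concrete, here is a minimal sketch of what passing GFP_NOIO down into those allocations could look like, assuming the allocation paths were plumbed to accept a gfp_t from the caller. The my_env_alloc/my_env_keys names are hypothetical, and this is not a proposed patch.

/* Hypothetical sketch only, not a proposed patch: what "pass the correct
 * gfp flags" could look like if the env/key allocations took a gfp_t. */
#include <linux/gfp.h>
#include <linux/slab.h>

struct my_env_keys {
        void *values[16];       /* placeholder for per-key state */
};

struct my_env {
        struct my_env_keys *keys;
};

static struct my_env *my_env_alloc(gfp_t gfp)
{
        struct my_env *env;

        /* GFP_NOIO still allows blocking reclaim, but without __GFP_IO or
         * __GFP_FS, so reclaim triggered by this allocation is not allowed
         * to start new IO or filesystem writeback on our behalf. */
        env = kzalloc(sizeof(*env), gfp);
        if (env == NULL)
                return NULL;

        env->keys = kzalloc(sizeof(*env->keys), gfp);
        if (env->keys == NULL) {
                kfree(env);
                return NULL;
        }

        return env;
}

/* Reclaim-path callers would pass GFP_NOIO; normal callers keep GFP_KERNEL.
 * As noted above, every allocation in the chain (env, keys, lu_global_key,
 * other key allocations) would need the same treatment for this to help. */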
Also, I made a debug patch to simply drop the lu_env altogether from ll_releasepage and pushed that through Maloo: http://review.whamcloud.com/5847. I didn't get any failures, so that is probably the better approach to spend time on, IMO, although merging a change like that will involve changes to the debug infrastructure. I opened LU-3033 to facilitate discussion on that topic.