[LU-1269] speed up ASTs sending Created: 29/Mar/12 Updated: 08/Feb/18 Resolved: 08/Feb/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Iurii Golovach (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Bugzilla ID: | 24,450 |
| Rank (Obsolete): | 9740 |
| Description |
|
The goal of this ticket is to land the following patches from Vladimir Saveliev (Oracle) on the WC 1.8 branch: https://bugzilla.lustre.org/attachment.cgi?id=33145 (details of these patches are in https://bugzilla.lustre.org/show_bug.cgi?id=24450).
Patch summaries:
ldlm_run_bl_ast_work() sends ASTs in sets of PARALLEL_AST_LIMIT
This patch changes ldlm_run_bl_ast_work() so that having sent one
This patch uses the possibility to specify a wait condition for |
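As a rough illustration of the first summary line only, the sketch below models work being sent in sets of at most PARALLEL_AST_LIMIT. It is not Lustre code: the limit value of 4, the function names `batches` and `wait_rounds`, and the assumption that each set is waited on before the next one is sent are all hypothetical and chosen for the example.

```python
import math

# Hypothetical stand-in for PARALLEL_AST_LIMIT; the real value is not given in this ticket.
PARALLEL_AST_LIMIT = 4


def batches(work_items, limit=PARALLEL_AST_LIMIT):
    """Split pending work into consecutive sets of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [work_items[i:i + limit] for i in range(0, len(work_items), limit)]


def wait_rounds(n_items, limit=PARALLEL_AST_LIMIT):
    """Count send-then-wait rounds, assuming each set must finish before the next starts."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return math.ceil(n_items / limit)


if __name__ == "__main__":
    pending = list(range(10))  # ten hypothetical pending ASTs
    for number, group in enumerate(batches(pending), start=1):
        print(f"set {number}: {group}")
    print("wait rounds:", wait_rounds(len(pending)))
```

Under this model the number of waits grows with the number of sets, so one slow reply in a set delays every later set.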
| Comments |
| Comment by Iurii Golovach (Inactive) [ 29/Mar/12 ] |
|
http://review.whamcloud.com/#change,2406 - link to the submitted patch |
| Comment by Andreas Dilger [ 30/Mar/12 ] |
|
Jinshan, didn't something similar to this already get implemented for 2.2? I'd prefer to keep the implementations on 1.8 and 2.x as close as possible to avoid future complications with other patches that affect the same code. |
| Comment by Jinshan Xiong (Inactive) [ 30/Mar/12 ] |
|
Yes, I think we have done a similar thing. Sorry about that. |
| Comment by Iurii Golovach (Inactive) [ 30/Mar/12 ] |
|
Andreas, Jinshan, do you mean that there is a plan to port your changes with this functionality from 2.2 to 1.8? If so, let me know the ticket where this is tracked, and we can then close this one. |
| Comment by Peter Jones [ 16/Apr/12 ] |
|
No, there are no plans to backport new features to b1_8. We are landing only bugfixes into b1_8, and new feature development is limited to master. |
| Comment by Iurii Golovach (Inactive) [ 18/Apr/12 ] |
|
Peter, this ticket is about landing the bugfixes that are under review at http://review.whamcloud.com/#change,2406 and is NOT about back-porting. Please don't close this ticket as "Won't Fix", since these fixes need to land on the 1.8 branch. Thank you, |
| Comment by Shuichi Ihara (Inactive) [ 09/May/12 ] |
|
Hi, we also hit a very similar problem on lustre-1.8.7-wc1, and the MDS hung.
Apr 23 15:58:34 ALPL505 kernel: Call Trace:
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88953a00>] ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88955542>] ldlm_completion_ast+0x4c2/0x880 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8893a709>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008e421>] default_wake_function+0x0/0xe
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88935b6a>] ldlm_lock_addref_internal_nolock+0x3a/0x90 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889540bb>] ldlm_cli_enqueue_local+0x46b/0x520 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88caa157>] enqueue_ordered_locks+0x387/0x4d0 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889519a0>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88955080>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88caa8e9>] mds_get_parent_child_locked+0x649/0x960 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88c9b652>] mds_getattr_lock+0x632/0xc90 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88c96dda>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88ca1d83>] mds_intent_policy+0x623/0xc20 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8893c270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88939eb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889367fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8895e870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8895bb39>] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88ca0b30>] mds_handle+0x40e0/0x4d10 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff800774ed>] smp_send_reschedule+0x4e/0x53
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8897fd55>] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889896d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88989e35>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8898adc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88989e60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 |
| Comment by Andreas Dilger [ 23/May/12 ] |
|
Reopening issue due to problem reports hit on 1.8. Jinshan, can you please find the patch set for master that resolved this problem? I believe it was one of the early patches in the Imperative Recovery feature. |
| Comment by Cory Spitz [ 23/May/12 ] |
|
Also, it might be worthwhile to hear from Johann. I had a conversation with him and he suggested that b1_8 might be better off simply by removing the PARALLEL_AST_LIMIT. Cray has been using the patches listed in the description from bz 24450. I'm not sure what the correct approach should be for b1_8 though. |
| Comment by Jinshan Xiong (Inactive) [ 23/May/12 ] |
|
The commit hash in master is 0bd27be7f20a671e7128f341a070838a2bd318dc, and Johann is working on an improvement at http://review.whamcloud.com/2650 that you might be interested in. |
| Comment by Cory Spitz [ 23/May/12 ] |
|
Thanks, Jinshan. Change #2650/ BTW, http://jira.whamcloud.com/browse/LU-571, http://review.whamcloud.com/#change,1190, and http://review.whamcloud.com/#change,1608 are a few handy links for master commit 0bd27be7f20a671e7128f341a070838a2bd318dc. |
| Comment by Nathan Rutman [ 21/Nov/12 ] |
|
Xyratex-bug-id: MRP-478 |
| Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ] |
|
close old tickets |