[LU-12293] Memory leak after router checker packet processing Created: 13/May/19 Updated: 30/Aug/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Tatsushi Takamura | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Epic/Theme: | lnet |
| Severity: | 3 |
| Description |
|
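If net_monitor_thr is stopped while a router checker packet is waiting for retry, the pending retry message is never freed and its memory leaks. As a workaround, lnet_monitor_thr_stop() now waits for the outstanding router checker sends to complete (timeout: 10 s × 2, i.e. twice the LND timeout) and only then purges the retry messages. To do this, it calls lnet_clean_resendqs() before blocking on ln_mt_signal instead of afterwards.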
diff --git a/lnet/lnet/lib-move.c b/lnet/lnet/lib-move.c
index 5e990d9..3b16d89 100644
--- a/lnet/lnet/lib-move.c
+++ b/lnet/lnet/lib-move.c
@@ -3682,6 +3682,14 @@ void lnet_monitor_thr_stop(void)
 	/* tell the monitor thread that we're shutting down */
 	wake_up(&the_lnet.ln_mt_waitq);
 
+	/* wait tx completion for router checker */
+	if (atomic_read(&the_lnet.ln_routers_nsends)) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(cfs_time_seconds(lnet_get_lnd_timeout() * 2));
+	}
+	/* purge resend messages */
+	lnet_clean_resendqs();
+
 	/* block until monitor thread signals that it's done */
 	down(&the_lnet.ln_mt_signal);
 	LASSERT(the_lnet.ln_mt_state == LNET_MT_STATE_SHUTDOWN);
@@ -3691,7 +3699,6 @@ void lnet_monitor_thr_stop(void)
 	lnet_rsp_tracker_clean();
 	lnet_clean_local_ni_recoveryq();
 	lnet_clean_peer_ni_recoveryq();
-	lnet_clean_resendqs();
 	rc = LNetEQFree(the_lnet.ln_mt_eqh);
 	LASSERT(rc == 0);
 	return;
|
| Comments |
| Comment by Amir Shehata (Inactive) [ 16/May/19 ] |
|
Please take a look at the patches below; they are all part of the multi-rail branch:
https://review.whamcloud.com/#/c/34445/4
https://review.whamcloud.com/#/c/34477/5
|
| Comment by Tatsushi Takamura [ 30/Aug/19 ] |
|
Amir Shehata,
Sorry for the late reply. We are going to check these patches. |