[LU-290] Reconnects are not throttled Created: 06/May/11  Updated: 16/Aug/16  Due: 21/May/11  Resolved: 16/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Lai Siyao Assignee: Lai Siyao
Resolution: Won't Fix Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 22423
Epic: connect, ping
Rank (Obsolete): 4933

 Description   

It seems that clients can flood a server with reconnect requests
when the server returns EBUSY because it is still processing
requests from the old connection.

For example, this was seen on 1.8.2 on a cluster with ~800 clients:

Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to 0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) Skipped 4527 previous similar messages
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to 0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) Skipped 10580 previous similar messages

From code review, this looks like a side effect of bug 18674.
Since we now bypass import_select_connection() on EBUSY and EAGAIN,
ptlrpc_connect_interpret() calls ptlrpc_maybe_ping_import_soon(), which
always triggers an immediate ping, causing clients to reconnect in a busy loop.
------- Comment #1 From Johann Lombardi 2010-03-31 16:01:54



 Comments   
Comment by Peter Jones [ 06/May/11 ]

Lai

Just to warn you on this one: Oleg was not sure whether this would even be a problem on master, so the first step is to establish whether it is before investing time in porting the patch.

Regards

Peter

Comment by Lai Siyao [ 08/May/11 ]

I see, I'll investigate first.

Comment by Lai Siyao [ 14/Jun/11 ]

The comments in bz22423 and the current code show that a patch for 2.x was committed, caused a conf-sanity.sh failure, and was then reverted.

I'll run some tests to find the cause of that failure.

Comment by Lai Siyao [ 01/Aug/11 ]

Autotest results look normal; the patch will be submitted for review.

Comment by Build Master (Inactive) [ 04/Aug/11 ]

Integrated in lustre-master #241 on the following configurations:

  • x86_64,client,el5,ofa
  • x86_64,client,el5,inkernel
  • x86_64,server,el6,inkernel
  • x86_64,client,sles11,inkernel
  • i686,server,el6,inkernel
  • x86_64,server,el5,ofa
  • x86_64,client,el6,inkernel
  • x86_64,server,el5,inkernel
  • x86_64,client,ubuntu1004,inkernel
  • i686,client,el5,inkernel
  • i686,server,el5,ofa
  • i686,client,el6,inkernel
  • i686,server,el5,inkernel
  • i686,client,el5,ofa

LU-290 Reconnects are not throttled

Oleg Drokin : 86b2211e55dcc509da85b21ece8830e2a9b70db1
Files :

  • lustre/tests/conf-sanity.sh
  • lustre/ptlrpc/import.c
Comment by Alex Zhuravlev [ 09/Aug/11 ]

Please see this:

https://maloo.whamcloud.com/test_sets/e098dda6-c262-11e0-8bdf-52540025f9af

There are ~30K "recovery is timed out, evict stale exports" messages in conf-sanity.test_47.console.client-9-ib.log.

Comment by Li Wei (Inactive) [ 24/Aug/11 ]

Several occurrences on Orion of the excessive "recovery is timed out, evict stale exports" message flood:

https://maloo.whamcloud.com/test_sets/8d84d62e-ce24-11e0-8d02-52540025f9af (8c8e6dc)
https://maloo.whamcloud.com/test_sets/15d34bc8-cf7e-11e0-8d02-52540025f9af (448fc34)

Comment by Li Wei (Inactive) [ 15/Jan/12 ]

Commit: 526c43ec2e47ead878f0df552b74c78b4fc79d1f (Jan 13, 2012)
Maloo: https://maloo.whamcloud.com/test_sets/6638310e-3f5f-11e1-990e-5254004bbbd3

Another flood.

Comment by James A Simmons [ 16/Aug/16 ]

Old ticket for an unsupported version.

Generated at Sat Feb 10 01:05:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.