[LU-290] Reconnects are not throttled - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Won't Fix
Priority: Minor
Fix Version/s: Lustre 2.1.0
Affects Version/s: Lustre 2.0.0
Labels:
None

Severity:
3
Bugzilla ID:
22423
Epic:
- connect
- ping
Rank (Obsolete):
4933

Description

It seems that clients can flood a server with reconnect requests
while the server keeps returning EBUSY because it is still processing
requests from the old connection.

e.g. seen on 1.8.2 with a cluster having ~800 clients:

Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect())
share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to
0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) Skipped
4527 previous similar messages
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect())
share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to
0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) Skipped
10580 previous similar messages

From code review, this looks like a side effect of bug 18674.
Since we now bypass import_select_connection() on EBUSY and EAGAIN,
ptlrpc_connect_interpret() -> ptlrpc_maybe_ping_import_soon() always
triggers an immediate ping, causing clients to reconnect in a busy loop.
------- Comment #1 From Johann Lombardi 2010-03-31 16:01:54
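
For illustration only, one way to avoid the busy loop described above is to
back off between reconnect attempts whenever the server answers EBUSY or
EAGAIN. The sketch below is not Lustre code: struct reconnect_throttle,
reconnect_next_delay() and the 1 s / 64 s bounds are hypothetical names and
values chosen for the example, and it assumes the connect reply code is
available as a negative errno value.

```c
#include <errno.h>

/* Hypothetical per-import throttle state; not an actual Lustre structure. */
struct reconnect_throttle {
	unsigned int delay_sec;	/* delay applied before the next attempt */
};

/* Assumed bounds for the example; real values would need tuning. */
#define RECONNECT_DELAY_MIN_SEC	1u
#define RECONNECT_DELAY_MAX_SEC	64u

/*
 * Return how many seconds to wait before the next reconnect attempt,
 * given the result code "rc" of the last connect request.
 *
 * - rc == 0: connected, reset the throttle.
 * - rc == -EBUSY or -EAGAIN: server is still busy, double the delay
 *   (starting at the minimum, capped at the maximum).
 * - anything else: not handled by this sketch, no extra delay is added
 *   and the throttle state is left unchanged.
 */
static unsigned int reconnect_next_delay(struct reconnect_throttle *t, int rc)
{
	if (rc == 0) {
		t->delay_sec = 0;
		return 0;
	}

	if (rc != -EBUSY && rc != -EAGAIN)
		return 0;

	if (t->delay_sec == 0)
		t->delay_sec = RECONNECT_DELAY_MIN_SEC;
	else if (t->delay_sec > RECONNECT_DELAY_MAX_SEC / 2)
		t->delay_sec = RECONNECT_DELAY_MAX_SEC;
	else
		t->delay_sec *= 2;

	return t->delay_sec;
}
```

With such a scheme, a client refused repeatedly would wait 1, 2, 4, ... up to
64 seconds between attempts instead of retrying immediately, and the delay
would reset once a connect succeeds.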

People

Assignee: Lai Siyao

Reporter: Lai Siyao

Votes: 0

Watchers: 1

Dates

Due: 21/May/11

Created: 06/May/11 10:03 PM

Updated: 16/Aug/16 4:32 PM

Resolved: 16/Aug/16 4:32 PM