Details
Bug - Resolution: Won't Fix - Minor - Lustre 2.0.0 - None
Description
It seems that clients can flood a server with reconnect requests
while the server is returning EBUSY because it is still processing
requests from the old connection.
For example, this was seen on 1.8.2 on a cluster with ~800 clients:
Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to 0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) Skipped 4527 previous similar messages
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to 0xffff81039e70f000; still busy with 1 active RPCs
Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) Skipped 10580 previous similar messages
From code review, this looks like a side effect of bug 18674.
Because we now bypass import_select_connection() on EBUSY and EAGAIN,
ptlrpc_connect_interpret() -> ptlrpc_maybe_ping_import_soon() always
triggers an immediate ping, so clients reconnect in a busy loop.
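
To illustrate why an immediate retry on EBUSY is so costly, here is a minimal simulation sketch in Python. It is not Lustre code: count_reconnects() and its parameters (busy_ms, rtt_ms, retry_delay_ms) are hypothetical names, time is simulated in whole milliseconds, and the values used are illustrative rather than measured.

```python
import errno


def count_reconnects(busy_ms, rtt_ms, retry_delay_ms):
    """Count connect attempts one client makes until the server accepts it.

    Hypothetical model, not Lustre code. The server answers EBUSY until
    busy_ms of simulated time have passed. Each attempt costs one round
    trip (rtt_ms). After an EBUSY reply the client waits retry_delay_ms
    before trying again; retry_delay_ms=0 models the immediate ping.
    """
    now_ms = 0
    attempts = 0
    while True:
        attempts += 1
        now_ms += rtt_ms  # request sent and reply received
        reply = errno.EBUSY if now_ms < busy_ms else 0
        if reply == 0:
            return attempts
        now_ms += retry_delay_ms  # wait before the next attempt


# Server busy for 2 s, 1 ms round trip (assumed values).
busy_loop = count_reconnects(2000, 1, 0)      # retry immediately
delayed = count_reconnects(2000, 1, 1000)     # wait 1 s after each EBUSY
print(busy_loop, delayed)
```

Under these assumptions a single client retrying immediately sends thousands of connect requests while the server is busy, and a client that waits between attempts sends only a handful. Multiplied by ~800 clients, that difference is in line with the volume of skipped messages in the log above.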
------- Comment #1 From Johann Lombardi 2010-03-31 16:01:54