Lustre / LU-290

Reconnects are not throttled

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • Lustre 2.1.0
    • Lustre 2.0.0
    • None
    • 3
    • 22,423
    • 4933

    Description

      It seems that clients can flood a server with reconnect requests
      while the server keeps returning EBUSY because it is still
      processing requests from the old connection.

      For example, this was seen on 1.8.2 on a cluster with ~800 clients:

      Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect())
      share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to
      0xffff81039e70f000; still busy with 1 active RPCs
      Mar 3 20:26:16 md061i kernel: Lustre: 27033:0:(ldlm_lib.c:835:target_handle_connect()) Skipped
      4527 previous similar messages
      Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect())
      share3-MDT0000: refuse reconnection from eb9b2b28-5e23-8dc1-7024-7810bf8a74ff@173.25.10.184@o2ib to
      0xffff81039e70f000; still busy with 1 active RPCs
      Mar 3 20:26:18 md061i kernel: Lustre: 27116:0:(ldlm_lib.c:835:target_handle_connect()) Skipped
      10580 previous similar messages

      From code review, this looks like a side effect of bug 18674.
      Since we now bypass import_select_connection() on EBUSY and EAGAIN,
      ptlrpc_connect_interpret() -> ptlrpc_maybe_ping_import_soon() always
      triggers an immediate ping, causing clients to reconnect in a busy loop.
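
      To illustrate what throttling could look like on the client side,
      here is a minimal, self-contained sketch of an exponential backoff
      applied only when a connect attempt fails with EBUSY or EAGAIN. It is
      not Lustre code and not a fix that was adopted for this issue: the
      structure reconnect_throttle, the functions throttle_init() and
      throttle_next_delay(), and the delay bounds are all hypothetical names
      and values chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-import reconnect state; NOT a real Lustre structure. */
struct reconnect_throttle {
	unsigned int delay_s;     /* delay to apply to the next refused attempt */
	unsigned int min_delay_s; /* first delay after a refusal (at least 1) */
	unsigned int max_delay_s; /* upper bound for the delay */
};

/* Set up the throttle; the bounds are clamped so that 1 <= min <= max. */
void throttle_init(struct reconnect_throttle *t,
		   unsigned int min_s, unsigned int max_s)
{
	assert(t != NULL);
	if (min_s == 0)
		min_s = 1;
	if (max_s < min_s)
		max_s = min_s;
	t->min_delay_s = min_s;
	t->max_delay_s = max_s;
	t->delay_s = min_s;
}

/*
 * Return how many seconds to wait before the next connect attempt,
 * given the result of the last one (0 or a negative errno value).
 *
 * -EBUSY / -EAGAIN: the server refused the reconnect, so wait for the
 * current delay and double it (up to the maximum) for the next refusal.
 * Anything else: reset the backoff and do not delay.
 */
unsigned int throttle_next_delay(struct reconnect_throttle *t, int rc)
{
	unsigned int wait;

	assert(t != NULL);
	if (rc != -EBUSY && rc != -EAGAIN) {
		t->delay_s = t->min_delay_s;
		return 0;
	}

	wait = t->delay_s;
	if (t->delay_s > t->max_delay_s / 2)
		t->delay_s = t->max_delay_s;	/* doubling would exceed the cap */
	else
		t->delay_s *= 2;
	return wait;
}
```

      In this sketch a caller would invoke throttle_next_delay() with the
      result of each connect attempt and sleep for the returned number of
      seconds before retrying, instead of pinging immediately. With bounds
      of 1 and 8 seconds, consecutive EBUSY replies would yield waits of
      1, 2, 4, 8, 8, ... seconds, and a successful connect resets the delay.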
      ------- Comment #1 From Johann Lombardi 2010-03-31 16:01:54

          People

            laisiyao Lai Siyao
            laisiyao Lai Siyao