  1. Lustre
  2. LU-17514

hint for expected number of connected clients

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.17.0

    Description

      It would be useful to have a variable that can be set early in startup (e.g. a libcfs module parameter that is also a tunable parameter) that gives the system a hint about how many clients will be mounting the filesystem.

      While we try to tune filesystem parameters dynamically, that is sometimes complex to get right from the start, and by the time 1000 or 10000 clients have connected, the values used when 10 or 100 clients had mounted are no longer optimal. For example, conns_per_peer at the LNet level could be 4 for 100 clients but should be 1 for 10000 clients, otherwise the servers can run out of TCP ports and have too many open sockets. Similarly, at_min should be low (5s) on smaller clusters for faster recovery, but larger (15s+) with 10000 clients (LU-12064). Since it is only set at mount time, there is no easy way to update it on remote clients afterward.
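      As a rough illustration of the kind of scaling described above, the sketch below maps a client-count hint to starting values for conns_per_peer and at_min. Only the endpoints (4 and 5s for around 100 clients, 1 and 15s for 10000 clients) come from this ticket; the function name, the TuningHint type, and the intermediate 1000-client tier are assumptions made for the example, not existing Lustre behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningHint:
    conns_per_peer: int  # LNet socklnd connections per peer
    at_min: int          # minimum adaptive timeout, in seconds


def suggest_tuning(expected_clients: int) -> TuningHint:
    """Return starting values for a given order-of-magnitude client count."""
    if expected_clients < 1:
        raise ValueError("expected_clients must be positive")
    if expected_clients >= 10000:
        # Values stated in the ticket for 10000 clients.
        return TuningHint(conns_per_peer=1, at_min=15)
    if expected_clients >= 1000:
        # ASSUMPTION: an interpolated middle tier, not from the ticket.
        return TuningHint(conns_per_peer=2, at_min=10)
    # Values stated in the ticket for small clusters (around 100 clients).
    return TuningHint(conns_per_peer=4, at_min=5)
```

      These would only be initial values; dynamic tuning at runtime would still adjust them as clients actually connect.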

      Having a simple parameter set in /etc/modprobe.d/lustre.conf, like:

      options libcfs expected_clients=10000
      

      gives us a ballpark figure to work with. That doesn't obviate the need for dynamic tuning of parameters at runtime, but establishes some expectations for what is not obvious when the first 10 (of 10000) clients are mounting the filesystem.
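      For illustration, a userspace tool could read the proposed hint back from a modprobe.d-style file as sketched below. The expected_clients option is the one proposed in this ticket and does not exist yet; the helper name read_module_option is hypothetical, and the parser only handles simple "options <module> key=value" lines with integer values.

```python
def read_module_option(conf_text, module, option):
    """Return the integer value of `option` for `module`, or None if unset.

    Handles only plain 'options <module> key=value ...' lines; if the option
    appears more than once, the last occurrence wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "options" or tokens[1] != module:
            continue
        for tok in tokens[2:]:
            key, sep, val = tok.partition("=")
            if sep and key == option:
                value = int(val)  # raises ValueError on non-integer input
    return value


conf = "options libcfs expected_clients=10000\n"
hint = read_module_option(conf, "libcfs", "expected_clients")
```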

      Even better would be if the targets saved the maximum number of connected clients locally (e.g. every 128 client connections) so that the value could be read back from the target when it is first mounted. That avoids the requirement for the admin to specify expected_clients, though it would be useful to have both.
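      A minimal sketch of that bookkeeping, assuming a high-water mark that is flushed once per 128 new connections and only when it has grown. The class name, the persist callback, and the exact flush policy are assumptions for the example; how and where a target would actually store the value is not specified in this ticket.

```python
class ClientHighWater:
    """Track the maximum number of connected clients and persist it lazily."""

    def __init__(self, saved_max=0, interval=128, persist=None):
        self.connected = 0          # clients connected right now
        self.max_seen = saved_max   # in-memory high-water mark
        self.saved_max = saved_max  # last value written to stable storage
        self.interval = interval    # connections between flush attempts
        self._since_flush = 0
        # ASSUMPTION: persist(value) stores the value on the target.
        self._persist = persist if persist is not None else (lambda value: None)

    def on_connect(self):
        self.connected += 1
        self.max_seen = max(self.max_seen, self.connected)
        self._since_flush += 1
        if self._since_flush >= self.interval:
            self._since_flush = 0
            if self.max_seen > self.saved_max:
                self.saved_max = self.max_seen
                self._persist(self.saved_max)

    def on_disconnect(self):
        if self.connected > 0:
            self.connected -= 1
```

      At mount time, the value previously saved would be passed in as saved_max and could serve as the client-count hint when the admin has not set one.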

          Activity

            Andreas Dilger (adilger) added a comment:

            Note that expected_clients should never be mistaken for the "maximum possible number of clients" or "maximum number of mountpoints". It is only an order-of-magnitude estimate of how many clients might be expected, used to automatically tune parameters based on that expectation.

            There may be noticeably more or fewer clients establishing LNet connections to the servers and mounting the filesystem:

            • 2-10x as many mountpoints because of subdirectory mounting
            • 2-16x as many socklnd connections because of multi-rail and conns_per_peer
            • far fewer mounts/connections because of client automounting and idle_disconnect
            • far fewer connections because of LNet routers aggregating client connections

            but it is only intended to be an order-of-magnitude guideline and not a limit of any kind.
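            As a worked example of the multipliers listed above, the sketch below computes how far the mountpoint and socklnd connection counts could exceed the hint. Only the 2-10x and 2-16x factors come from this comment; the function name is hypothetical, and the cases with far fewer connections (automounting, idle_disconnect, LNet routers) are not modelled because no factors are given for them.

```python
def upper_side_ranges(expected_clients):
    """Return (low, high) counts implied by the stated multipliers."""
    return {
        # 2-10x as many mountpoints because of subdirectory mounting
        "mountpoints": (2 * expected_clients, 10 * expected_clients),
        # 2-16x as many socklnd connections (multi-rail, conns_per_peer)
        "socklnd_connections": (2 * expected_clients, 16 * expected_clients),
    }


ranges = upper_side_ranges(10000)
# mountpoints: 20000 to 100000; socklnd connections: 20000 to 160000
```

            With expected_clients=10000, a server could therefore see up to 160000 socklnd connections, which is why the hint cannot be treated as a limit.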


            People

              wc-triage WC Triage
              adilger Andreas Dilger