Description
When ssh is used as the remote shell for ZFS-based Lustre snapshots, servers with a large number of targets may fail to create snapshots on all relevant ZFS datasets. A new ssh connection is opened for each target, and once the number of concurrent unauthenticated connections reaches the sshd default threshold (MaxStartups, 10), there is a chance that sshd will drop the next connection.
The solution suggested for LU-9368 works for me: make sure that the MaxStartups value in /etc/ssh/sshd_config is larger than the number of targets on the server.
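For reference, a minimal sketch of what that change might look like. The numbers are illustrative only and assume a server with no more than about 60 targets; they are not taken from LU-9368. MaxStartups takes the form start:rate:full, and the OpenSSH default is 10:30:100.

```
# /etc/ssh/sshd_config
# start:rate:full
#   start = unauthenticated connections allowed before sshd begins random drops
#   rate  = percent chance of dropping a new connection once "start" is reached
#   full  = count at which every new unauthenticated connection is refused
# Assumed example values: pick "start" above the number of targets on the server.
MaxStartups 64:30:128
```

The file can be checked with `sshd -t`, and sshd has to be reloaded before the new value takes effect (how to do that depends on the distribution).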
Can a note about this appear in the documentation or man pages? It took me an embarrassingly long time to track it down.