[LU-9368] snapshot testing: ssh_exchange_identification: Connection closed by remote host Created: 20/Apr/17  Updated: 04/Dec/17  Resolved: 04/Dec/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Saurabh Tandan (Inactive) Assignee: nasf (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

1 Client, 1 OST, 1/2/4/8/16 MDTs
master, build# 3550


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Scalability testing with 1/2/4/8/16 MDTs.
Everything works fine up to 8 MDTs.
After moving to 16 MDTs, the following messages appeared when creating a snapshot:

[root@eagle-52vm2 ~]# lctl snapshot_create -c TEST5 -F lustre -n MDT16
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
[root@eagle-52vm2 ~]# lctl snapshot_list -F lustre

filesystem_name: lustre
snapshot_name: MDT16
create_time: Thu Apr 20 05:28:21 2017
modify_time: Thu Apr 20 05:28:21 2017
comment: TEST5 
snapshot_fsname: 5594a65 
status: not mount

filesystem_name: lustre
snapshot_name: MDT8
modify_time: Thu Apr 20 05:14:13 2017
create_time: Thu Apr 20 05:14:13 2017
snapshot_fsname: 58754c5 
comment: TEST4 
status: not mount
[root@eagle-52vm2 ~]# lctl snapshot_destroy -f -F lustre -n MDT8
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
[root@eagle-52vm2 ~]# lctl snapshot_list -F lustre

filesystem_name: lustre
snapshot_name: MDT16
create_time: Thu Apr 20 05:28:21 2017
modify_time: Thu Apr 20 05:28:21 2017
comment: TEST5 
snapshot_fsname: 5594a65 
status: not mount
[root@eagle-52vm2 ~]# lctl snapshot_destroy -F lustre -n MDT16
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
Miss snapshot piece on the MDT0003. Use '-f' option if want to destroy it by force.
Can't destroy the snapshot MDT16
[root@eagle-52vm2 ~]# lctl snapshot_destroy -f -F lustre -n MDT16
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
[root@eagle-52vm2 ~]# lctl snapshot_list -F lustre

Even though the snapshot is created and destroyed as expected when the corresponding command is run, the "Connection closed by remote host" messages still appear. They are seen only with the 16-MDT configuration, not with 1/2/4/8 MDTs.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 20/Apr/17 ]

Hi Fan Yong,

Can you please look into this snapshot issue?

Thanks.
Joe

Comment by nasf (Inactive) [ 20/Apr/17 ]

What is the output of "lctl snapshot_list -F lustre -n MDT16 --detail"?

Comment by nasf (Inactive) [ 20/Apr/17 ]

Would you please check whether you can ssh from the current node to all other Lustre servers without a password? Thanks!
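
One way to run this check is sketched below, assuming a POSIX shell on the node where the lctl snapshot commands are issued. The function name check_passwordless_ssh and the host names in the usage line are hypothetical. The -o BatchMode=yes option makes ssh fail instead of prompting, so any host that would still ask for a password or a host-key confirmation is reported as FAILED. The SSH variable exists only to allow a dry run without a real cluster.

```sh
# Hypothetical helper (not part of Lustre): verify non-interactive ssh to each host.
# Usage: check_passwordless_ssh HOST...
# Set SSH to another command (e.g. SSH=true) for a dry run; it defaults to ssh.
check_passwordless_ssh() (
  ssh_cmd=${SSH:-ssh}
  rc=0
  for host in "$@"; do
    # BatchMode=yes: fail instead of prompting for a password or host-key confirmation.
    if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=10 "$host" true </dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      rc=1
    fi
  done
  # The body runs in a subshell, so exit only sets the function's return status.
  exit "$rc"
)
```

For example, `check_passwordless_ssh mgs1 mds1 mds2 oss1` (hypothetical host names) would print one ok/FAILED line per server and return non-zero if any server failed.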

Comment by Saurabh Tandan (Inactive) [ 20/Apr/17 ]

Yes, I can ssh from the MDS to the OSS without a password. Is the reverse direction also required? I have not set up passwordless ssh from the OSS to the MDS.
I had already destroyed the MDT16 snapshot, so to give you the output of "lctl snapshot_list -F lustre -n MDT16 --detail" I have to re-create it. But when I re-create it, I get the following:

[root@eagle-52vm2 ~]# lctl snapshot_create -c TEST5 -F lustre -n MDT16
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
cannot create snapshot 'lustre-mdt2/mdt2@MDT16': dataset already exists
cannot create snapshot 'lustre-mdt11/mdt11@MDT16': dataset already exists
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
could not find any snapshots to destroy; check snapshot names.
could not find any snapshots to destroy; check snapshot names.
Can't create the snapshot MDT16
[root@eagle-52vm2 ~]# lctl snapshot_list -F lustre -n MDT16 --detail
cannot open 'lustre-mdt1/mdt1@MDT16': dataset does not exist
Can't list the snapshot MDT16
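
The "dataset already exists" errors suggest that pieces of the earlier MDT16 snapshot were left behind on some targets (for example lustre-mdt2/mdt2@MDT16). A minimal sketch for spotting such leftovers, assuming ZFS-backed targets and the <dataset>@<snapshot_name> naming visible in the messages above: it filters the output of `zfs list -H -t snapshot -o name` for one snapshot name. The function name find_snapshot_pieces is hypothetical, and it only reports matches; it does not destroy anything.

```sh
# Hypothetical helper: read ZFS snapshot names (one per line) on stdin and
# print those whose snapshot part is exactly NAME.
# Usage: find_snapshot_pieces NAME
find_snapshot_pieces() (
  name=$1
  # "|| [ -n ...]" also handles a final line without a trailing newline.
  while IFS= read -r snap || [ -n "$snap" ]; do
    # Quoting "$name" makes it match literally, so MDT16 does not match MDT16_2.
    case $snap in
      *@"$name") printf '%s\n' "$snap" ;;
    esac
  done
)
```

It could be run against each server, e.g. `ssh <server> zfs list -H -t snapshot -o name | find_snapshot_pieces MDT16`, to see which targets still hold a piece of the snapshot.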
Comment by nasf (Inactive) [ 20/Apr/17 ]

It only requires passwordless ssh from the current node to all Lustre servers (MGS/MDS/OSS).
Would you please try again with a different snapshot name, such as MDT16_2?

Comment by Saurabh Tandan (Inactive) [ 20/Apr/17 ]

Tried with another name and got the same result:

[root@eagle-52vm2 ~]# lctl snapshot_create -c TEST5 -F lustre -n MDT16_2
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
could not find any snapshots to destroy; check snapshot names.
Can't create the snapshot MDT16_2
[root@eagle-52vm2 ~]# lctl snapshot_list -F lustre -n MDT16_2 --detail
cannot open 'lustre-mdt1/mdt1@MDT16_2': dataset does not exist
Can't list the snapshot MDT16_2
Comment by nasf (Inactive) [ 20/Apr/17 ]

Edit /etc/ssh/sshd_config, enable MaxStartups and set it to a larger value, such as 128, then restart sshd.
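
A minimal sketch of that edit, assuming a POSIX shell and root access. MaxStartups is enforced by the sshd that receives the connections, so presumably the change matters on the Lustre servers that lctl connects to rather than on the node issuing the command. The helper name set_max_startups is hypothetical. It keeps a .bak copy, comments out any existing MaxStartups line, and writes the new value as the first line so that it stays in the global section instead of landing inside a trailing Match block. For context, the OpenSSH sshd_config(5) manual documents a default of 10:30:100, under which sshd starts randomly refusing new connections once 10 are still unauthenticated; whether that is the limit being hit here is an assumption.

```sh
# Hypothetical helper: set MaxStartups in an sshd_config-style FILE to VALUE.
# Usage: set_max_startups FILE VALUE   (leaves the previous contents in FILE.bak)
set_max_startups() (
  file=$1
  value=$2
  # Keep a backup; the new file is rebuilt from it.
  cp -p "$file" "$file.bak" || exit 1
  {
    # First line = global section, ahead of any Match block.
    printf 'MaxStartups %s\n' "$value"
    # Comment out existing MaxStartups lines (keywords are case-insensitive;
    # a "Keyword=value" first field is handled too).
    awk '{ split($1, kw, "="); if (tolower(kw[1]) == "maxstartups") { print "#" $0; next } print }' "$file.bak"
  } > "$file"
)
```

On a systemd-based server, `set_max_startups /etc/ssh/sshd_config 128` could be followed by `sshd -t && systemctl restart sshd`. `sshd -t` only validates the file; the restart is what makes the running daemon use the new value.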

Comment by Saurabh Tandan (Inactive) [ 21/Apr/17 ]

Did as above, but still got the following:

[root@eagle-52vm2 ~]# lctl snapshot_create -c TEST5 -F lustre -n MDT16
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
Can't create the snapshot MDT16
[root@eagle-52vm2 ~]# 
Comment by nasf (Inactive) [ 27/Apr/17 ]

It seems related to the ssh configuration. Would you please verify how many plain ssh connections (without lsnapshot) you can establish to the Lustre server nodes in parallel? Thanks!
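
A rough way to measure this, sketched under the assumption of a POSIX shell and passwordless ssh already in place: the hypothetical count_parallel_ssh function starts N non-interactive ssh sessions to one host at the same time and reports how many succeeded. Running it against each server with N comparable to the number of MDTs (16 here) or higher may show whether failures begin around sshd's MaxStartups threshold. SSH can be overridden for a dry run.

```sh
# Hypothetical helper: open N concurrent ssh sessions to HOST and count successes.
# Usage: count_parallel_ssh HOST [N]   (N defaults to 32)
# Set SSH to another command (e.g. SSH=true) for a dry run; it defaults to ssh.
count_parallel_ssh() (
  host=$1
  n=${2:-32}
  ssh_cmd=${SSH:-ssh}
  pids=
  i=0
  # Start all sessions in the background so they reach sshd concurrently.
  # stderr is left visible so any ssh_exchange_identification errors show up.
  while [ "$i" -lt "$n" ]; do
    "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=10 "$host" true </dev/null &
    pids="$pids $!"
    i=$((i + 1))
  done
  fail=0
  # Collect each session's exit status; non-zero counts as a failure.
  for pid in $pids; do
    wait "$pid" || fail=$((fail + 1))
  done
  echo "$((n - fail))/$n connections succeeded"
)
```

For example, `count_parallel_ssh mds1 32` (hypothetical host name) prints a line such as "N/32 connections succeeded", which could be compared across increasing values of N.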

Comment by nasf (Inactive) [ 18/May/17 ]

Have you restarted the sshd service after changing "MaxStartups"?
Would you please verify how many plain ssh connections (without lsnapshot) you can establish to the Lustre server nodes in parallel?

Thanks!

Comment by nasf (Inactive) [ 13/Jun/17 ]

Saurabh,

Any feedback on this? Thanks!

Generated at Sat Feb 10 02:25:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.