Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.8
Environment: lustre-2.12.8_6.llnl
3.10.0-1160.53.1.1chaos.ch6.x86_64
RHEL7.9
zfs-0.7.11-9.8llnl
Severity: 3
Rank: 9223372036854775807
Description
We upgraded a lustre server cluster from lustre-2.12.7_2.llnl to lustre-2.12.8_6.llnl.
The node on which the MGS runs, copper1, began reporting "new MDS connections" from NIDs that are assigned to client nodes:
Lustre: MGS: Received new MDS connection from 192.168.128.68@o2ib38, keep former export from same NID
Lustre: MGS: Received new MDS connection from 192.168.128.8@o2ib42, keep former export from same NID
Lustre: MGS: Received new MDS connection from 192.168.131.78@o2ib39, keep former export from same NID
Lustre: MGS: Received new MDS connection from 192.168.132.204@o2ib39, keep former export from same NID
Lustre: MGS: Received new MDS connection from 192.168.134.127@o2ib27, keep former export from same NID
The clients' connect flags include "mds_mds_connection":
[root@quartz7:lustre]# head */*/connect_flags
==> mgc/MGC172.19.3.1@o2ib600/connect_flags <==
flags=0x2000011005002020
flags2=0x0
version
barrier
adaptive_timeouts
mds_mds_connection
full20
imp_recov
bulk_mbits
The clients are running lustre-2.12.7_2.llnl, which does not include the patch "LU-13356 client: don't use OBD_CONNECT_MNE_SWAB".
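For context: in 2.12-era lustre_idl.h, OBD_CONNECT_MNE_SWAB is defined as an alias for OBD_CONNECT_MDS_MDS, so an unpatched MGC that still sets MNE_SWAB looks to the MGS like an incoming MDS-MDS connection. A quick spot check of the flags word above (bit value taken from those headers; treat it as illustrative):

# 2.12-era lustre_idl.h reuses one bit for both flags:
#   #define OBD_CONNECT_MDS_MDS  0x4000000ULL
#   #define OBD_CONNECT_MNE_SWAB OBD_CONNECT_MDS_MDS
# Test that bit against the client MGC flags reported above:
$ printf '0x%x\n' $(( 0x2000011005002020 & 0x4000000 ))
0x4000000

The nonzero result is why the MGS decodes this client connection as "mds_mds_connection".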
Shutting down the servers and restoring them to lustre-2.12.7_2.llnl did not change the symptoms.
Patch stacks are:
https://github.com/LLNL/lustre/releases/tag/2.12.8_6.llnl
https://github.com/LLNL/lustre/releases/tag/2.12.7_2.llnl
Seen during the same lustre server update where we saw LU-15541, but this appears to be a separate issue.
Issue Links

is related to:
LU-13356 lctl conf_param hung on the MGS node (Resolved)
Activity
Link: This issue is related to JFC-21
Resolution: Fixed
Status: Open → Resolved
Labels: llnl topllnl → llnl
Priority: Critical → Minor
Updated all the clients to 2.12.8_6.llnl (or later) with the patch "LU-13356 client: don't use OBD_CONNECT_MNE_SWAB", then updated the servers to 2.12.8_6.llnl (or later) with that patch.
No longer seeing inappropriate "Received new MDS connection" messages on bringup.
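A possible way to confirm the fix across clients (hypothetical spot check; uses the same proc paths as the head command above):

# From the same lustre proc directory as before; any file listed here
# would be a client MGC still advertising the shared MDS_MDS/MNE_SWAB bit.
[root@quartz7:lustre]# grep -l mds_mds_connection mgc/*/connect_flags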