[LU-12028] operation mds_connect to node failed Created: 27/Feb/19  Updated: 27/Feb/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Campbell Mcleay (Inactive) Assignee: Peter Jones
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3

 Description   

We get 'operation mds_connect to node ... failed' errors roughly every 10 minutes on our OSSs. Example:

/var/log/messages-20190224:Feb 24 02:07:10 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST003e: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:17:35 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0036: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:28:00 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0037: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:38:00 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0038: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:48:25 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0037: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:58:50 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0042: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 03:08:50 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0035: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
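To check the "every 10 minutes or so" observation against the sample above, the lines can be parsed and the gaps between consecutive errors computed. This is a minimal sketch, not a Lustre tool: the regular expression is written only against the seven lines quoted here, and the year (2019) is an assumption taken from the log file name, since syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Sample lines copied from the report above (grep file-name prefix included).
LOG = """\
/var/log/messages-20190224:Feb 24 02:07:10 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST003e: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:17:35 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0036: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:28:00 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0037: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:38:00 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0038: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:48:25 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0037: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 02:58:50 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0042: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
/var/log/messages-20190224:Feb 24 03:08:50 foss4 kernel: LustreError: 11-0: foxtrot-MDT0000-lwp-OST0035: operation mds_connect to node 10.21.22.10@tcp failed: rc = -52
"""

# Hypothetical pattern, fitted only to the message layout shown above.
PATTERN = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"LustreError: 11-0: (?P<dev>\S+): operation (?P<op>\S+) "
    r"to node (?P<nid>\S+) failed: rc = (?P<rc>-?\d+)"
)


def parse(text, year=2019):
    """Return (timestamp, device, rc) tuples for every matching line."""
    events = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m is None:
            continue  # not a matching LustreError line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["dev"], int(m["rc"])))
    return events


events = parse(LOG)
# Seconds between consecutive errors, in log order.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
print(gaps)
```

For the seven lines above, the gaps come out at 600 or 625 seconds, and each error names a different (or repeated) OST device on the same host.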

 

What could be causing this?

Kind regards,

Campbell

 



 Comments   
Comment by Campbell Mcleay (Inactive) [ 27/Feb/19 ]

Sorry, forgot versions:

2.10.6.2

This has patches as per LU-11826 and LU-11693

Kernel is kernel-3.10.0-514.el7_lustre.x86_64

 

Comment by Campbell Mcleay (Inactive) [ 27/Feb/19 ]

Corresponding messages on the MDS:

 

Feb 24 13:20:05 fmds1 kernel: Lustre: 10400:0:(mdt_handler.c:5340:mdt_connect_internal()) foxtrot-MDT0000: client foxtrot-MDT0000-lwp-OST0007_UUID does not support ibits lock, either very old or an invalid client: flags 0x2041401043000020
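The flags value in that message can be checked by hand against the ibits connect flag. The sketch below assumes that the flag is `OBD_CONNECT_IBITS` with value `0x1000`, as I understand it to be defined in Lustre's `lustre_idl.h`; verify that constant against the source tree for your release before relying on it. The helper name `has_flag` is made up for illustration.

```python
# Assumption: OBD_CONNECT_IBITS == 0x1000 (check lustre_idl.h for your release).
OBD_CONNECT_IBITS = 0x1000


def has_flag(flags: int, bit: int) -> bool:
    """Return True if every bit in `bit` is set in `flags`."""
    return (flags & bit) == bit


# Connect flags copied from the MDS message above.
flags = int("0x2041401043000020", 16)

ibits_supported = has_flag(flags, OBD_CONNECT_IBITS)
print(ibits_supported)
```

Under that assumption the bit is not set in the reported flags, which is consistent with the MDS rejecting the connection as a client that "does not support ibits lock".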

Comment by Peter Jones [ 27/Feb/19 ]

Campbell

I think the issue is that you are carrying the patch for LU-11647, which also needs this patch (https://review.whamcloud.com/#/c/34027/). We are putting the finishing touches on 2.10.7, which should have everything that you need. Even if you decide to patch immediately, I would definitely recommend upgrading to the official release once it becomes available.

Peter

Comment by Campbell Mcleay (Inactive) [ 27/Feb/19 ]

Thanks Peter. I am not sure whether it is actually causing an issue for our workloads. I'm hoping we can go for the official release. Any rough idea of when it will be out?

Cheers,

Campbell

Comment by Peter Jones [ 27/Feb/19 ]

The exact timing depends on testing results but I would hope that the release is out within a couple of weeks.
