Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for Chris Horn <hornc@cray.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/55fb5861-ae41-4fa8-988f-39e94caae705
test_48 failed with the following error:
Timeout occurred after 261 mins, last suite running was conf-sanity
The client was unable to connect to MDT0:
[12253.240233] Lustre: DEBUG MARKER: == conf-sanity test 48: too many acls on file ======================================================== 00:54:49 (1588899289)
[12268.159764] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[12268.168910] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-70vm11@tcp:/lustre /mnt/lustre
[12363.361555] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12513.594687] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12663.827908] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12814.061377] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12964.295004] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13114.528746] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13264.762686] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13414.995666] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13565.229414] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13715.462556] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14015.929722] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14015.931840] LustreError: Skipped 1 previous similar message
[14616.864428] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14616.866551] LustreError: Skipped 3 previous similar messages
[15217.797969] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[15217.800127] LustreError: Skipped 3 previous similar messages
[15818.731170] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[15818.733314] LustreError: Skipped 3 previous similar messages
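For triage, the repeated mds_connect failures in the client log above can be pulled out and their spacing checked with a small script. This is only a sketch: the function names (connect_failures, retry_intervals, errno_name) are hypothetical helpers, not part of Lustre or Maloo, and the errno table assumes Linux numbering, where 11 is EAGAIN and 104 is ECONNRESET.

```python
import re

# Linux errno numbers seen in this report; kernel logs print them negated.
# Assumption: the logs come from Linux hosts, so 11 == EAGAIN, 104 == ECONNRESET.
LINUX_ERRNO = {11: "EAGAIN", 104: "ECONNRESET"}

# Matches the "LustreError: 11-0" lines quoted in the client log.
FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] LustreError: 11-0: (?P<dev>\S+): "
    r"operation (?P<op>\w+) to node (?P<nid>\S+) failed: rc = (?P<rc>-?\d+)"
)


def errno_name(rc):
    """Map a negative kernel return code to a symbolic name, if known."""
    return LINUX_ERRNO.get(-rc, "unknown")


def connect_failures(log_text):
    """Return (timestamp, operation, nid, rc) for each failed operation."""
    return [
        (float(m["ts"]), m["op"], m["nid"], int(m["rc"]))
        for m in FAIL_RE.finditer(log_text)
    ]


def retry_intervals(failures):
    """Seconds between consecutive failures, rounded to milliseconds."""
    times = [f[0] for f in failures]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# First three failures copied from the client log in this report.
SAMPLE = (
    "[12363.361555] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: "
    "operation mds_connect to node 10.9.4.51@tcp failed: rc = -11 "
    "[12513.594687] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: "
    "operation mds_connect to node 10.9.4.51@tcp failed: rc = -11 "
    "[12663.827908] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: "
    "operation mds_connect to node 10.9.4.51@tcp failed: rc = -11"
)

if __name__ == "__main__":
    failures = connect_failures(SAMPLE)
    for ts, op, nid, rc in failures:
        print(f"{ts:.6f} {op} -> {nid}: rc = {rc} ({errno_name(rc)})")
    print("intervals (s):", retry_intervals(failures))
```

On the sample lines, the failures are spaced roughly 150 seconds apart, which matches the timestamps in the early part of the log.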
All of the servers show network errors:
OST 1, OST 2, OST 3, OST 4, OST 5, OST 6, OST 7, OST 8 (trevis-70vm10)
[12109.584743] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12109.586451] LNetError: Skipped 9 previous similar messages
[12109.587612] LNetError: 21240:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12109.589403] LNetError: 21240:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 9 previous similar messages
MDS 2, MDS 4 (trevis-70vm12)
[12168.724056] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12168.725999] LNetError: Skipped 10 previous similar messages
[12168.727047] LNetError: 14056:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12168.728834] LNetError: 14056:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 10 previous similar messages
[12168.730585] LNetError: 11b-b: Connection to 127.0.0.2@tcp at host 127.0.0.2 on port 7988 was reset: is it running a compatible version of Lustre and is 127.0.0.2@tcp one of its NIDs?
[12168.733518] LNetError: Skipped 10 previous similar messages
MDS 1, MDS 3 (trevis-70vm11)
[12067.370561] LustreError: 13b-9: lustre-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
[12112.040026] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12112.041735] LNetError: Skipped 9 previous similar messages
[12112.042757] LNetError: 12115:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12112.044547] LNetError: 12115:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 9 previous similar messages
[12112.046396] LNetError: 11b-b: Connection to 127.0.0.2@tcp at host 127.0.0.2 on port 7988 was reset: is it running a compatible version of Lustre and is 127.0.0.2@tcp one of its NIDs?
[12112.049170] LNetError: Skipped 9 previous similar messages
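The server logs above all refer to 127.0.0.2@tcp, which is a loopback address. A short script can flag such NIDs when scanning console logs. This is a sketch under stated assumptions: loopback_nids is a hypothetical helper, it only recognises IPv4 NIDs written as address@network, and it does not say why the NID was registered.

```python
import ipaddress
import re

# An IPv4 address immediately followed by "@<network>", e.g. 127.0.0.2@tcp.
NID_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})@(\w+)")


def loopback_nids(log_text):
    """Return the sorted, de-duplicated NIDs whose address is in 127.0.0.0/8."""
    found = set()
    for addr, net in NID_RE.findall(log_text):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Not a valid IPv4 address (e.g. an octet above 255); skip it.
            continue
        if ip.is_loopback:
            found.add(f"{addr}@{net}")
    return sorted(found)


# Lines copied from the logs in this report.
SAMPLE = (
    "[12112.040026] LNetError: 120-3: Refusing connection from 127.0.0.1 "
    "for 127.0.0.2@tcp: No matching NI\n"
    "[12112.046396] LNetError: 11b-b: Connection to 127.0.0.2@tcp at host "
    "127.0.0.2 on port 7988 was reset: is it running a compatible version "
    "of Lustre and is 127.0.0.2@tcp one of its NIDs?\n"
    "[12363.361555] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: "
    "operation mds_connect to node 10.9.4.51@tcp failed: rc = -11\n"
)

if __name__ == "__main__":
    print(loopback_nids(SAMPLE))
```

Only NIDs in address@network form are reported, so the bare 127.0.0.1 source address in the "Refusing connection" line is ignored and 10.9.4.51@tcp is not flagged.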
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_48 - Timeout occurred after 261 mins, last suite running was conf-sanity