[LU-13538] conf-sanity test_48: network issues cause test-suite timeout
| Created: | 08/May/20 | Updated: | 07/Jun/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Chris Horn <hornc@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/55fb5861-ae41-4fa8-988f-39e94caae705

test_48 failed with the following error:

Timeout occurred after 261 mins, last suite running was conf-sanity

The client is unable to connect to MDT0:

[12253.240233] Lustre: DEBUG MARKER: == conf-sanity test 48: too many acls on file ======================================================== 00:54:49 (1588899289)
[12268.159764] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[12268.168910] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-70vm11@tcp:/lustre /mnt/lustre
[12363.361555] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12513.594687] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12663.827908] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12814.061377] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12964.295004] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13114.528746] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13264.762686] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13414.995666] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13565.229414] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[13715.462556] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14015.929722] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14015.931840] LustreError: Skipped 1 previous similar message
[14616.864428] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[14616.866551] LustreError: Skipped 3 previous similar messages
[15217.797969] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[15217.800127] LustreError: Skipped 3 previous similar messages
[15818.731170] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[15818.733314] LustreError: Skipped 3 previous similar messages

All the servers show network errors:

[12109.584743] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12109.586451] LNetError: Skipped 9 previous similar messages
[12109.587612] LNetError: 21240:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12109.589403] LNetError: 21240:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 9 previous similar messages

MDS 2, MDS 4 (trevis-70vm12)

[12168.724056] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12168.725999] LNetError: Skipped 10 previous similar messages
[12168.727047] LNetError: 14056:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12168.728834] LNetError: 14056:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 10 previous similar messages
[12168.730585] LNetError: 11b-b: Connection to 127.0.0.2@tcp at host 127.0.0.2 on port 7988 was reset: is it running a compatible version of Lustre and is 127.0.0.2@tcp one of its NIDs?
[12168.733518] LNetError: Skipped 10 previous similar messages

MDS 1, MDS 3 (trevis-70vm11)

[12067.370561] LustreError: 13b-9: lustre-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
[12112.040026] LNetError: 120-3: Refusing connection from 127.0.0.1 for 127.0.0.2@tcp: No matching NI
[12112.041735] LNetError: Skipped 9 previous similar messages
[12112.042757] LNetError: 12115:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
[12112.044547] LNetError: 12115:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Skipped 9 previous similar messages
[12112.046396] LNetError: 11b-b: Connection to 127.0.0.2@tcp at host 127.0.0.2 on port 7988 was reset: is it running a compatible version of Lustre and is 127.0.0.2@tcp one of its NIDs?
[12112.049170] LNetError: Skipped 9 previous similar messages

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
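For anyone triaging the logs above, the numeric codes and the retry cadence can be read off mechanically. The sketch below is not part of the Maloo report. It assumes that the `rc = -N` and `Error -N` values are negated Linux errno codes (so -11 would be EAGAIN and -104 would be ECONNRESET), and the names `LINUX_ERRNO`, `SAMPLE_LOG`, and `summarize` are hypothetical helpers introduced only for illustration. The sample lines are copied from the client and server logs in this ticket.

```python
import re

# Assumption: Lustre/LNet return codes are negated Linux errno values.
# Only the two codes that appear in this ticket are listed.
LINUX_ERRNO = {11: "EAGAIN", 104: "ECONNRESET"}

# A few lines copied verbatim from the logs in this ticket.
SAMPLE_LOG = """\
[12363.361555] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12513.594687] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12663.827908] LustreError: 11-0: lustre-MDT0000-mdc-ffff89ffe63b2000: operation mds_connect to node 10.9.4.51@tcp failed: rc = -11
[12109.587612] LNetError: 21240:0:(socklnd_cb.c:1808:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.2
"""

# Matches "rc = -11" and "Error -104" and captures the positive number.
CODE_RE = re.compile(r"(?:rc = |Error )-(\d+)")
# Matches the kernel timestamp at the start of a log line.
STAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize(log_text):
    """Return ({code: errno name}, [seconds between mds_connect failures])."""
    codes = set()
    connect_times = []
    for line in log_text.splitlines():
        code = CODE_RE.search(line)
        if code:
            codes.add(int(code.group(1)))
        stamp = STAMP_RE.match(line)
        if stamp and "mds_connect" in line:
            connect_times.append(float(stamp.group(1)))
    # Gaps between consecutive mds_connect failures, rounded to 0.1 s.
    gaps = [round(b - a, 1) for a, b in zip(connect_times, connect_times[1:])]
    names = {c: LINUX_ERRNO.get(c, "unknown") for c in sorted(codes)}
    return names, gaps


if __name__ == "__main__":
    names, gaps = summarize(SAMPLE_LOG)
    print(names)
    print(gaps)
```

Under that assumption, the client's mds_connect attempts fail with EAGAIN roughly every 150 seconds in the sampled lines, while the servers see ECONNRESET while reading the LNet HELLO from 127.0.0.2.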