[LU-9527] Interop 2.9.0<->master conf-sanity test_77: start fs2ost failed Created: 18/May/17  Updated: 08/May/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Casper Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

trevis-40, interop
EL7, master branch, v2.9.57, b3575 clients
EL7, ldiskfs, b2_9 branch, v2.9.0, b22 servers


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/2209839b-42d4-4fe6-91f1-96f9ce3c5a69

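LNet errors appear to have blocked the mount: connections to 0.0.0.0@tcp were refused with "No matching NI", and registration with the MGS then timed out (rc = -110).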

From OST console:

18:17:28:[ 9857.577430] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
18:17:28:[ 9857.579761] LNetError: 3964:0:(socklnd_cb.c:1723:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
18:17:28:[ 9857.582176] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
18:17:28:[ 9867.576301] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
18:17:28:[ 9867.578687] LNetError: 3965:0:(socklnd_cb.c:1723:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
18:17:28:[ 9867.581126] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
18:17:28:[ 9868.603119] LustreError: 15f-b: test1234-OST0000: cannot register this server with the MGS: rc = -110. Is the MGS running?
18:17:28:[ 9868.607631] LustreError: 25737:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -110
18:17:28:[ 9868.611737] LustreError: 25737:0:(obd_mount_server.c:1558:server_put_super()) no obd test1234-OST0000
18:17:28:[ 9868.614276] LustreError: 25737:0:(obd_mount_server.c:136:server_deregister_mount()) test1234-OST0000 not registered
18:17:28:[ 9868.630914] LustreError: 25737:0:(obd_mount.c:1449:lustre_fill_super()) Unable to mount  (-110)
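
The "Refusing connection" lines above name 0.0.0.0@tcp as the target NID, which is an unspecified address. As a triage aid, here is a minimal sketch that pulls the source and target out of such lines and flags targets whose address part is unspecified or loopback. The helper names (`suspicious_nid`, `refused_targets`) are hypothetical and not part of Lustre or its test framework. The regular expression assumes the message format shown in this console log.

```python
import ipaddress
import re

# Matches the "Refusing connection" message format seen in the console log above.
REFUSED = re.compile(
    r"Refusing connection from (?P<src>\S+) for (?P<nid>\S+): No matching NI"
)


def suspicious_nid(nid: str) -> bool:
    """Return True if the address part of a NID is an unspecified or loopback IP."""
    addr, _, _net = nid.partition("@")
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # hostnames and non-IP NIDs are not judged by this heuristic
    return ip.is_unspecified or ip.is_loopback


def refused_targets(lines):
    """Yield (source, target NID, suspicious flag) for each refused connection."""
    for line in lines:
        m = REFUSED.search(line)
        if m:
            yield m["src"], m["nid"], suspicious_nid(m["nid"])


sample = [
    "18:17:28:[ 9857.577430] LNetError: 120-3: Refusing connection from "
    "127.0.0.1 for 0.0.0.0@tcp: No matching NI",
]
for src, nid, flagged in refused_targets(sample):
    print(src, nid, flagged)
```

A flagged target only suggests that the MGS NID the OST tried to reach may not have been set to a routable address. It does not by itself identify the root cause.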

From test_log:

Starting fs2ost:   /dev/lvm-Role_OSS/S1 /mnt/lustre-fs2ost
CMD: trevis-40vm8 mkdir -p /mnt/lustre-fs2ost; mount -t lustre   		                   /dev/lvm-Role_OSS/S1 /mnt/lustre-fs2ost
trevis-40vm8: mount.lustre: mount /dev/mapper/lvm--Role_OSS-S1 at /mnt/lustre-fs2ost failed: Connection timed out
Start of /dev/lvm-Role_OSS/S1 on fs2ost failed 110
 conf-sanity test_77: @@@@@@ FAIL: start fs2ost failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4939:error()
  = /usr/lib64/lustre/tests/conf-sanity.sh:5377:test_77()
  = /usr/lib64/lustre/tests/test-framework.sh:5215:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5254:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5101:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:5384:main()
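
The negative return codes in both logs are Linux errno values: -104 is ECONNRESET and -110 is ETIMEDOUT, which matches the "Connection timed out" text from mount.lustre. Here is a small sketch for decoding them when scanning similar logs. The table covers only the two codes in this report, and the regular expression assumes the spellings seen here ("Error -104", "rc = -110", "targets: -110", "(-110)"). The function name `decode` is hypothetical.

```python
import re

# Linux errno values for the two codes that appear in this report.
LINUX_ERRNO = {104: "ECONNRESET", 110: "ETIMEDOUT"}

# Return-code spellings seen in the logs above:
# "Error -104", "rc = -110", "targets: -110", "(-110)".
RC = re.compile(r"(?:Error |rc = |: |\()-(\d+)\b")


def decode(line: str):
    """Return the sorted errno names for negative return codes in one log line."""
    codes = {int(n) for n in RC.findall(line)}
    return sorted(LINUX_ERRNO.get(c, f"errno {c}") for c in codes)


print(decode("ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1"))
print(decode("lustre_fill_super()) Unable to mount  (-110)"))
```

Codes outside the table are reported as `errno N` rather than guessed.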
