[LU-7690] sanity-lfsck: couldn't mount ost Created: 20/Jan/16  Updated: 16/Jan/19  Resolved: 16/Jan/19

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Yang Sheng
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment:

server: lustre-master build#3305 RHEL7.1
client: lustre-master build#3305 RHEL6.7


Issue Links:
Related
is related to LU-6650 sanity-lfsck: Mount OST failed Resolved
is related to LU-10045 sanity-lfsck no sub tests failed Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c272f136-bbb9-11e5-a592-5254006e85c2.

no logs

CMD: shadow-20vm4 mkdir -p /mnt/mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
CMD: shadow-20vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4 
shadow-20vm4: mpi/openmpi-x86_64(5):ERROR:150: Module 'mpi/openmpi-x86_64' conflicts with the currently loaded module(s) 'mpi/compat-openmpi16-x86_64'
shadow-20vm4: mpi/openmpi-x86_64(5):ERROR:102: Tcl command execution failed: conflict		mpi
shadow-20vm4: 
CMD: shadow-20vm4 e2label /dev/lvm-Role_MDS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: shadow-20vm4 e2label /dev/lvm-Role_MDS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: shadow-20vm4 e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Started lustre-MDT0000
CMD: shadow-20vm3 mkdir -p /mnt/ost1
CMD: shadow-20vm3 test -b /dev/lvm-Role_OSS/P1
Starting ost1:   /dev/lvm-Role_OSS/P1 /mnt/ost1
CMD: shadow-20vm3 mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
shadow-20vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/ost1 failed: Cannot send after transport endpoint shutdown
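
The "Cannot send after transport endpoint shutdown" error maps to ESHUTDOWN and generally means the OSS could not (re)establish its connection to the MGS during the mount. A minimal triage sequence from the OSS node, offered only as a sketch and assuming the standard lctl utilities are available (the <mgs_nid> placeholder is hypothetical and must be replaced with the real MGS NID for this cluster):

# verify LNet is configured and the MGS is reachable from the OSS
lctl list_nids
lctl ping <mgs_nid>
# inspect the MGC import for connection state and eviction history
lctl get_param mgc.*.import
# retry the mount, then check the kernel log for the concrete failure reason
mount -t lustre /dev/lvm-Role_OSS/P1 /mnt/ost1
dmesg | tail -n 50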


 Comments   
Comment by Sarah Liu [ 20/Jan/16 ]

This issue also hit sanityn and sanity-hsm; it blocks these three test suites.

Comment by Peter Jones [ 22/Jan/16 ]

Yang Sheng

Could you please look into this issue?

Peter

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 - EL7 Server/2.7.1 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/b0e34a2c-cc91-11e5-b80c-5254006e85c2

Another instance found for interop tag 2.7.66 - EL6.7 Server/2.7.1 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/55c68b16-cc98-11e5-b80c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/55b4ae1e-cc98-11e5-b80c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/559ab798-cc98-11e5-b80c-5254006e85c2

Another instance found for interop tag 2.7.66 - EL7 Server/2.5.5 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/799447ec-cc46-11e5-901d-5254006e85c2
https://testing.hpdd.intel.com/test_sets/7986c66c-cc46-11e5-901d-5254006e85c2
https://testing.hpdd.intel.com/test_sets/79768c7a-cc46-11e5-901d-5254006e85c2

Another instance found for Full tag 2.7.66 - EL6.7 Server/EL6.7 Client - DNE, build# 3314
https://testing.hpdd.intel.com/test_sets/736c38b2-ca83-11e5-9215-5254006e85c2

Comment by Sarah Liu [ 18/Feb/16 ]

I resubmitted the request on onyx to rerun the tests and didn't hit this issue:
testing ran on 2/13/2016
https://testing.hpdd.intel.com/test_sessions/1eb9ee30-d380-11e5-bf08-5254006e85c2

I just talked with Saurabh; he still saw the issue recently:
testing also ran on 2/13/2016
https://testing.hpdd.intel.com/test_sessions/93baffee-d2ae-11e5-8697-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 18/Feb/16 ]

Another instance of the above failure:
https://testing.hpdd.intel.com/test_sets/84c4f3c6-d530-11e5-bc47-5254006e85c2

Also, the result mentioned above by Sarah, which ran on 2/13/2016 for tag 2.7.90 on onyx as well, still failed.
https://testing.hpdd.intel.com/test_sessions/93baffee-d2ae-11e5-8697-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - EL7 Server/2.5.5 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/93baffee-d2ae-11e5-8697-5254006e85c2

Comment by James Nunez (Inactive) [ 14/Sep/16 ]

I have a similar failure with logs at https://testing.hpdd.intel.com/test_sets/cb9dc566-79a4-11e6-8a8c-5254006e85c2.

The suite_stdout log has the same error:

04:22:09:Starting ost1:   /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
04:22:09:CMD: onyx-38vm8 mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
04:22:10:onyx-38vm8: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/lustre-ost1 failed: Cannot send after transport endpoint shutdown
04:22:10:sanity-lfsck returned 0

In the logs for the OST (vm8), there is an interesting error when trying to mount the OST:

04:22:16:[  402.226223] Lustre: Evicted from MGS (at 10.2.4.171@tcp) after server handle changed from 0xbc1d6e975ee7dac5 to 0xbc1d6e975ee7e8d3
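
The handle change in that message suggests the MGC on the OST node reconnected to an MGS whose export handle had changed (for example after an MGS restart or a dropped connection), so the eviction is the likely reason the OST mount could not complete. A rough way to confirm this from the OST node, sketched under the assumption of standard lctl tooling (exact output fields vary by release):

# list local Lustre devices (including the MGC) and dump the MGC import state
lctl dl
lctl get_param mgc.*.import
# look for eviction/reconnect messages around the failed mount
dmesg | grep -iE 'evict|mgs|mgc'
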
Comment by Yang Sheng [ 16/Jan/19 ]

Please reopen this ticket if the issue is hit again.
