[LU-9707] Failover: recovery-random-scale test_fail_client_mds: Restart of mds1 failed! Created: 23/Jun/17  Updated: 20/Nov/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5, Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Failover
Server: RHEL 7, 3603, master
Client: SLES12.2 (and RHEL 7), 3603, master


Issue Links:
Related
is related to LU-11795 replay-vbr test 8b fails with 'Restar... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/55c95900-576f-11e7-9221-5254006e85c2.

The sub-test test_fail_client_mds failed with the following error:

Restart of mds1 failed!

test logs:

CMD: trevis-7vm12 hostname
mount facets: mds1
CMD: trevis-7vm12 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-7vm12 e2label /dev/lvm-Role_MDS/P1
trevis-7vm12: e2label: No such file or directory while trying to open /dev/lvm-Role_MDS/P1
trevis-7vm12: Couldn't find valid filesystem superblock.
Starting mds1:   -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: trevis-7vm12 mkdir -p /mnt/lustre-mds1; mount -t lustre -o loop /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
trevis-7vm12: mount: /dev/lvm-Role_MDS/P1: failed to setup loop device: No such file or directory
Start of /dev/lvm-Role_MDS/P1 on mds1 failed 32
 recovery-random-scale test_fail_client_mds: @@@@@@ FAIL: Restart of mds1 failed! 
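
The mount exits with status 32 (mount(8)'s generic "mount failure"), apparently because the LVM device backing the MDT is not present when the test tries to restart mds1. The following is a minimal, hypothetical sketch, not the test framework's actual code, of a helper that re-activates the volume group and polls for the block device before mounting. The device path and mount point are taken from the log above; everything else (helper structure, timeouts) is assumed:

    #!/bin/bash
    # Hypothetical helper (not from the Lustre test framework): wait for the
    # LVM-backed MDT device to reappear after failover before mounting it.
    DEV=/dev/lvm-Role_MDS/P1      # device path from the log above
    MNT=/mnt/lustre-mds1          # mount point from the log above

    # Re-activate volume groups in case the LV was not auto-activated after reboot.
    vgchange -ay >/dev/null 2>&1

    # Poll for the block device instead of failing on the first attempt.
    for i in $(seq 1 30); do
        test -b "$DEV" && break
        sleep 2
    done

    if ! test -b "$DEV"; then
        echo "ERROR: $DEV still missing after failover" >&2
        exit 1
    fi

    mkdir -p "$MNT"
    mount -t lustre "$DEV" "$MNT"

This only addresses the missing-device symptom seen in the log (the e2label and superblock errors); it does not explain why the LV is unavailable after the node restarts.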


 Comments   
Comment by James Casper [ 26/Sep/17 ]

2.10.1:
https://testing.hpdd.intel.com/test_sessions/2ecbd0a0-6511-479f-b014-0222fd7009b1

Comment by James Nunez (Inactive) [ 14/Mar/19 ]

We have a very similar failure with mmp test 6 and replay-single test 0a for 2.10.7 RC1. Logs are at https://testing.whamcloud.com/test_sets/8c397c98-429d-11e9-a256-52540065bddc and https://testing.whamcloud.com/test_sets/8d508356-429d-11e9-a256-52540065bddc, respectively.

Comment by James Nunez (Inactive) [ 19/Nov/19 ]

Since this ticket is old, just an update that we are still seeing this issue with RHEL 7.7 servers and clients during failover testing. Here's a link to a recent failure: https://testing.whamcloud.com/test_sets/d9837c4a-07b6-11ea-bbc3-52540065bddc.
