Lustre / LU-10708

replay-single test_20b: Restart of mds1 failed!

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3
    • Environment: Hard Failover:
      RHEL 7.4 Server/ZFS
      RHEL 7.4 Client
      2.10.58 master, build 3707
    • Severity: 3

    Description

      replay-single test_20b - Restart of mds1 failed!
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/7f078076-15ba-11e8-bd00-52540065bddc

      test_20b failed with the following error:

      Restart of mds1 failed!
      

      test_logs:

      == replay-single test 20b: write, unlink, eviction, replay (test mds_cleanup_orphans) ================ 19:46:25 (1519069585)
      CMD: onyx-32vm7 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: onyx-32vm6 lctl set_param -n osd*.*OS*.force_sync=1
      /mnt/lustre/f20b.replay-single
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      	obdidx		 objid		 objid		 group
      	     0	          4770	       0x12a2	             0
      
      CMD: onyx-32vm7 /usr/sbin/lctl set_param -n mdt.lustre-MDT0000.evict_client 425b1455-3c86-3ef4-e5f3-8752f5bdb612
      10000+0 records in
      10000+0 records out
      40960000 bytes (41 MB) copied, 1.08016 s, 37.9 MB/s
      CMD: onyx-32vm7 lctl set_param -n osd*.*MDT*.force_sync=1
      CMD: onyx-32vm7 /usr/sbin/lctl dl
      Failing mds1 on onyx-32vm7
      + pm -h powerman --off onyx-32vm7
      Command completed successfully
      reboot facets: mds1
      + pm -h powerman --on onyx-32vm7
      Command completed successfully
      Failover mds1 to onyx-32vm8
      19:46:42 (1519069602) waiting for onyx-32vm8 network 900 secs ...
      19:46:42 (1519069602) network interface is UP
      CMD: onyx-32vm8 hostname
      mount facets: mds1
      CMD: onyx-32vm8 lsmod | grep zfs >&/dev/null || modprobe zfs;
      			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
      			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1
      onyx-32vm8: cannot import 'lustre-mdt1': no such pool available
       replay-single test_20b: @@@@@@ FAIL: Restart of mds1 failed! 
      
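      The ZFS message "cannot import 'lustre-mdt1': no such pool available" generally means that zpool import did not find the pool's backing devices in the directory it was asked to scan. A minimal manual check on the failover node (onyx-32vm8 in this run) might look like the sketch below; the device directory and pool name are taken from the log above, and the commands are plain ZFS tooling, not part of the test framework.

      # Sketch only: verify that the shared MDT devices and the pool are visible
      # on the failover node before attempting the import the framework runs.
      lsmod | grep -q zfs || modprobe zfs          # make sure the zfs module is loaded
      ls -l /dev/lvm-Role_MDS/                     # are the backing devices present yet?
      zpool import -d /dev/lvm-Role_MDS            # with no pool name, this only lists importable pools
      # If lustre-mdt1 shows up in the listing, the framework's own import should succeed:
      zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1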

    Activity

            sgiraddi Shashidhar Giraddi (Inactive) added a comment - +1  https://testing.whamcloud.com/test_sets/5d4c3c9a-6129-48c0-929a-73cd4fe3a0e1  
            jamesanunez James Nunez (Inactive) added a comment - Another similar failure with replay-single test 3c at https://testing.whamcloud.com/test_sets/4051fd66-682a-11e9-bd0e-52540065bddc .

            jamesanunez James Nunez (Inactive) added a comment - Similar failure for recovery-random-scale test fail_client_mds at https://testing.whamcloud.com/test_sets/e3b58552-fea5-11e8-b837-52540065bddc

            CMD: trevis-25vm11 hostname
            mount facets: mds1
            CMD: trevis-25vm11 lsmod | grep zfs >&/dev/null || modprobe zfs;
            			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
            			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1
            trevis-25vm11: cannot import 'lustre-mdt1': no such pool available
             recovery-random-scale test_fail_client_mds: @@@@@@ FAIL: Restart of mds1 failed! 

            jamesanunez James Nunez (Inactive) added a comment - We see the same problem when remounting the MDS after a failover in recovery-mds-scale test_failover_mds. See https://testing.hpdd.intel.com/test_sets/c57c0bda-527d-11e8-b9d3-52540065bddc for logs.

            From the client test_log

            Failing mds1 on trevis-8vm7
            + pm -h powerman --off trevis-8vm7
            Command completed successfully
            reboot facets: mds1
            + pm -h powerman --on trevis-8vm7
            Command completed successfully
            Failover mds1 to trevis-8vm8
            19:58:09 (1525550289) waiting for trevis-8vm8 network 900 secs ...
            19:58:09 (1525550289) network interface is UP
            CMD: trevis-8vm8 hostname
            mount facets: mds1
            CMD: trevis-8vm8 lsmod | grep zfs >&/dev/null || modprobe zfs;
            			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
            			zpool import -f -o cachefile=none -d /dev/lvm-Role_MDS lustre-mdt1
            trevis-8vm8: cannot import 'lustre-mdt1': no such pool available
             recovery-mds-scale test_failover_mds: @@@@@@ FAIL: Restart of mds1 failed! 
            
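            If these turn out to be a transient race where the MDT's backing devices are not yet visible on the failover node at mount time (an assumption; the failures above have not been root-caused), one possible mitigation would be to retry the import for a bounded period rather than failing on the first attempt. A minimal sketch, reusing the pool name, device directory and import options from the logs above (the helper name import_mdt_pool is hypothetical, not part of the test framework):

            # Sketch only, not the test framework's actual code.
            import_mdt_pool() {
                local pool=$1 devdir=$2
                local deadline=$((SECONDS + 90))     # allow up to 90s for the devices to appear
                while (( SECONDS < deadline )); do
                    # already imported?
                    zpool list -H "$pool" >/dev/null 2>&1 && return 0
                    # try the same import the framework issues
                    zpool import -f -o cachefile=none -o failmode=panic \
                        -d "$devdir" "$pool" 2>/dev/null && return 0
                    sleep 5
                done
                return 1
            }

            import_mdt_pool lustre-mdt1 /dev/lvm-Role_MDS || echo "lustre-mdt1 still not importable"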

    People

      Assignee: WC Triage
      Reporter: Maloo
      Votes: 0
      Watchers: 3
