[LU-3286] recovery-double-scale test_pairwise_fail: FAIL: Restart of ost2 failed! Created: 07/May/13  Updated: 31/Dec/13  Resolved: 29/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.4.1
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Critical
Reporter: Jian Yu Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

FSTYPE=zfs
FAILURE_MODE=HARD
TEST_GROUP=failover


Severity: 3
Rank (Obsolete): 8129

 Description   

While running the recovery-double-scale test with FSTYPE=zfs and FAILURE_MODE=HARD to verify patch http://review.whamcloud.com/6258, the test failed as follows:

==== START === test 1: failover MDS, then OST ==========
==== Checking the clients loads BEFORE failover -- failure NOT OK
<snip>
Done checking client loads. Failing type1=MDS item1=mds1 ... 
CMD: wtm-82 /usr/sbin/lctl dl
Failing mds1 on wtm-82
CMD: wtm-82 zpool set cachefile=none lustre-mdt1; sync
+ pm -h powerman --reset wtm-82
Command completed successfully
reboot facets: mds1
+ pm -h powerman --on wtm-82
Command completed successfully
Failover mds1 to wtm-83
21:37:40 (1367901460) waiting for wtm-83 network 900 secs ...
21:37:40 (1367901460) network interface is UP
CMD: wtm-83 hostname
mount facets: mds1
CMD: wtm-83 zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none lustre-mdt1
Starting mds1:   lustre-mdt1/mdt1 /mnt/mds1
CMD: wtm-83 mkdir -p /mnt/mds1; mount -t lustre   		                   lustre-mdt1/mdt1 /mnt/mds1
CMD: wtm-83 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin::/sbin:/bin:/usr/sbin: NAME=ncli sh rpc.sh set_default_debug \"-1\" \"all -lnet -lnd -pinger\" 256 
CMD: wtm-83 zfs get -H -o value lustre:svname 		                           lustre-mdt1/mdt1 2>/dev/null
Started lustre-MDT0000
                            Failing type2=OST item2=ost4 ... 
CMD: wtm-85 /usr/sbin/lctl dl
CMD: wtm-85 /usr/sbin/lctl dl
CMD: wtm-85 /usr/sbin/lctl dl
CMD: wtm-85 zpool set cachefile=none lustre-ost4; sync
CMD: wtm-85 zpool set cachefile=none lustre-ost6; sync
Failing ost2,ost4,ost6 on wtm-85
CMD: wtm-85 zpool set cachefile=none lustre-ost2; sync
+ pm -h powerman --reset wtm-85
Command completed successfully
reboot facets: ost2,ost4,ost6
+ pm -h powerman --on wtm-85
Command completed successfully
Failover ost2 to wtm-84
Failover ost4 to wtm-84
Failover ost6 to wtm-84
21:38:19 (1367901499) waiting for wtm-84 network 900 secs ...
21:38:19 (1367901499) network interface is UP
CMD: wtm-84 hostname
mount facets: ost2,ost4,ost6
CMD: wtm-84 zpool list -H lustre-ost2 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none lustre-ost2
Starting ost2:   lustre-ost2/ost2 /mnt/ost2
CMD: wtm-84 mkdir -p /mnt/ost2; mount -t lustre   		                   lustre-ost2/ost2 /mnt/ost2
wtm-84: mount.lustre: mount lustre-ost2/ost2 at /mnt/ost2 failed: Input/output error
wtm-84: Is the MGS running?
Start of lustre-ost2/ost2 on ost2 failed 5
 recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of ost2 failed! 

Dmesg on OSS wtm-84 showed the following:

LustreError: 9681:0:(obd_mount_server.c:1123:server_register_target()) lustre-OST0001: error registering with the MGS: rc = -5 (not fatal)
LustreError: 6180:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff88062faed400 x1434348208262360/t0(0) o101->MGC10.10.18.253@tcp@10.10.18.253@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 6180:0:(client.c:1052:ptlrpc_import_delay_req()) Skipped 1 previous similar message
LustreError: 15c-8: MGC10.10.18.253@tcp: The configuration from log 'lustre-OST0001' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 9681:0:(obd_mount_server.c:1257:server_start_targets()) failed to start server lustre-OST0001: -5
LustreError: 9681:0:(obd_mount_server.c:1699:server_fill_super()) Unable to start targets: -5
LustreError: 9681:0:(obd_mount_server.c:844:lustre_disconnect_lwp()) lustre-MDT0000-lwp-OST0001: Can't end config log lustre-client.
LustreError: 9681:0:(obd_mount_server.c:1426:server_put_super()) lustre-OST0001: failed to disconnect lwp. (rc=-2)
LustreError: 9681:0:(obd_mount_server.c:1456:server_put_super()) no obd lustre-OST0001
Lustre: server umount lustre-OST0001 complete
LustreError: 9681:0:(obd_mount.c:1267:lustre_fill_super()) Unable to mount  (-5)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of ost2 failed!

Dmesg on MDS wtm-83 showed the following:

Lustre: DEBUG MARKER: Failing type2=OST item2=ost4 ...
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: lustre-MDT0000: Recovery over after 0:08, of 4 clients 4 recovered and 0 were evicted.
Lustre: 5225:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1367901499/real 1367901499]  req@ffff880c17898400 x1434348659147084/t0(0) o400->lustre-OST0001-osc-MDT0000@10.10.19.26@tcp:28/4 lens 224/224 e 0 to 1 dl 1367901543 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: lustre-OST0003-osc-MDT0000: Connection to lustre-OST0003 (at 10.10.19.26@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 5225:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: lustre-OST0005-osc-MDT0000: Connection to lustre-OST0005 (at 10.10.19.26@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of ost2 failed! 
Lustre: DEBUG MARKER: recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of ost2 failed!

Maloo report:
https://maloo.whamcloud.com/test_sets/ebe1f318-b6e0-11e2-b6f1-52540035b04c



 Comments   
Comment by Jodi Levi (Inactive) [ 07/May/13 ]

Lai,
Could you please comment on this one?
Thank you!

Comment by Lai Siyao [ 08/May/13 ]

Yujian, I saw that http://review.whamcloud.com/#change,6258 passed in HARD failover mode. Does that mean this one is fixed?

Comment by Jian Yu [ 09/May/13 ]

Yujian, I saw that http://review.whamcloud.com/#change,6258 passed in HARD failover mode. Does that mean this one is fixed?

Hi Lai, the recovery-double-scale test with FSTYPE=zfs still failed with this issue.

I'm performing the test with FSTYPE=ldiskfs under the same configuration and will update the ticket with the Maloo report.

Comment by Jian Yu [ 09/May/13 ]

I'm performing the test with FSTYPE=ldiskfs under the same configuration and will update the ticket with the Maloo report.

recovery-double-scale test passed with FSTYPE=ldiskfs and FAILURE_MODE=HARD:
https://maloo.whamcloud.com/test_sessions/3ce1b7bc-b8b2-11e2-8742-52540035b04c

Comment by Lai Siyao [ 10/May/13 ]

The log shows that the failed OSTs kept connecting to the old MGS nid and never tried the failover nid. It's a bit strange, because some OSTs can connect to the failover nid.

I'll need more time to analyse the logs.

Comment by Lai Siyao [ 15/May/13 ]

For the ldiskfs test:

10000000:01000000:20.0:1368528132.031673:0:14745:0:(mgc_request.c:1763:mgc_process_cfg_log()) Failed to get MGS log lustre-OST0001, using local copy for now, will try to update later.
...
10000000:01000000:20.0:1368528132.040180:0:14745:0:(mgc_request.c:1871:mgc_process_log()) MGC10.10.18.253@tcp: configuration from log 'lustre-OST0001' succeeded (0).

While for zfs:

10000000:00000001:16.0:1367901542.341862:0:9681:0:(mgc_request.c:1774:mgc_process_cfg_log()) Process leaving via out_pop (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
...
10000000:00000001:16.0:1367901542.341879:0:9681:0:(mgc_request.c:1982:mgc_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)

That is, in the ldiskfs test the OST could find a local copy of its config log when the MGS connection failed and use it to start, while in the zfs test it could not, so the start failed. I'll look into why zfs doesn't have a copy tomorrow.
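
To make the comparison concrete, here is a minimal behavioural sketch in C. The function process_cfg_log_sketch and its arguments are hypothetical; only the fallback behaviour is taken from the two traces above. It also shows why the debug log prints -5 as 18446744073709551611 / 0xfffffffffffffffb.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Behavioural sketch only, not the real mgc_request.c code: when the MGS
 * cannot be reached, the MGC may fall back to a locally stored copy of the
 * target's config llog.  In the ldiskfs trace a local copy exists ("using
 * local copy for now") and the target starts; in the zfs trace there is no
 * copy, so processing fails with -EIO (-5).
 */
static int process_cfg_log_sketch(bool mgs_reachable, bool have_local_copy)
{
        if (mgs_reachable)
                return 0;               /* use the log fetched from the MGS */
        if (have_local_copy)
                return 0;               /* ldiskfs case: start from the local copy */
        return -EIO;                    /* zfs case: rc = -5, the mount fails */
}

int main(void)
{
        long rc = process_cfg_log_sketch(false, false);

        /* The debug log prints rc as a signed value, an unsigned 64-bit
         * value and in hex: -5 -> 18446744073709551611 -> 0xfffffffffffffffb. */
        printf("rc=%ld (%llu : %#llx)\n", rc,
               (unsigned long long)rc, (unsigned long long)rc);
        return 0;
}

Compiled and run, this prints rc=-5 (18446744073709551611 : 0xfffffffffffffffb), matching the zfs trace above.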

Comment by Lai Siyao [ 16/May/13 ]

The root cause is that the MGC llog local copy is done in the lvfs context, so currently only the ldiskfs backend filesystem is supported. Therefore, for a zfs-based server, upon a double failure the OST cannot get its config log and fails to mount (roughly as sketched below).

I can't find the original zfs support design doc, and this should be a known issue.
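
For illustration only, a rough sketch of that dependency in C: the lvfs_ctxt_sketch structure and the mgc_save_local_copy_sketch helper are assumptions, not Lustre code, but they show the point above, namely that the local copy is stored through plain VFS I/O on the server's vfsmount, which only an ldiskfs-backed target provides.

#include <errno.h>
#include <stddef.h>

struct vfsmount;                        /* kernel type, left opaque here */

/* assumed shape for illustration, not a Lustre structure */
struct lvfs_ctxt_sketch {
        struct vfsmount *srv_mnt;       /* ldiskfs: the server mount; zfs: NULL */
};

/* hypothetical helper, not a real Lustre function */
static int mgc_save_local_copy_sketch(const struct lvfs_ctxt_sketch *ctxt)
{
        if (ctxt->srv_mnt == NULL)
                return -EOPNOTSUPP;     /* zfs-osd has no vfsmount, so no local copy */

        /* the real code switches into the lvfs run context and does plain
         * VFS I/O on the server mount (the local configs directory) to
         * store the llog copy, which only the ldiskfs backend can offer */
        return 0;
}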

Comment by Jian Yu [ 20/May/13 ]

Lustre Tag: v2_4_0_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-master/1501/
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs
TEST_GROUP=failover

recovery-double-scale hit the same issue: https://maloo.whamcloud.com/test_sets/e72da246-c102-11e2-8854-52540035b04c

Comment by Lai Siyao [ 21/May/13 ]

Hi Alex, any opinion on this?

Comment by Alex Zhuravlev [ 27/May/13 ]

supposed to be fixed with http://review.whamcloud.com/#change,5049

Comment by Jian Yu [ 27/May/13 ]

supposed to be fixed with http://review.whamcloud.com/#change,5049

I submitted http://review.whamcloud.com/6459 to verify this patch together with http://review.whamcloud.com/6429 under the failover configuration. Let's wait for the test result.

Comment by Jian Yu [ 28/May/13 ]

supposed to be fixed with http://review.whamcloud.com/#change,5049

recovery-double-scale still failed after failing over the MDS and then the OST:

mount facets: ost1,ost2,ost3,ost4,ost5,ost6,ost7
CMD: wtm-14vm8 zpool list -H lustre-ost1 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none -d /dev/lvm-OSS lustre-ost1
Starting ost1:   lustre-ost1/ost1 /mnt/ost1
CMD: wtm-14vm8 mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
wtm-14vm8: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: Input/output error
wtm-14vm8: Is the MGS running?
Start of lustre-ost1/ost1 on ost1 failed 5
 recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: Restart of ost1 failed!

Maloo report: https://maloo.whamcloud.com/test_sets/285b58c2-c6ed-11e2-be75-52540035b04c

Comment by Mikhail Pershin [ 26/Jun/13 ]

I may be wrong here, but perhaps the problem is the support of all lsi_flags on ZFS?

Comment by Jian Yu [ 16/Aug/13 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_4/32/
FSTYPE=zfs
FAILURE_MODE=HARD

recovery-double-scale still failed after failing over the MDS and then the OST:
https://maloo.whamcloud.com/test_sets/c55d6c84-05e8-11e3-b811-52540035b04c

Comment by Jian Yu [ 09/Sep/13 ]

Lustre Tag: v2_4_1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/44/
Distro/Arch: RHEL6.4/x86_64
Testgroup: failover
FSTYPE=zfs

recovery-double-scale hit the same failure:
https://maloo.whamcloud.com/test_sets/2864e15c-1757-11e3-aa87-52540035b04c

Comment by Lai Siyao [ 23/Sep/13 ]

As Alex pointed out in LU-2059, lsi_srv_mnt is NULL for the zfs osd, so the llog local copy is not supported.

osd_conf_get() suggests introducing a new fs abstraction layer instead of reading from the vfsmount structure directly. This looks reasonable, because zfs-osd does not do a full mount but uses the DMU interface directly, which means zfs-osd has no vfsmount or superblock object (see the sketch below).

But this looks like a big project; I need to understand more of the zfs-related code to continue.
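
A rough sketch of what such an abstraction layer means in practice: every name below ending in _sketch is illustrative rather than the real dt_object.h API, and the field names are assumptions. Backend properties are reached through a per-OSD method rather than through a vfsmount or superblock.

/* Illustrative only, modelled loosely on the osd_conf_get()/dt_conf_get()
 * idea mentioned above; not copied from Lustre headers. */
struct dt_device_param_sketch {
        unsigned int ddp_max_name_len;  /* longest file name the backend allows */
        unsigned int ddp_block_shift;   /* backend block size, as a power of two */
};

struct dt_device_sketch {
        /* each OSD (osd-ldiskfs, osd-zfs) fills the parameters from its own
         * internals: the ext4 superblock for ldiskfs, DMU/dataset properties
         * for zfs; no vfsmount or struct super_block is dereferenced */
        void (*conf_get)(const struct dt_device_sketch *dev,
                         struct dt_device_param_sketch *param);
};

/* a possible osd-zfs implementation reports its limits directly */
static void osd_zfs_conf_get_sketch(const struct dt_device_sketch *dev,
                                    struct dt_device_param_sketch *param)
{
        (void)dev;
        param->ddp_max_name_len = 255;  /* e.g. the ZFS name-length limit */
        param->ddp_block_shift = 12;    /* e.g. 4 KiB records */
}

The design point is that each OSD answers through its own method, so osd-zfs can participate without ever exposing a vfsmount or superblock.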

Comment by Alex Zhuravlev [ 23/Sep/13 ]

There is a patch to support local copies of the llog using the OSD API. I can't find it; please talk to Mike.

Comment by Lai Siyao [ 24/Sep/13 ]

The patch is http://review.whamcloud.com/#/c/5049/19, but it doesn't solve this issue: for zfs-osd the vfsmount object is NULL, so server_mgc_set_fs() is not called and the llog local copy cannot be made.

Comment by Alex Zhuravlev [ 24/Sep/13 ]

Sorry, can you explain in detail? vfsmount isn't a valid notion in the server code anymore (except in osd-ldiskfs/). Have you contacted Mike?

Comment by Lai Siyao [ 24/Sep/13 ]

Mike is on the watchers list.

server_start_targets() calls server_mgc_set_fs() only when lsi->lsi_srv_mnt is not NULL, because server_mgc_set_fs() takes a superblock argument. server_mgc_set_fs() then calls mgc_fs_setup() to set up the local configs directory.
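
A condensed sketch of that gating in C: the _sketch names stand in for the real obd_mount_server.c and mgc_request.c code paths named above, and the bodies are simplified placeholders.

#include <stddef.h>

struct super_block;                     /* kernel types, left opaque here */
struct vfsmount;

/* simplified stand-in for struct lustre_sb_info; only the field that
 * matters for this ticket is shown */
struct lustre_sb_info_sketch {
        struct vfsmount *lsi_srv_mnt;   /* set by osd-ldiskfs, NULL for osd-zfs */
};

static void server_mgc_set_fs_sketch(struct super_block *sb)
{
        (void)sb;
        /* the real server_mgc_set_fs() ends up in mgc_fs_setup(), which
         * prepares the local configs directory used for the llog copy */
}

static void server_start_targets_sketch(struct lustre_sb_info_sketch *lsi,
                                        struct super_block *sb)
{
        if (lsi->lsi_srv_mnt != NULL)   /* only the ldiskfs path takes this branch */
                server_mgc_set_fs_sketch(sb);

        /* with osd-zfs the branch is skipped, mgc_fs_setup() never runs,
         * no local copy exists, and a later MGS outage leaves the OST with
         * nothing to mount from (the rc = -5 failure in this ticket) */
}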

Comment by Lai Siyao [ 15/Nov/13 ]

The patch is at http://review.whamcloud.com/#/c/8286/

Comment by Jian Yu [ 29/Nov/13 ]

The patch landed on the master branch for Lustre 2.6.0.

Hi Lai,
Could you please back-port the patch to the Lustre b2_4 branch? Thanks.

Comment by Lai Siyao [ 29/Nov/13 ]

Yujian, this patch depends on http://review.whamcloud.com/#/c/5049/19, which is not backported to 2.4 yet. Should I backport both?

Comment by Peter Jones [ 29/Nov/13 ]

Thanks, Lai. That is probably too big a change to include in a maintenance release, so let's close this as fixed in 2.6.
