[LU-5236] Failover failure on test suite recovery-small test_26b: FAIL: Failed to mount /mnt/lustre2
Created: 20/Jun/14  Updated: 17/Mar/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Emoly Liu
Resolution: Cannot Reproduce
Votes: 0
Labels: zfs
Environment:

client and server: lustre-master build #2091 (ZFS)


Severity: 3
Rank (Obsolete): 14599

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/3d29435c-f62d-11e3-8491-52540035b04c.

The sub-test test_26b failed with the following error:

Failed to mount /mnt/lustre2

Lustre: DEBUG MARKER: == recovery-small test 26a: evict dead exports == 21:59:24 (1402981164)
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x505
Lustre: *** cfs_fail_loc=505, val=0***
LustreError: 166-1: MGC10.1.6.21@tcp: Connection to MGS (at 10.1.6.25@tcp) was lost; in progress operations using this service will fail
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 8 previous similar messages
LustreError: 8289:0:(mgc_request.c:516:do_requeue()) failed processing log: -5
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 8 previous similar messages
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 9 previous similar messages
LustreError: 8289:0:(mgc_request.c:516:do_requeue()) failed processing log: -5
LustreError: 8289:0:(mgc_request.c:516:do_requeue()) Skipped 1 previous similar message
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 8 previous similar messages
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 17 previous similar messages
Lustre: *** cfs_fail_loc=505, val=0***
Lustre: Skipped 35 previous similar messages
Lustre: DEBUG MARKER: lctl set_param fail_loc=0x0
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 26b: evict dead exports == 22:00:11 \(1402981211\)
Lustre: DEBUG MARKER: == recovery-small test 26b: evict dead exports == 22:00:11 (1402981211)
LustreError: 167-0: lustre-MDT0000-mdc-ffff88007c294400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: 30768:0:(lmv_obd.c:1552:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff88007c294400), error -5
LustreError: 30768:0:(llite_lib.c:1827:ll_statfs_internal()) md_statfs fails: rc = -5
Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock shadow-45vm3:shadow-45vm7:/lustre /mnt/lustre2
LustreError: 15c-8: MGC10.1.6.21@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
LustreError: 30787:0:(obd_mount.c:1335:lustre_fill_super()) Unable to mount  (-5)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_26b: @@@@@@ FAIL: Failed to mount \/mnt\/lustre2 
Lustre: DEBUG MARKER: recovery-small test_26b: @@@@@@ FAIL: Failed to mount /mnt/lustre2


 Comments   
Comment by Andreas Dilger [ 20/Jun/14 ]

It looks like this is a test-script problem: the client was evicted from the MDS during test_26b, when it should have been evicted during test_26a, and that caused test_26b to fail early. It seems that test_26a() needs to set fail_loc=0 and then wait until the client has reconnected to all of the servers.
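No fix is recorded in this ticket (it was closed as Cannot Reproduce), so the following is only a minimal sketch of the suggested "reset fail_loc, then wait for reconnection" step. The helper names get_import_states, all_imports_full and wait_all_imports_full are hypothetical and are not the real test-framework.sh functions. It also assumes that `lctl get_param -n` on the mgc.*.mgs_server_uuid, mdc.*.mds_server_uuid and osc.*.ost_server_uuid parameters prints one "<target UUID> <import state>" line per import, and that a healthy, reconnected import reports FULL.

```shell
#!/bin/bash
# Hypothetical sketch: reset fail_loc, then wait for the client to reconnect.
# Not the actual test-framework.sh code.

# ASSUMPTION: each parameter prints "<target_uuid> <import_state>" per import.
get_import_states() {
    lctl get_param -n 'mgc.*.mgs_server_uuid' \
        'mdc.*.mds_server_uuid' 'osc.*.ost_server_uuid'
}

# Succeed only if at least one import is listed and every import is FULL.
all_imports_full() {
    local out
    out=$(get_import_states 2>/dev/null)
    [ -n "$out" ] || return 1
    # awk sets bad=1 for any line whose second field is not FULL,
    # then exits with that value (0 when every import is FULL).
    printf '%s\n' "$out" | awk '$2 != "FULL" { bad = 1 } END { exit bad }'
}

# Poll once per second for up to $1 seconds (default 90).
wait_all_imports_full() {
    local timeout=${1:-90} elapsed=0
    until all_imports_full; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "imports still not FULL after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Intended use at the end of test_26a (sketch only):
#   lctl set_param fail_loc=0
#   wait_all_imports_full 90 || exit 1
```

Checking the MGC import as well as the MDC and OSC imports matters here, because the test_26b mount failed while reading the 'lustre-client' configuration log from the MGS (-5), not only because of the MDT eviction.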

Comment by Andreas Dilger [ 17/Mar/20 ]

Closing old bug not seen in a long time.

Generated at Sat Feb 10 01:49:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.