[LU-5236] Failover failure on test suite recovery-small test_26b: FAIL: Failed to mount /mnt/lustre2

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.6.0
    • client and server: lustre-master build # 2091 zfs
    • 3
    • 14599

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/3d29435c-f62d-11e3-8491-52540035b04c.

      The sub-test test_26b failed with the following error:

      Failed to mount /mnt/lustre2

      Lustre: DEBUG MARKER: == recovery-small test 26a: evict dead exports == 21:59:24 (1402981164)
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x505
      Lustre: *** cfs_fail_loc=505, val=0***
      LustreError: 166-1: MGC10.1.6.21@tcp: Connection to MGS (at 10.1.6.25@tcp) was lost; in progress operations using this service will fail
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 8 previous similar messages
      LustreError: 8289:0:(mgc_request.c:516:do_requeue()) failed processing log: -5
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 8 previous similar messages
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 9 previous similar messages
      LustreError: 8289:0:(mgc_request.c:516:do_requeue()) failed processing log: -5
      LustreError: 8289:0:(mgc_request.c:516:do_requeue()) Skipped 1 previous similar message
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 8 previous similar messages
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 17 previous similar messages
      Lustre: *** cfs_fail_loc=505, val=0***
      Lustre: Skipped 35 previous similar messages
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x0
      Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 26b: evict dead exports == 22:00:11 \(1402981211\)
      Lustre: DEBUG MARKER: == recovery-small test 26b: evict dead exports == 22:00:11 (1402981211)
      LustreError: 167-0: lustre-MDT0000-mdc-ffff88007c294400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      LustreError: 30768:0:(lmv_obd.c:1552:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff88007c294400), error -5
      LustreError: 30768:0:(llite_lib.c:1827:ll_statfs_internal()) md_statfs fails: rc = -5
      Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
      Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock shadow-45vm3:shadow-45vm7:/lustre /mnt/lustre2
      LustreError: 15c-8: MGC10.1.6.21@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Lustre: Unmounted lustre-client
      LustreError: 30787:0:(obd_mount.c:1335:lustre_fill_super()) Unable to mount  (-5)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_26b: @@@@@@ FAIL: Failed to mount \/mnt\/lustre2 
      Lustre: DEBUG MARKER: recovery-small test_26b: @@@@@@ FAIL: Failed to mount /mnt/lustre2
      

          Activity

            adilger Andreas Dilger added a comment - Closing old bug not seen in a long time.

            adilger Andreas Dilger added a comment - This looks like a test script problem. The client was evicted from the MDS during test_26b, when it should have been evicted during test_26a, and that caused test_26b to fail early. It seems that test_26a() needs to set fail_loc=0 and then wait to ensure that the client has reconnected to all of the servers.
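
The wait suggested in this comment could look roughly like the sketch below. It is not the fix that was applied. It assumes the client's import state can be read with `lctl get_param -n mdc.*.import` and that a connected import prints a `state: FULL` line. `wait_imports_full` is a hypothetical helper name, not a function from the Lustre test framework, and the 60-second timeout in the usage comment is arbitrary.

```shell
# Hypothetical helper (not part of the Lustre test framework): poll until
# every import reported by the given command is in state FULL.
#   $1      timeout in seconds
#   $2...   command that prints import information, one "state: X" line
#           per import (assumed format of "lctl get_param -n mdc.*.import")
wait_imports_full() {
    local timeout=$1
    shift
    local elapsed=0
    local states

    while [ "$elapsed" -lt "$timeout" ]; do
        # Collect the value of every "state:" line; leading spaces are ignored.
        states=$("$@" 2>/dev/null | awk '$1 == "state:" { print $2 }')

        # Succeed only if at least one state was seen and none differs from FULL.
        if [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '^FULL$'; then
            return 0
        fi

        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Possible use at the end of test_26a (assumed placement, untested):
#   lctl set_param fail_loc=0
#   wait_imports_full 60 lctl get_param -n mdc.*.import ||
#       echo "client did not reconnect to the MDS in time"
```

Passing the state-reading command as arguments keeps the polling logic separate from `lctl`, so the same loop could be pointed at other imports if the parameter names differ on a given build.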

            People

              emoly.liu Emoly Liu
              maloo Maloo
              Votes: 0
              Watchers: 4
