Lustre / LU-10708

replay-single test_20b: Restart of mds1 failed!

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3
    • Environment: Hard Failover:
      RHEL 7.4 Server/ZFS
      RHEL 7.4 Client
      2.10.58 master, build 3707
    • Severity: 3

    Description

      replay-single test_20b - Restart of mds1 failed!
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/7f078076-15ba-11e8-bd00-52540065bddc

      test_20b failed with the following error:

      Restart of mds1 failed!
      

      test_logs:

      == replay-single test 20b: write, unlink, eviction, replay (test mds_cleanup_orphans) ================ 19:46:25 (1519069585)
      CMD: onyx-32vm7 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: onyx-32vm6 lctl set_param -n osd*.*OS*.force_sync=1
      /mnt/lustre/f20b.replay-single
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      	obdidx		 objid		 objid		 group
      	     0	          4770	       0x12a2	             0
      
      CMD: onyx-32vm7 /usr/sbin/lctl set_param -n mdt.lustre-MDT0000.evict_client 425b1455-3c86-3ef4-e5f3-8752f5bdb612
      10000+0 records in
      10000+0 records out
      40960000 bytes (41 MB) copied, 1.08016 s, 37.9 MB/s
      CMD: onyx-32vm7 lctl set_param -n osd*.*MDT*.force_sync=1
      CMD: onyx-32vm7 /usr/sbin/lctl dl
      Failing mds1 on onyx-32vm7
      + pm -h powerman --off onyx-32vm7
      Command completed successfully
      reboot facets: mds1
      + pm -h powerman --on onyx-32vm7
      Command completed successfully
      Failover mds1 to onyx-32vm8
      19:46:42 (1519069602) waiting for onyx-32vm8 network 900 secs ...
      19:46:42 (1519069602) network interface is UP
      CMD: onyx-32vm8 hostname
      mount facets: mds1
      CMD: onyx-32vm8 lsmod | grep zfs >&/dev/null || modprobe zfs;
      			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
      			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1
      onyx-32vm8: cannot import 'lustre-mdt1': no such pool available
       replay-single test_20b: @@@@@@ FAIL: Restart of mds1 failed! 
      
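      The ZFS message "cannot import 'lustre-mdt1': no such pool available" generally means that zpool import did not find the pool's backing devices in the directory it was asked to scan. A minimal manual check on the failover node (onyx-32vm8 in this run) might look like the sketch below; the device directory and pool name are taken from the log above, and the commands are plain ZFS tooling, not part of the test framework.

      # Sketch only: verify that the shared MDT devices and the pool are visible
      # on the failover node before attempting the import the framework runs.
      lsmod | grep -q zfs || modprobe zfs          # make sure the zfs module is loaded
      ls -l /dev/lvm-Role_MDS/                     # are the backing devices present yet?
      zpool import -d /dev/lvm-Role_MDS            # with no pool name, this only lists importable pools
      # If lustre-mdt1 shows up in the listing, the framework's own import should succeed:
      zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1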

    Activity

            sgiraddi Shashidhar Giraddi (Inactive) added a comment - +1  https://testing.whamcloud.com/test_sets/5d4c3c9a-6129-48c0-929a-73cd4fe3a0e1  
            jamesanunez James Nunez (Inactive) added a comment - Another similar failure with replay-single test 3c at https://testing.whamcloud.com/test_sets/4051fd66-682a-11e9-bd0e-52540065bddc .

            jamesanunez James Nunez (Inactive) added a comment - Similar failure for recovery-random-scale test fail_client_mds at https://testing.whamcloud.com/test_sets/e3b58552-fea5-11e8-b837-52540065bddc

            CMD: trevis-25vm11 hostname
            mount facets: mds1
            CMD: trevis-25vm11 lsmod | grep zfs >&/dev/null || modprobe zfs;
            			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
            			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1
            trevis-25vm11: cannot import 'lustre-mdt1': no such pool available
             recovery-random-scale test_fail_client_mds: @@@@@@ FAIL: Restart of mds1 failed! 

            jamesanunez James Nunez (Inactive) added a comment - We see the same problem when remounting the MDS after a failover in recovery-mds-scale test_failover_mds. See https://testing.hpdd.intel.com/test_sets/c57c0bda-527d-11e8-b9d3-52540065bddc for logs.

            From the client test_log

            Failing mds1 on trevis-8vm7
            + pm -h powerman --off trevis-8vm7
            Command completed successfully
            reboot facets: mds1
            + pm -h powerman --on trevis-8vm7
            Command completed successfully
            Failover mds1 to trevis-8vm8
            19:58:09 (1525550289) waiting for trevis-8vm8 network 900 secs ...
            19:58:09 (1525550289) network interface is UP
            CMD: trevis-8vm8 hostname
            mount facets: mds1
            CMD: trevis-8vm8 lsmod | grep zfs >&/dev/null || modprobe zfs;
            			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
            			zpool import -f -o cachefile=none -d /dev/lvm-Role_MDS lustre-mdt1
            trevis-8vm8: cannot import 'lustre-mdt1': no such pool available
             recovery-mds-scale test_failover_mds: @@@@@@ FAIL: Restart of mds1 failed! 
            
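            If these turn out to be a transient race where the MDT's backing devices are not yet visible on the failover node at mount time (an assumption; the failures above have not been root-caused), one possible mitigation would be to retry the import for a bounded period rather than failing on the first attempt. A minimal sketch, reusing the pool name, device directory and import options from the logs above (the helper name import_mdt_pool is hypothetical, not part of the test framework):

            # Sketch only, not the test framework's actual code.
            import_mdt_pool() {
                local pool=$1 devdir=$2
                local deadline=$((SECONDS + 90))     # allow up to 90s for the devices to appear
                while (( SECONDS < deadline )); do
                    # already imported?
                    zpool list -H "$pool" >/dev/null 2>&1 && return 0
                    # try the same import the framework issues
                    zpool import -f -o cachefile=none -o failmode=panic \
                        -d "$devdir" "$pool" 2>/dev/null && return 0
                    sleep 5
                done
                return 1
            }

            import_mdt_pool lustre-mdt1 /dev/lvm-Role_MDS || echo "lustre-mdt1 still not importable"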

    People

      Assignee: WC Triage
      Reporter: Maloo
      Votes: 0
      Watchers: 3
