Lustre / LU-19654

recovery-random-scale test_fail_client_mds: FAIL: Restart of mds1 failed!


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Affects Version/s: Lustre 2.17.0
    • Fix Version/s: Lustre 2.17.0
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6bd34daf-af12-4bad-a18d-0ef977339538

      test_fail_client_mds failed with the following error:

      Starting failover on mds1
      CMD: trevis-153vm19 /usr/sbin/lctl dl
      Failing mds1 on trevis-153vm19
      CMD: trevis-153vm19 /usr/sbin/lctl dl
      + pm -h powerman --off trevis-153vm19
      Command completed successfully
      waiting ! ping -w 3 -c 1 trevis-153vm19, 5 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 4 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 3 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 2 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 1 secs left ...
      waiting for trevis-153vm19 to fail attempts=3
      + pm -h powerman --off trevis-153vm19
      Command completed successfully
      waiting ! ping -w 3 -c 1 trevis-153vm19, 5 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 4 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 3 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 2 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 1 secs left ...
      waiting for trevis-153vm19 to fail attempts=3
      + pm -h powerman --off trevis-153vm19
      Command completed successfully
      waiting ! ping -w 3 -c 1 trevis-153vm19, 5 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 4 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 3 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 2 secs left ...
      waiting ! ping -w 3 -c 1 trevis-153vm19, 1 secs left ...
      waiting for trevis-153vm19 to fail attempts=3
      trevis-153vm19 still pingable after power down! attempts=3
      23:48:51 (1764632931) shut down
      facet: mds1 facet_host: trevis-153vm18 facet_failover_host: trevis-153vm19
      + pm -h powerman --on trevis-153vm19
      Command completed successfully
      23:48:55 (1764632935) trevis-153vm19 rebooted; waithostlist: 
      Failover mds1 to trevis-153vm18
      mount facets: mds1
      CMD: trevis-153vm18 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
      pdsh@trevis-153vm10: trevis-153vm18: ssh exited with exit code 1
      CMD: trevis-153vm18 test -b /dev/vg_Role_MDS/mdt1
      CMD: trevis-153vm18 blockdev --getsz /dev/vg_Role_MDS/mdt1 2>/dev/null
      CMD: trevis-153vm18 dmsetup create mds1_flakey --table \"0 6291456 linear /dev/vg_Role_MDS/mdt1 0\"
      CMD: trevis-153vm18 dmsetup mknodes >/dev/null 2>&1
      CMD: trevis-153vm18 test -b /dev/mapper/mds1_flakey
      CMD: trevis-153vm18 e2label /dev/mapper/mds1_flakey
      CMD: trevis-153vm18,trevis-153vm19 df -t lustre || true
      trevis-153vm18: df: no file systems processed
      Filesystem              1K-blocks  Used Available Use% Mounted on
      /dev/mapper/mds1_flakey   1738616  4916   1561612   1% /mnt/lustre-mds1
      Start mds1: mount -t lustre -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      CMD: trevis-153vm18 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      trevis-153vm18: mount.lustre: mount /dev/mapper/mds1_flakey at /mnt/lustre-mds1 failed: Invalid argument
      trevis-153vm18: This may have multiple causes.
      trevis-153vm18: Are the mount options correct?
      trevis-153vm18: Check the syslog for more info.
      pdsh@trevis-153vm10: trevis-153vm18: ssh exited with exit code 22
      Start of /dev/mapper/mds1_flakey on mds1 failed 22
       recovery-random-scale test_fail_client_mds: @@@@@@ FAIL: Restart of mds1 failed!
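
      For reference, the failover sequence the log shows can be condensed into the shell sketch below. This is a reconstruction from the commands visible above, not the actual recovery-random-scale test-framework code; hostnames, device paths, and the dm table come from this log, while the loop structure is an assumption.

      #!/bin/bash
      # Sketch of the failover sequence from the log above (reconstruction,
      # not the real test code). Names/paths are taken from the log.
      failover=trevis-153vm19     # node being powered off (original mds1 host)
      backup=trevis-153vm18       # node that should take over mds1
      dev=/dev/vg_Role_MDS/mdt1   # LVM volume backing the MDT

      # Power the primary off and wait (up to 5s per attempt) for pings to fail.
      pm -h powerman --off "$failover"
      for secs in 5 4 3 2 1; do
          ping -w 3 -c 1 "$failover" >/dev/null 2>&1 || break
          echo "waiting ! ping -w 3 -c 1 $failover, $secs secs left ..."
      done

      # Recreate the mds1_flakey device on the backup node. The dm table is
      # "<start> <length-in-512B-sectors> linear <backing-device> <offset>";
      # the 6291456 in the log is the size reported by blockdev --getsz.
      size=$(pdsh -N -w "$backup" "blockdev --getsz $dev")
      pdsh -w "$backup" "dmsetup create mds1_flakey --table '0 $size linear $dev 0'"

      # Mount the MDT on the backup node; this is the step that fails here
      # with EINVAL (exit code 22).
      pdsh -w "$backup" "mkdir -p /mnt/lustre-mds1 &&
          mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1"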
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/119118 - 4.18.0-553.76.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/119118 - 4.18.0-553.76.1.el8_lustre.x86_64

      <<Please provide additional information about the failure here>>
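
      mount.lustre only reports "Invalid argument" and points at the syslog, so the kernel log from trevis-153vm18 around the failed mount is the next thing to capture. A minimal collection sketch, assuming the same pdsh access the harness uses (the tail length and time window are arbitrary):

      # Kernel messages around the failed mount attempt on the backup node:
      pdsh -w trevis-153vm18 'dmesg | tail -n 100'
      pdsh -w trevis-153vm18 'journalctl -k --since "10 minutes ago" --no-pager'
      # Check the on-disk Lustre config of the dm device without mounting it:
      pdsh -w trevis-153vm18 'tunefs.lustre --dryrun /dev/mapper/mds1_flakey'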

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      recovery-random-scale test_fail_client_mds - Restart of mds1 failed!

People

    Assignee: Jian Yu (yujian)
    Reporter: Maloo (maloo)
    Votes: 0
    Watchers: 3
