Lustre / LU-3264

recovery-*-scale tests failed with FSTYPE=zfs and FAILURE_MODE=HARD

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Environment: FSTYPE=zfs, FAILURE_MODE=HARD
    • Severity: 3

    Description

      While running the recovery-*-scale tests with FSTYPE=zfs and FAILURE_MODE=HARD under a failover configuration, the tests failed as follows:

      Failing mds1 on wtm-9vm3
      + pm -h powerman --off wtm-9vm3
      Command completed successfully
      waiting ! ping -w 3 -c 1 wtm-9vm3, 4 secs left ...
      waiting ! ping -w 3 -c 1 wtm-9vm3, 3 secs left ...
      waiting ! ping -w 3 -c 1 wtm-9vm3, 2 secs left ...
      waiting ! ping -w 3 -c 1 wtm-9vm3, 1 secs left ...
      waiting for wtm-9vm3 to fail attempts=3
      + pm -h powerman --off wtm-9vm3
      Command completed successfully
      reboot facets: mds1
      + pm -h powerman --on wtm-9vm3
      Command completed successfully
      Failover mds1 to wtm-9vm7
      04:28:49 (1367234929) waiting for wtm-9vm7 network 900 secs ...
      04:28:49 (1367234929) network interface is UP
      CMD: wtm-9vm7 hostname
      mount facets: mds1
      Starting mds1:   lustre-mdt1/mdt1 /mnt/mds1
      CMD: wtm-9vm7 mkdir -p /mnt/mds1; mount -t lustre lustre-mdt1/mdt1 /mnt/mds1
      wtm-9vm7: mount.lustre: lustre-mdt1/mdt1 has not been formatted with mkfs.lustre or the backend filesystem type is not supported by this tool
      Start of lustre-mdt1/mdt1 on mds1 failed 19
      

      Maloo report: https://maloo.whamcloud.com/test_sets/ac7cbc10-b0e3-11e2-b2c4-52540035b04c
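
      Error 19 here is ENODEV: mount.lustre cannot see the lustre-mdt1/mdt1 dataset because the backing zpool is not imported on the failover node. A minimal sketch of the situation on wtm-9vm7 (illustrative commands, not the test framework's exact steps):

      zpool list lustre-mdt1
      #   cannot open 'lustre-mdt1': no such pool
      mount -t lustre lustre-mdt1/mdt1 /mnt/mds1
      #   mount.lustre: lustre-mdt1/mdt1 has not been formatted with mkfs.lustre ...
      # import the pool first (add -d <dir> for file-based vdevs, see the
      # analysis in the comments below), then the mount can proceed:
      zpool import -f lustre-mdt1
      mount -t lustre lustre-mdt1/mdt1 /mnt/mds1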


        Activity

          yujian Jian Yu added a comment -

          Patch was landed on Lustre b2_4 branch.

          yujian Jian Yu added a comment -

          http://review.whamcloud.com/6429 merged

          The patch needs to be back-ported to Lustre b2_4 branch.

          utopiabound Nathaniel Clark added a comment -

          http://review.whamcloud.com/6429 merged

          utopiabound Nathaniel Clark added a comment -

          Reworked patch with fixes merged in:

          http://review.whamcloud.com/6429

          yujian Jian Yu added a comment -

          The patch for master branch is in http://review.whamcloud.com/6358.

          The patch was landed on both Lustre b2_4 and master branches.

          yujian Jian Yu added a comment -

          cannot import 'lustre-mdt1': no such pool available

          For "zpool import" command, if the -d option is not specified, the command will only search for devices in "/dev". However, for ZFS storage pool which has file-based virtual device, we need explicitly specify the search directory otherwise the import command will not find the device.

          The patch for master branch is in http://review.whamcloud.com/6358.


          bzzz Alex Zhuravlev added a comment -

          With REFORMAT=y FSTYPE=zfs sh llmount.sh -v, I'm getting:

          Format mds1: lustre-mdt1/mdt1
          CMD: centos grep -c /mnt/mds1' ' /proc/mounts
          CMD: centos lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
          CMD: centos ! zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
          zpool export lustre-mdt1
          CMD: centos /work/lustre/head1/lustre/tests/../utils/mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/work/lustre/head1/lustre/tests/../utils/l_getidentity --backfstype=zfs --device-size=200000 --reformat lustre-mdt1/mdt1 /tmp/lustre-mdt1

          Permanent disk data:
          Target: lustre:MDT0000
          Index: 0
          Lustre FS: lustre
          Mount type: zfs
          Flags: 0x65
          (MDT MGS first_time update )
          Persistent mount opts:
          Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/work/lustre/head1/lustre/tests/../utils/l_getidentity

          mkfs_cmd = zpool create -f -O canmount=off lustre-mdt1 /tmp/lustre-mdt1
          mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
          Writing lustre-mdt1/mdt1 properties
          lustre:version=1
          lustre:flags=101
          lustre:index=0
          lustre:fsname=lustre
          lustre:svname=lustre:MDT0000
          lustre:sys.timeout=20
          lustre:lov.stripesize=1048576
          lustre:lov.stripecount=0
          lustre:mdt.identity_upcall=/work/lustre/head1/lustre/tests/../utils/l_getidentity
          CMD: centos zpool set cachefile=none lustre-mdt1
          CMD: centos ! zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
          zpool export lustre-mdt1
          ...
          Loading modules from /work/lustre/head1/lustre/tests/..
          detected 2 online CPUs by sysfs
          Force libcfs to create 2 CPU partitions
          debug=vfstrace rpctrace dlmtrace neterror ha config ioctl super
          subsystem_debug=all -lnet -lnd -pinger
          gss/krb5 is not supported
          Setup mgs, mdt, osts
          CMD: centos mkdir -p /mnt/mds1
          CMD: centos zpool import -f -o cachefile=none lustre-mdt1
          cannot import 'lustre-mdt1': no such pool available
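
          Following the analysis above, the import here needs the directory that holds the file vdev; a sketch of the corrected command (-d /tmp matches the mkfs_cmd earlier in this log):

          zpool import -f -o cachefile=none -d /tmp lustre-mdt1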


          bzzz Alex Zhuravlev added a comment -

          Can you confirm that the patch works on a local setup?
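
          A minimal local check along those lines (a sketch using the standard lustre/tests helpers; the second mount, without REFORMAT, exercises the zpool import path):

          REFORMAT=y FSTYPE=zfs sh llmount.sh     # format and mount ZFS-backed targets
          sh llmountcleanup.sh                    # unmount and tear down
          FSTYPE=zfs sh llmount.sh                # remount existing targets: requires zpool import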

          yujian Jian Yu added a comment -

          Patch was landed on master branch.

          yujian Jian Yu added a comment -

          Patch for master branch is in http://review.whamcloud.com/6258.


          People

            Assignee: yujian Jian Yu
            Reporter: yujian Jian Yu
            Votes: 0
            Watchers: 12
