LU-10893

all conf-sanity tests failed: format mgs: mkfs.lustre FATAL: Unable to build fs


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Severity: 3

    Description

      After LU-684 (https://review.whamcloud.com/#/c/7200/), where a dm-flakey layer was added to test-framework, conf-sanity no longer passes with real devices; the wrapping is sketched after the configuration example below.
      Example configuration in local.sh:

      MDSCOUNT=1
      OSTCOUNT=2
      mds1_HOST=fre0101
      MDSDEV1=/dev/vdb
      mds_HOST=fre0101
      MDSDEV=/dev/vdb
      ost1_HOST=fre0102
      OSTDEV1=/dev/vdb
      ost2_HOST=fre0102
      OSTDEV2=/dev/vdc
      .....
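
      For reference, the test framework now wraps each raw device in a dm-flakey target and mounts Lustre through the mapped device. A minimal sketch of that wrapping, using the stock dmsetup flakey table syntax (the actual test-framework helper code may differ):

      # Wrap the raw MDT device in a flakey target; with a zero
      # "down" interval the device always passes I/O through.
      size=$(blockdev --getsz /dev/vdb)
      dmsetup create mds1_flakey --table "0 $size flakey /dev/vdb 0 1800 0"

      # The server is then mounted via /dev/mapper/mds1_flakey,
      # while the raw /dev/vdb stays claimed by device-mapper.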
      

      Errors:

      CMD: fre0205,fre0206,fre0208 PATH=/usr/lib64/lustre/tests/../tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests/../tests/mpi:/usr/lib64/lustre/tests/../tests/racer:/usr/lib64/lustre/tests/../../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests/../tests:/usr/lib64/lustre/tests/../utils/gss:/root//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/usr/lib64/mpich/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin::/sbin:/bin:/usr/sbin: NAME=ncli sh rpc.sh set_hostid 
      fre0208: fre0208: executing set_hostid
      fre0205: fre0205: executing set_hostid
      fre0206: fre0206: executing set_hostid
      CMD: fre0205 [ -e "/dev/vdb" ]
      CMD: fre0205 grep -c /mnt/lustre-mgs' ' /proc/mounts || true
      CMD: fre0205 lsmod | grep lnet > /dev/null &&
      lctl dl | grep ' ST ' || true
      CMD: fre0205 e2label /dev/vdb
      CMD: fre0205 mkfs.lustre --mgs --param=sys.timeout=20 --backfstype=ldiskfs --device-size=0 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/vdb
      fre0205: 
      fre0205: mkfs.lustre FATAL: Unable to build fs /dev/vdb (256)
      fre0205: 
      fre0205: mkfs.lustre FATAL: mkfs failed 256
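
      The mkfs failure most likely happens because the raw /dev/vdb is still claimed by a dm-flakey mapping left over from a previous run, so the underlying mke2fs refuses to use it (256 is the raw wait status, i.e. the child exited with code 1). A quick manual check, using plain dmsetup/lsblk rather than test-framework code:

      # See whether the raw device is still held by device-mapper:
      dmsetup table | grep flakey
      lsblk /dev/vdb

      # Remove a stale mapping by hand before retrying mkfs.lustre:
      dmsetup remove mgs_flakey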
      

      A quick look shows that reformat works in conf-sanity with the following change to test-framework:

      formatall() {
              CLEANUP_DM_DEV=true stopall -f

      Since there are many stopall calls in conf-sanity, they probably need the same fix.
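
      A sketch of what such a cleanup could look like (the stopall body and helper usage here are assumptions for illustration, not the actual test-framework code):

      stopall() {
              ...
              # When CLEANUP_DM_DEV is set, also tear down the dm-flakey
              # mappings so a later mkfs.lustre can reclaim the raw devices.
              if [ "$CLEANUP_DM_DEV" = "true" ]; then
                      for tgt in $(dmsetup ls --target flakey | awk '{print $1}'); do
                              dmsetup remove $tgt
                      done
              fi
      }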

      == conf-sanity test 17: Verify failed mds_postsetup won't fail assertion (2936) (should return errs) ====================================================================================================== 15:36:46 (1522942606)
      start mds service on fre0113
      Starting mds1: -o rw,user_xattr  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      fre0113: fre0113: executing set_default_debug -1 all 4
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      Started lustre-MDT0000
      start mds service on fre0113
      Starting mds2: -o rw,user_xattr  /dev/mapper/mds2_flakey /mnt/lustre-mds2
      fre0113: fre0113: executing set_default_debug -1 all 4
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      Started lustre-MDT0001
      start ost1 service on fre0114
      Starting ost1: -o user_xattr  /dev/mapper/ost1_flakey /mnt/lustre-ost1
      fre0114: fre0114: executing set_default_debug -1 all 4
      pdsh@fre0115: fre0114: ssh exited with exit code 1
      pdsh@fre0115: fre0114: ssh exited with exit code 1
      Started lustre-OST0000
      mount lustre on /mnt/lustre.....
      Starting client: fre0115:  -o user_xattr,flock fre0113@tcp:/lustre /mnt/lustre
      setup single mount lustre success
      umount lustre on /mnt/lustre.....
      Stopping client fre0115 /mnt/lustre (opts:)
      stop ost1 service on fre0114
      Stopping /mnt/lustre-ost1 (opts:-f) on fre0114
      stop mds service on fre0113
      Stopping /mnt/lustre-mds1 (opts:-f) on fre0113
      stop mds service on fre0113
      Stopping /mnt/lustre-mds2 (opts:-f) on fre0113
      modules unloaded.
      Remove mds config log
      Stopping /mnt/lustre-mgs (opts:) on fre0113
      fre0113: debugfs 1.42.13.x6 (01-Mar-2018)
      start mgs service on fre0113
      Loading modules from /usr/lib64/lustre/tests/..
      detected 2 online CPUs by sysfs
      Force libcfs to create 2 CPU partitions
      ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
      ../lnet/lnet/lnet options: 'accept=all'
      ../lnet/klnds/socklnd/ksocklnd options: 'sock_timeout=10'
      gss/krb5 is not supported
      Starting mgs:   /dev/mapper/mgs_flakey /mnt/lustre-mgs
      fre0113: fre0113: executing set_default_debug -1 all 4
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      pdsh@fre0115: fre0113: ssh exited with exit code 1
      Started MGS
      start ost1 service on fre0114
      Starting ost1: -o user_xattr  /dev/mapper/ost1_flakey /mnt/lustre-ost1
      fre0114: fre0114: executing set_default_debug -1 all 4
      pdsh@fre0115: fre0114: ssh exited with exit code 1
      pdsh@fre0115: fre0114: ssh exited with exit code 1
      Started lustre-OST0000
      start mds service on fre0113
      Starting mds1: -o rw,user_xattr  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      fre0113: mount.lustre: mount /dev/mapper/mds1_flakey at /mnt/lustre-mds1 failed: No such file or directory
      fre0113: Is the MGS specification correct?
      fre0113: Is the filesystem name correct?
      fre0113: If upgrading, is the copied client log valid? (see upgrade docs)
      pdsh@fre0115: fre0113: ssh exited with exit code 2
      Start of /dev/mapper/mds1_flakey on mds1 failed 2
      Stopping clients: fre0115,fre0116 /mnt/lustre (opts:-f)
      Stopping clients: fre0115,fre0116 /mnt/lustre2 (opts:-f)
      Stopping /mnt/lustre-ost1 (opts:-f) on fre0114
      pdsh@fre0115: fre0114: ssh exited with exit code 1
      Stopping /mnt/lustre-mgs (opts:) on fre0113
      fre0114: fre0114: executing set_hostid
      fre0116: fre0116: executing set_hostid
      fre0113: fre0113: executing set_hostid
      Loading modules from /usr/lib64/lustre/tests/..
      detected 2 online CPUs by sysfs
      Force libcfs to create 2 CPU partitions
      gss/krb5 is not supported
      Formatting mgs, mds, osts
      Format mgs: /dev/mapper/mgs_flakey
      pdsh@fre0115: fre0113: ssh exited with exit code 1
       conf-sanity test_17: @@@@@@ FAIL: mgs: device '/dev/mapper/mgs_flakey' does not exist 
        Trace dump:
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:5734:error()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:4314:__touch_device()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:4331:format_mgs()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:4384:formatall()
        = /usr/lib64/lustre/tests/conf-sanity.sh:109:reformat()
        = /usr/lib64/lustre/tests/conf-sanity.sh:91:reformat_and_config()
        = /usr/lib64/lustre/tests/conf-sanity.sh:605:test_17()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:6010:run_one()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:6049:run_one_logged()
        = /usr/lib64/lustre/tests/../tests/test-framework.sh:5848:run_test()
        = /usr/lib64/lustre/tests/conf-sanity.sh:607:main()
      Dumping lctl log to /tmp/test_logs/1522942566/conf-sanity.test_17.*.1522942656.log
      fre0114: Warning: Permanently added 'fre0115,192.168.101.15' (ECDSA) to the list of known hosts.
      
      fre0116: Warning: Permanently added 'fre0115,192.168.101.15' (ECDSA) to the list of known hosts.
      
      fre0113: Warning: Permanently added 'fre0115,192.168.101.15' (ECDSA) to the list of known hosts.
      
      Resetting fail_loc on all nodes...done.
      FAIL 17 (51s)
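
      The trace above appears to show the reverse problem: once stopall has torn the dm-flakey mappings down, formatall()'s __touch_device() check finds that /dev/mapper/mgs_flakey no longer exists and errors out before mkfs.lustre even runs. One possible guard, sketched here with an assumed dm_create_dev helper (not verified against test-framework.sh):

      format_mgs() {
              local dm_dev=/dev/mapper/mgs_flakey

              # Recreate the mapping if a prior stopall removed it,
              # instead of failing later in __touch_device().
              [ -e $dm_dev ] || dm_create_dev mgs
              ...
      }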
      

            People

              Assignee: yujian (Jian Yu)
              Reporter: aboyko (Alexander Boyko)
              Votes: 0
              Watchers: 6
