Lustre / LU-9601

recovery-mds-scale test_failover_mds: test_failover_mds returned 1

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5
    • Components: None
    • Labels: trevis, failover
    • Environment:
        clients: SLES12, master branch, v2.9.58, b3591
        servers: EL7, ldiskfs, master branch, v2.9.58, b3591
    • Severity: 3

    Description

      https://testing.hpdd.intel.com/test_sessions/e6b87235-1ff0-4e96-a53f-ca46ffe5ed7e

      From suite_log:

      CMD: trevis-38vm1,trevis-38vm5,trevis-38vm6,trevis-38vm7,trevis-38vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/mpi/gcc/openmpi/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440 
      trevis-38vm1: trevis-38vm1: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      trevis-38vm7: trevis-38vm7.trevis.hpdd.intel.com: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      trevis-38vm8: trevis-38vm8.trevis.hpdd.intel.com: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      CMD: trevis-38vm1 uname -n
      CMD: trevis-38vm5 uname -n
      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      
       SKIP: recovery-double-scale  SHARED_DIRECTORY should be specified with a shared directory which is accessable on all of the nodes
      Stopping clients: trevis-38vm1,trevis-38vm5,trevis-38vm6 /mnt/lustre (opts:)
      CMD: trevis-38vm1,trevis-38vm5,trevis-38vm6 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
      

      and

      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
       auster : @@@@@@ FAIL: clients environments are insane! 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4952:error()
        = /usr/lib64/lustre/tests/test-framework.sh:1736:sanity_mount_check_clients()
        = /usr/lib64/lustre/tests/test-framework.sh:1741:sanity_mount_check()
        = /usr/lib64/lustre/tests/test-framework.sh:3796:setupall()
        = auster:114:reset_lustre()
        = auster:217:run_suite()
        = auster:234:run_suite_logged()
        = auster:298:run_suites()
        = auster:334:main()
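
      Both excerpts point at the same underlying symptom: pdsh cannot reach trevis-38vm5 and trevis-38vm6 ("No route to host"), so check_logdir fails, recovery-double-scale is skipped, and sanity_mount_check_clients() declares the client environments insane. A minimal reachability sketch along the lines below (a hypothetical helper, not part of test-framework.sh; the node list is taken from the log above) could confirm whether the client VMs are still on the network before auster calls setupall():

      #!/bin/bash
      # Hypothetical pre-flight check: report any node that does not answer pdsh.
      NODES="trevis-38vm1,trevis-38vm5,trevis-38vm6,trevis-38vm7,trevis-38vm8"
      for node in ${NODES//,/ }; do
          # -S makes pdsh exit non-zero when the remote command or connection fails.
          if ! pdsh -S -w "$node" uname -n >/dev/null 2>&1; then
              echo "WARNING: $node unreachable (would surface as 'No route to host' above)"
          fi
      done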
      

          Activity

            pjones Peter Jones made changes -
            Fix Version/s Original: Lustre 2.14.0 [ 14490 ]
            bobijam Zhenyu Xu made changes -
            Fix Version/s New: Lustre 2.14.0 [ 14490 ]
            Fix Version/s Original: Lustre 2.13.0 [ 14290 ]
            jamesanunez James Nunez (Inactive) made changes -
            Link New: This issue is related to LU-12067 [ LU-12067 ]
            adilger Andreas Dilger made changes -
            Fix Version/s New: Lustre 2.13.0 [ 14290 ]
            pjones Peter Jones made changes -
            Fix Version/s Original: Lustre 2.12.0 [ 13495 ]
            yujian Jian Yu added a comment -

            Hi Andreas,

            Are you able to reproduce something like this in a VM (e.g. dd very large single file) for debugging?

            I provisioned 3 SLES12 SP3 VMs (1 client + 1 MGS/MDS + 1 OSS) on the trevis cluster with the latest master build #3795 and ran dd to create a single 30G file. The command passed:

            trevis-59vm1:/usr/lib64/lustre/tests # lfs df -h
            UUID                       bytes        Used   Available Use% Mounted on
            lustre-MDT0000_UUID         5.6G       45.7M        5.0G   1% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID        39.0G       49.0M       36.9G   0% /mnt/lustre[OST:0]
            lustre-OST0001_UUID        39.0G       49.0M       36.9G   0% /mnt/lustre[OST:1]
            
            filesystem_summary:        78.0G       98.1M       73.9G   0% /mnt/lustre
            
            trevis-59vm1:/usr/lib64/lustre/tests # dd if=/dev/urandom of=/mnt/lustre/large_file_10G bs=1M count=30720
            30720+0 records in
            30720+0 records out
            32212254720 bytes (32 GB, 30 GiB) copied, 2086.88 s, 15.4 MB/s
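
            If a single test file ever needs to be larger than the free space of one OST, a related sketch (assuming the same client and mount point as above; lfs setstripe/getstripe are standard Lustre client commands, and the sizes are only illustrative) would stripe the file across all OSTs before writing it:

            # Sketch only: stripe the target file over all available OSTs before writing.
            lfs setstripe -c -1 /mnt/lustre/large_file_10G
            dd if=/dev/urandom of=/mnt/lustre/large_file_10G bs=1M count=30720
            # Confirm the stripe layout that was actually applied.
            lfs getstripe /mnt/lustre/large_file_10G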
            
            jamesanunez James Nunez (Inactive) made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 23197 ]
            jamesanunez James Nunez (Inactive) made changes -
            Affects Version/s New: Lustre 2.10.5 [ 14003 ]
            jamesanunez James Nunez (Inactive) made changes -
            Remote Link New: This issue links to "Page (Whamcloud Community Wiki)" [ 23177 ]
            sarah Sarah Liu made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 22932 ]

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jcasper James Casper (Inactive)
              Votes: 0
              Watchers: 10
