Lustre / LU-9601

recovery-mds-scale test_failover_mds: test_failover_mds returned 1

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5
    • Labels: trevis, failover
    • Environment:
        clients: SLES12, master branch, v2.9.58, b3591
        servers: EL7, ldiskfs, master branch, v2.9.58, b3591
    • Severity: 3

    Description

      https://testing.hpdd.intel.com/test_sessions/e6b87235-1ff0-4e96-a53f-ca46ffe5ed7e

      From suite_log:

      CMD: trevis-38vm1,trevis-38vm5,trevis-38vm6,trevis-38vm7,trevis-38vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/mpi/gcc/openmpi/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440 
      trevis-38vm1: trevis-38vm1: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      trevis-38vm7: trevis-38vm7.trevis.hpdd.intel.com: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      trevis-38vm8: trevis-38vm8.trevis.hpdd.intel.com: executing check_logdir /shared_test/autotest2/2017-05-24/051508-70323187606440
      pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      CMD: trevis-38vm1 uname -n
      CMD: trevis-38vm5 uname -n
      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      
       SKIP: recovery-double-scale  SHARED_DIRECTORY should be specified with a shared directory which is accessable on all of the nodes
      Stopping clients: trevis-38vm1,trevis-38vm5,trevis-38vm6 /mnt/lustre (opts:)
      CMD: trevis-38vm1,trevis-38vm5,trevis-38vm6 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
      

      and

      pdsh@trevis-38vm1: trevis-38vm5: mcmd: connect failed: No route to host
      pdsh@trevis-38vm1: trevis-38vm6: mcmd: connect failed: No route to host
       auster : @@@@@@ FAIL: clients environments are insane! 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4952:error()
        = /usr/lib64/lustre/tests/test-framework.sh:1736:sanity_mount_check_clients()
        = /usr/lib64/lustre/tests/test-framework.sh:1741:sanity_mount_check()
        = /usr/lib64/lustre/tests/test-framework.sh:3796:setupall()
        = auster:114:reset_lustre()
        = auster:217:run_suite()
        = auster:234:run_suite_logged()
        = auster:298:run_suites()
        = auster:334:main()
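
      The "clients environments are insane" failure comes from sanity_mount_check_clients() in test-framework.sh (see the trace above); the same check can be run by hand to see which client lost its /mnt/lustre mount. A minimal sketch, using the node names from the log above:

      pdsh -w trevis-38vm1,trevis-38vm5,trevis-38vm6 'grep -c "/mnt/lustre " /proc/mounts'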
      

          Activity

            yujian Jian Yu added a comment -

            Hi Andreas,

            > Are you able to reproduce something like this in a VM (e.g. dd very large single file) for debugging?

            I provisioned 3 SLES12 SP3 VMs (1 client + 1 MGS/MDS + 1 OSS) on the trevis cluster with the latest master build #3795, and ran dd to create a 30G single file. The command passed:

            trevis-59vm1:/usr/lib64/lustre/tests # lfs df -h
            UUID                       bytes        Used   Available Use% Mounted on
            lustre-MDT0000_UUID         5.6G       45.7M        5.0G   1% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID        39.0G       49.0M       36.9G   0% /mnt/lustre[OST:0]
            lustre-OST0001_UUID        39.0G       49.0M       36.9G   0% /mnt/lustre[OST:1]
            
            filesystem_summary:        78.0G       98.1M       73.9G   0% /mnt/lustre
            
            trevis-59vm1:/usr/lib64/lustre/tests # dd if=/dev/urandom of=/mnt/lustre/large_file_10G bs=1M count=30720
            30720+0 records in
            30720+0 records out
            32212254720 bytes (32 GB, 30 GiB) copied, 2086.88 s, 15.4 MB/s
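
            If the original failure is tied to memory pressure while dd runs, it may help to watch the client's page-cache state from a second shell during the copy. A minimal sketch using only standard /proc/meminfo fields:

            watch -n 5 'grep -E "MemFree|Dirty|Writeback|Active\(file\)|Inactive\(file\)" /proc/meminfo'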
            
            sarah Sarah Liu added a comment - edited

            +2 on b2_10
            https://testing.hpdd.intel.com/test_sets/b7026366-5880-11e8-abc3-52540065bddc
            https://testing.hpdd.intel.com/test_sets/d93466da-5878-11e8-b9d3-52540065bddc

            sarah Sarah Liu added a comment -

            +1 on master SLES12 sp3 server/client failover, client hit "page allocation failure"

            https://testing.hpdd.intel.com/test_sets/50068624-4679-11e8-960d-52540065bddc


            adilger Andreas Dilger added a comment -

            Bobijam, it looks like the client is having problems releasing pages from the page cache. I suspect there is something going badly with the CLIO page reference/dirty state with the new kernel that is preventing the pages from being released.

            Are you able to reproduce something like this in a VM (e.g. dd very large single file) for debugging?

            ys Yang Sheng added a comment -

            It looks like SLES12 SP3 has slightly different alloc_page logic from upstream. It introduces two proc parameters:

            /proc/sys/vm/pagecache_limit_mb
            
            This tunable sets a limit to the unmapped pages in the pagecache in megabytes.
            If non-zero, it should not be set below 4 (4MB), or the system might behave erratically. In real-life, much larger limits (a few percent of system RAM / a hundred MBs) will be useful.
            
            Examples:
            echo 512 >/proc/sys/vm/pagecache_limit_mb
            
            This sets a baseline limits for the page cache (not the buffer cache!) of 0.5GiB.
            As we only consider pagecache pages that are unmapped, currently mapped pages (files that are mmap'ed such as e.g. binaries and libraries as well as SysV shared memory) are not limited by this.
            NOTE: The real limit depends on the amount of free memory. Every existing free page allows the page cache to grow 8x the amount of free memory above the set baseline. As soon as the free memory is needed, we free up page cache.
            
            /proc/sys/vm/pagecache_limit_ignore_dirty
            
            

            But it should have little effect when running with the default values.
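
            A quick way to see how a given SLES12 SP3 node is actually configured (per the documentation quoted above, a value of 0 means the pagecache limit is not in effect); this is a sketch, run on the client:

            cat /proc/sys/vm/pagecache_limit_mb
            cat /proc/sys/vm/pagecache_limit_ignore_dirty
            # to rule the limit out during a reproduction run, it can be disabled explicitly
            echo 0 > /proc/sys/vm/pagecache_limit_mb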

            Thanks,
            YangSheng


            bhoagland Brad Hoagland (Inactive) added a comment -

            Hi YangSheng,

            Can you take a look at this one?

            Thanks,

            Brad

            adilger Andreas Dilger added a comment -

            The system has about 4GB of RAM. There is not a lot of memory in slab objects (only about 40MB). Most of the memory is tied up in inactive_file (about 3GB) and active_file (about 0.5GB), but none of it is reclaimable.

            It makes sense that a bunch of pages are tied up in active_file for dirty pages and RPC bulk replay, but the pages in inactive_file should be reclaimable. I suspect there is some bad interaction between how CLIO is tracking pages and the VM page state in the newer SLES kernel that makes it appear to the VM that none of the pages can be reclaimed (e.g. extra page references from DLM locks, OSC extents, etc).

            We do have slab callbacks for DLM locks that would release pages, but I'm wondering whether dd is using a single large lock on the whole file, and whether that lock cannot be cancelled while it still has dirty pages. This might also relate to LU-9977.
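
            If client DLM lock references are what pin the inactive_file pages, one way to test that on a client stuck in this state would be something like the following sketch (standard lctl tunables; the osc namespace glob may need adjusting):

            grep -E "Active\(file\)|Inactive\(file\)|Dirty|Writeback" /proc/meminfo
            lctl get_param ldlm.namespaces.*osc*.lru_size
            # cancel unused client locks, then check whether inactive_file becomes reclaimable
            lctl set_param ldlm.namespaces.*osc*.lru_size=clear
            echo 3 > /proc/sys/vm/drop_caches
            grep -E "Active\(file\)|Inactive\(file\)" /proc/meminfo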


            jcasper James Casper (Inactive) added a comment -

            It looks like dumps on an SLES VM are saved in /var/crash, while EL7 saves them in /scratch/dumps.

            Copying the SLES dumps to an NFS share does not appear to be set up yet:

            el7:

            # mount | grep export
            onyx-4.onyx.hpdd.intel.com:/export/scratch on /scratch type nfs4

            sles:

            # mount | grep export
            onyx-3:/export/home/autotest on /home/autotest type nfs4
            #
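
            A sketch of what pointing the SLES dumps at an NFS export might look like, assuming kdump is the service writing /var/crash (the export path below is only an example modelled on the EL7 mount above, not the real autotest share):

            # /etc/sysconfig/kdump on the SLES node
            KDUMP_SAVEDIR="nfs://onyx-4.onyx.hpdd.intel.com/export/scratch"
            # restart kdump so the new target is picked up
            systemctl restart kdump
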
            jcasper James Casper (Inactive) added a comment - 2.10.54: https://testing.hpdd.intel.com/test_sessions/73765244-fb30-4759-a7b6-2f4aaf88cca7
            jcasper James Casper (Inactive) added a comment - 2.10.1: https://testing.hpdd.intel.com/test_sessions/3035e082-1d27-4979-93c7-9b7048c900c1

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jcasper James Casper (Inactive)
              Votes: 0
              Watchers: 10
