
replay-single test_131b: test timeout

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.4
    • Affects Version/s: Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.7, Lustre 2.15.0, Lustre 2.15.3
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/41a16040-b988-11e8-9df3-52540065bddc

      test_131b failed with the following error:

      Timeout occurred after 969 mins, last suite running was replay-single, restarting cluster to continue tests
      

      This test started failing with tag 2.11.55.
      Test log:

      == replay-single test 131b: DoM file write replay ==================================================== 11:35:59 (1537011359)
      CMD: trevis-33vm4 /usr/sbin/lctl get_param -n version 2>/dev/null ||
      				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
      				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
      CMD: trevis-33vm4 sync; sync; sync
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID      1165900        8732     1053972   1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID      1933276       79708     1731404   4% /mnt/lustre[OST:0]
      lustre-OST0001_UUID      1933276       25880     1786156   1% /mnt/lustre[OST:1]
      lustre-OST0002_UUID      1933276       25808     1786228   1% /mnt/lustre[OST:2]
      lustre-OST0003_UUID      1933276       31488     1780548   2% /mnt/lustre[OST:3]
      lustre-OST0004_UUID      1933276       41772     1770264   2% /mnt/lustre[OST:4]
      lustre-OST0005_UUID      1933276       25888     1786148   1% /mnt/lustre[OST:5]
      lustre-OST0006_UUID      1933276       25840     1786196   1% /mnt/lustre[OST:6]
      
      filesystem_summary:     13532932      256384    12426944   2% /mnt/lustre
      
      CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2 mcreate /mnt/lustre/fsa-\$(hostname); rm /mnt/lustre/fsa-\$(hostname)
      CMD: trevis-33vm1.trevis.whamcloud.com,trevis-33vm2 if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-\$(hostname); rm /mnt/lustre2/fsa-\$(hostname); fi
      CMD: trevis-33vm4 /usr/sbin/lctl --device lustre-MDT0000 notransno
      CMD: trevis-33vm4 dmsetup table /dev/mapper/mds1_flakey
      CMD: trevis-33vm4 dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
      CMD: trevis-33vm4 dmsetup load /dev/mapper/mds1_flakey --table \"0 4194304 flakey 252:0 0 0 1800 1 drop_writes\"
      CMD: trevis-33vm4 dmsetup resume /dev/mapper/mds1_flakey
      CMD: trevis-33vm4 /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
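
The `dmsetup load` line in the log above installs a dm-flakey table for the replay barrier. The sketch below decodes that table string, assuming the field layout given in the kernel's dm-flakey documentation (`<dev path> <offset> <up interval> <down interval> [<num_features> [<feature arguments>]]`) and 512-byte sectors. The `FlakeyTable` class and `parse_flakey_table` helper are hypothetical names used for illustration only; they are not part of the Lustre test framework.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # device-mapper tables count in 512-byte sectors


@dataclass
class FlakeyTable:
    start: int          # first sector of the mapping
    length: int         # mapping length in sectors
    device: str         # backing device (major:minor or path)
    offset: int         # start offset on the backing device, in sectors
    up_interval: int    # seconds the device behaves normally
    down_interval: int  # seconds the device is unreliable
    features: list      # e.g. ["drop_writes"]


def parse_flakey_table(table: str) -> FlakeyTable:
    """Split a dm-flakey table line into named fields (illustrative helper)."""
    fields = table.split()
    if len(fields) < 7 or fields[2] != "flakey":
        raise ValueError(f"not a flakey table: {table!r}")
    num_features = int(fields[7]) if len(fields) > 7 else 0
    return FlakeyTable(
        start=int(fields[0]),
        length=int(fields[1]),
        device=fields[3],
        offset=int(fields[4]),
        up_interval=int(fields[5]),
        down_interval=int(fields[6]),
        features=fields[8:8 + num_features],
    )


# Table string copied from the log, without the shell escaping.
t = parse_flakey_table("0 4194304 flakey 252:0 0 0 1800 1 drop_writes")
size_gib = t.length * SECTOR_BYTES / 2**30
print(t.device, t.up_interval, t.down_interval, t.features, size_gib)
```

Read this way, the table maps a 2 GiB device with an up interval of 0 and a down interval of 1800 seconds with `drop_writes`, so writes to the MDT device are discarded from the moment the table is resumed.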
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-single test_131b - Timeout occurred after 969 mins, last suite running was replay-single, restarting cluster to continue tests

          Activity

            [LU-11388] replay-single test_131b: test timeout

            gerrit Gerrit Updater added a comment -

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50661
            Subject: LU-11388 tests: replay-single/131b to refresh grants
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6dc4285f738158a90c2ff6b6bd3cbc430b580654

            bzzz Alex Zhuravlev added a comment - This is probably LU-16478.
            eaujames Etienne Aujames added a comment - +1 on master (ZFS): https://testing.whamcloud.com/test_sets/4c0e7862-d495-4f2e-ab4f-8c30e3a3dc59
            paf0186 Patrick Farrell added a comment - +1 on master: https://testing.whamcloud.com/test_sets/b0074d9c-d7fd-45c1-a5b9-79c61aad0f20

            scherementsev Sergey Cheremencev added a comment - +1 on master: https://testing.whamcloud.com/test_sets/a0db5427-6afe-4709-91e7-ed111a3ce01f

            It failed even with "LU-11388 test: enable replay-single test_131b".
            pjones Peter Jones added a comment - Does this landing mean that this ticket can be closed, or does further work remain?

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/40421/
            Subject: LU-11388 test: enable replay-single test_131b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cb3b2bb683ce128d5d9dacebbe01b23c183cbf4d

            anikitenko Alena Nikitenko (Inactive) added a comment -

            Found something similar on 2.12.8 testing: https://testing.whamcloud.com/test_sets/c76df3be-faea-4f20-bb36-f44818a6a7bf

            == replay-single test 131b: DoM file write replay ==================================================== 13:33:22 (1637415202)
            CMD: onyx-109vm10 /usr/sbin/lctl get_param -n version 2>/dev/null ||
            				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
            				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
            CMD: onyx-109vm10 sync; sync; sync
            UUID                   1K-blocks        Used   Available Use% Mounted on
            lustre-MDT0000_UUID      5781172        3020     5255320   1% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID      1908940       17728     1769720   1% /mnt/lustre[OST:0]
            lustre-OST0001_UUID      1908940        1332     1786368   1% /mnt/lustre[OST:1]
            lustre-OST0002_UUID      1908940        1324     1786376   1% /mnt/lustre[OST:2]
            lustre-OST0003_UUID      1908940       11568     1776132   1% /mnt/lustre[OST:3]
            lustre-OST0004_UUID      1908940       11568     1776132   1% /mnt/lustre[OST:4]
            lustre-OST0005_UUID      1908940       11564     1776136   1% /mnt/lustre[OST:5]
            lustre-OST0006_UUID      1908940        1328     1786372   1% /mnt/lustre[OST:6]

            filesystem_summary:     13362580       56412    12457236   1% /mnt/lustre

            CMD: onyx-64vm1.onyx.whamcloud.com,onyx-64vm3,onyx-64vm4 mcreate /mnt/lustre/fsa-\$(hostname); rm /mnt/lustre/fsa-\$(hostname)
            CMD: onyx-64vm1.onyx.whamcloud.com,onyx-64vm3,onyx-64vm4 if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-\$(hostname); rm /mnt/lustre2/fsa-\$(hostname); fi
            CMD: onyx-109vm10 /usr/sbin/lctl --device lustre-MDT0000 notransno
            CMD: onyx-109vm10 dmsetup table /dev/mapper/mds1_flakey
            CMD: onyx-109vm10 dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
            CMD: onyx-109vm10 dmsetup load /dev/mapper/mds1_flakey --table \"0 20971520 flakey 252:0 0 0 1800 1 drop_writes\"
            CMD: onyx-109vm10 dmsetup resume /dev/mapper/mds1_flakey
            CMD: onyx-109vm10 /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
            'Timeout occurred after 692 mins, last suite running was replay-single'
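
The test header and timeout message in the 2.12.8 log above carry the start time (as a Unix epoch value in parentheses) and the elapsed time in minutes. The sketch below pulls both out and converts them to a date and an hours-and-minutes figure. The regular expressions are my own guess at the line format, based only on the two logs quoted in this ticket, and are not taken from Maloo or the Lustre test framework.

```python
import re
from datetime import datetime, timezone

# Patterns inferred from the log lines in this ticket (assumption, not a spec).
HEADER_RE = re.compile(r"== (\S+) test (\S+): (.*?) =+ (\d\d:\d\d:\d\d) \((\d+)\)")
TIMEOUT_RE = re.compile(r"Timeout occurred after (\d+) mins")

header = ("== replay-single test 131b: DoM file write replay "
          "==================================================== 13:33:22 (1637415202)")
timeout = "'Timeout occurred after 692 mins, last suite running was replay-single'"

# Split the header into suite, test number, title, wall-clock time, and epoch.
suite, test, title, clock, epoch = HEADER_RE.search(header).groups()
started = datetime.fromtimestamp(int(epoch), tz=timezone.utc)

# Convert the reported timeout from minutes to hours and minutes.
minutes = int(TIMEOUT_RE.search(timeout).group(1))
hours, rem = divmod(minutes, 60)

print(suite, test, started.isoformat(), f"{hours}h{rem:02d}m")
```

The epoch value decodes to 2021-11-20 13:33:22 UTC, which matches the wall-clock time printed in the header, so the test node's clock appears to have been set to UTC. The 692-minute timeout corresponds to 11 hours 32 minutes.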

            jamesanunez James Nunez (Inactive) added a comment - Just a note that replay-single test_131b still times out for ldiskfs servers on master, both in branch testing with the full test group (https://testing.whamcloud.com/test_sets/7322774c-afbc-4e20-ac2e-7b86cfcf251c) and in failover testing (https://testing.whamcloud.com/test_sets/57e50c3e-e790-4966-b834-ccc00fa41a81).

            The patch above only disables this test for ZFS servers.

            gerrit Gerrit Updater added a comment -

            Vikentsi Lapa (vlapa@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40421
            Subject: LU-11388 test: enable replay-single test_131b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4ba0faacc5ae81bea5cd3fada7298c5563a4c219

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33279/
            Subject: LU-11388 test: disable replay-single test_131b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 02b6b6746af7e032df51001926fe1d59143520da

            People

              Assignee: tappro Mikhail Pershin
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 11
