Lustre / LU-12022

sanity-flr: test_200 'checksum error for mirror 3'

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Fix Version/s: None
    • Labels: None
    • Severity: 3

      Description

      This issue was created by maloo for paf <pfarrell@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/39847bcc-3985-11e9-8f69-52540065bddc

      The reported error is a checksum error, but the mirror resync itself failed outright. The test should probably be updated to catch the resync failure where it occurs rather than report a checksum error later:

      lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed
      lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..Waiting 7585 7586 7587 7589 7590
      failed
      10.9.4.240@tcp:/lustre /mnt/lustre2 lustre rw,flock,user_xattr,lazystatfs 0 0
      CMD: trevis-20vm1.trevis.whamcloud.com grep -c /mnt/lustre2' ' /proc/mounts
      Stopping client trevis-20vm1.trevis.whamcloud.com /mnt/lustre2 (opts
      CMD: trevis-20vm1.trevis.whamcloud.com lsof -t /mnt/lustre2
      CMD: trevis-20vm1.trevis.whamcloud.com umount /mnt/lustre2 2>&1
      10.9.4.240@tcp:/lustre /mnt/lustre3 lustre rw,flock,user_xattr,lazystatfs 0 0
      CMD: trevis-20vm1.trevis.whamcloud.com grep -c /mnt/lustre3' ' /proc/mounts
      Stopping client trevis-20vm1.trevis.whamcloud.com /mnt/lustre3 (opts
      CMD: trevis-20vm1.trevis.whamcloud.com lsof -t /mnt/lustre3
      CMD: trevis-20vm1.trevis.whamcloud.com umount /mnt/lustre3 2>&1
      mirror_io: 524: llapi_mirror_copy_many
      /mnt/lustre/f200.sanity-flr: found 10 stale components
      /mnt/lustre/f200.sanity-flr: resyncing mirror: 1, components: 65537 65538 65539 65540 65541
      3
      sanity-flr test_200: @@@@@@ FAIL: checksum error for mirror 3
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:5838:error()
      = /usr/lib64/lustre/tests/sanity-flr.sh:2189:test_200()
      = /usr/lib64/lustre/tests/test-framework.sh:6119:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:6158:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:6005:run_test()
      = /usr/lib64/lustre/tests/sanity-flr.sh:2194:main()
      Dumping lctl log to /autotest/trevis/2019-02-26/lustre-reviews-el7_6-x86_64-review-zfs-1_17_1_62058__69de2681-ac9c-46f6-a357-cca06225620a/sanity-flr.test_200.*.1551150323.log
      CMD: trevis-20vm1.trevis.whamcloud.com,trevis-20vm2,trevis-20vm3,trevis-20vm4 /usr/sbin/lctl dk > /autotest/trevis/2019-02-26/lustre-reviews-el7_6-x86_64-review-zfs-1_17_1_62058__69de2681-ac9c-46f6-a357-cca06225620a/sanity-flr.test_200.debug_log.\$(hostname -s).1551150323.log;
      dmesg > /autotest/trevis/2019-02-26/lustre-reviews-el7_6-x86_64-review-zfs-1_17_1_62058__69de2681-ac9c-46f6-a357-cca06225620a/sanity-flr.test_200.dmesg.\$(hostname -s).1551150323.log
      Resetting fail_loc on all nodes...CMD: trevis-20vm1.trevis.whamcloud.com,trevis-20vm2,trevis-20vm3,trevis-20vm4 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
      done.
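
A minimal sketch of the suggested change, assuming a resync failure at this point is meant to be fatal for the test. The helper names resync_or_fail and resync_then_verify are hypothetical and are not taken from sanity-flr.sh; the check command is a stand-in for whatever per-mirror checksum comparison the test actually performs. In the real test the failure would presumably be reported through the framework's error() rather than echo.

```shell
#!/bin/bash

# Hypothetical helper (not part of sanity-flr.sh): run a resync command on a
# file and report a failure immediately instead of deferring to a later check.
resync_or_fail() {
	local file=$1
	shift
	# "$@" is the resync command, e.g. "lfs mirror resync" as seen in the log.
	if ! "$@" "$file"; then
		echo "FAIL: resync of $file with '$*' failed" >&2
		return 1
	fi
}

# Hypothetical driver: compare mirror checksums only if the resync succeeded.
# check_cmd is a placeholder for the test's own checksum verification.
resync_then_verify() {
	local file=$1 check_cmd=$2
	shift 2
	# Return 1 for a resync failure so it is distinguishable from a mismatch.
	resync_or_fail "$file" "$@" || return 1
	if ! "$check_cmd" "$file"; then
		echo "FAIL: checksum error for $file" >&2
		return 2
	fi
}
```

For example, `resync_then_verify /mnt/lustre/f200.sanity-flr verify_mirrors lfs mirror resync` (with verify_mirrors as an assumed checksum function) would return 1 as soon as the resync fails, so the log points at the resync rather than at a later checksum mismatch.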

              People

              • Assignee: WC Triage (wc-triage)
              • Reporter: Maloo (maloo)
              • Votes: 0
              • Watchers: 4
