
replay-vbr test_1b: FAIL: client not evicted

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.15.5
    • None
    • 3

    Description

      This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/15201e27-41a9-4e2b-a401-79c33d90dca4

      test_1b failed with the following error:

      trevis-103vm11.trevis.whamcloud.com not evicted
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-553.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-553.el8_lustre.x86_64

      mount facets: mds1
      CMD: trevis-102vm6 dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
      CMD: trevis-102vm6 dmsetup status /dev/mapper/mds1_flakey 2>&1
      CMD: trevis-102vm6 dmsetup table /dev/mapper/mds1_flakey
      CMD: trevis-102vm6 dmsetup suspend --nolockfs --noflush /dev/mapper/mds1_flakey
      CMD: trevis-102vm6 dmsetup load /dev/mapper/mds1_flakey --table \"0 3964928 linear 252:0 0\"
      CMD: trevis-102vm6 dmsetup resume /dev/mapper/mds1_flakey
      CMD: trevis-102vm6 test -b /dev/mapper/mds1_flakey
      CMD: trevis-102vm6 e2label /dev/mapper/mds1_flakey
      Starting mds1: -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
      CMD: trevis-102vm6 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
      CMD: trevis-102vm6 /usr/sbin/lctl get_param -n health_check
      CMD: trevis-102vm6 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4
      trevis-102vm6: CMD: trevis-102vm6 /usr/sbin/lctl get_param -n version 2>/dev/null
      trevis-102vm6: CMD: trevis-102vm6 /usr/sbin/lctl get_param -n version 2>/dev/null
      trevis-102vm6: CMD: trevis-103vm13 /usr/sbin/lctl get_param -n version 2>/dev/null
      trevis-102vm6: CMD: trevis-102vm6.trevis.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
      trevis-102vm6: trevis-102vm6.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      CMD: trevis-102vm6 e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      pdsh@trevis-103vm11: trevis-102vm6: ssh exited with exit code 1
      CMD: trevis-102vm6 e2label /dev/mapper/mds1_flakey 2>/dev/null
      Started lustre-MDT0000
      replay-vbr test_1b: @@@@@@ FAIL: trevis-103vm11.trevis.whamcloud.com not evicted
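
A note on the "ssh exited with exit code 1" line in the log above: it follows the `e2label ... | grep -E` command, and `grep` returns status 1 whenever no line matches. The minimal sketch below reproduces that behaviour locally. It assumes the label on the device is `lustre-MDT0000` (taken from the "Started lustre-MDT0000" line); the colon-separated label is a hypothetical value used only to show a matching case. If that assumption holds, the exit code 1 is probably not related to the eviction failure itself.

```shell
#!/bin/sh
# Assumed label value, taken from the "Started lustre-MDT0000" log line.
label="lustre-MDT0000"

# Same pattern as the logged command: a colon, three letters, four digits.
pattern=':[a-zA-Z]{3}[0-9]{4}'

# The assumed label contains no colon, so grep selects nothing and exits 1.
printf '%s\n' "$label" | grep -E "$pattern" >/dev/null
dash_status=$?

# Hypothetical colon-separated label, for illustration only: grep exits 0.
printf '%s\n' "lustre:MDT0000" | grep -E "$pattern" >/dev/null
colon_status=$?

echo "dash label: grep exit $dash_status"
echo "colon label: grep exit $colon_status"
```

Because `grep` is the last command in the pipeline, its status becomes the status of the remote command, which `pdsh` then reports.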

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-vbr test_1b - trevis-103vm11.trevis.whamcloud.com not evicted

          Activity

            [LU-17944] replay-vbr test_1b: FAIL: client not evicted
            adilger Andreas Dilger added a comment (edited)

            It looks like this has been failing intermittently for a while, but no bug was open. There were a few bugs for this failure in the past, LU-6309, LU-3750, etc. that were closed with "cannot reproduce" so it doesn't seem like a very common failure.

            However, it has failed 6x after 2024-05-28 on multiple branches (b2_15, master, even b_es5_2), and not in interop testing, so it isn't at all clear why that would happen. The previous non-self-inflicted failure was 2024-01-06. Possibly something changed in the test environment?

            People

              wc-triage WC Triage
              maloo Maloo