parallel-scale test connectathon fails with ''connectathon failed: 1''

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.6
    • DNE/ZFS
    • 3

    Description

      parallel-scale test connectathon fails with ''connectathon failed: 1''

      Looking at the logs at https://testing.whamcloud.com/test_sets/bc3cdec6-f6df-11e8-86c0-52540065bddc, the final lines of parallel-scale test_connectathon are

      test rewind support
      
      test telldir cookies
      expected file 90 at cookie 0, found .
      special tests failed
       parallel-scale test_connectathon: @@@@@@ FAIL: connectathon failed: 1 
      

      There are no errors in either the dmesg or the console logs.

      This failure looks like LU-9322, but there are no errors in the console logs as in LU-9322.

      This is the first time this error has been seen in at least the past seven months. So far, this has only failed for ZFS/DNE configurations.

          Activity

            [LU-11728] parallel-scale test connectathon fails with ''connectathon failed: 1''

            jamesanunez James Nunez (Inactive) added a comment - I don't see any evidence that the connectathon test was run with a striped directory.

            adilger Andreas Dilger added a comment - The failing test "telldir cookies" relates to being able to restart "readdir()" operations at a specific spot. It isn't totally surprising that this would be having problems on a DNE striped directory (if that is what is being tested), since this is one of the more complex parts of DNE to combine. However, it isn't totally clear to me what the test is trying to do. The Lustre logs show a lot of seeking around in the directory, and then finally the test fails when it returns to directory offset 0. There are a large number of connectathon tests passing, so it isn't necessarily a critical/common issue.
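
            For reference, the property that a "telldir cookies" style check exercises can be sketched with the POSIX telldir()/seekdir()/readdir() calls. The program below is not the connectathon source; it is a minimal illustration under stated assumptions. The function name check_telldir_cookies, the MAX_ENTRIES limit, and the TELLDIR_SKETCH_MAIN macro are all hypothetical names introduced here. It records the cookie in effect before each readdir(), then seeks back to every cookie in reverse order and checks that the same entry is returned.

```c
/* Sketch only: checks that telldir() cookies can be replayed with seekdir().
 * Not the connectathon test; all names below are illustrative assumptions. */
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 1024            /* arbitrary cap for this sketch */

struct cookie_entry {
    long cookie;                    /* telldir() value before the readdir() */
    char name[256];                 /* name returned by that readdir() */
};

/* Returns the number of mismatches, or -1 if the directory cannot be
 * opened or holds more than MAX_ENTRIES entries. Uses a static buffer,
 * so it is not reentrant. */
int check_telldir_cookies(const char *path)
{
    static struct cookie_entry seen[MAX_ENTRIES];
    DIR *dir = opendir(path);
    struct dirent *de;
    int count = 0, mismatches = 0;

    if (dir == NULL)
        return -1;

    /* Pass 1: record the cookie in effect before each entry is read. */
    for (;;) {
        long pos = telldir(dir);
        de = readdir(dir);
        if (de == NULL)
            break;
        if (count == MAX_ENTRIES) {
            closedir(dir);
            return -1;
        }
        seen[count].cookie = pos;
        strncpy(seen[count].name, de->d_name, sizeof(seen[count].name) - 1);
        seen[count].name[sizeof(seen[count].name) - 1] = '\0';
        count++;
    }

    /* Pass 2: seek back to each cookie in reverse order and re-read. */
    for (int i = count - 1; i >= 0; i--) {
        seekdir(dir, seen[i].cookie);
        de = readdir(dir);
        if (de == NULL || strcmp(de->d_name, seen[i].name) != 0) {
            printf("expected %s at cookie %ld, found %s\n",
                   seen[i].name, seen[i].cookie,
                   de ? de->d_name : "(end of directory)");
            mismatches++;
        }
    }

    closedir(dir);
    return mismatches;
}

#ifdef TELLDIR_SKETCH_MAIN
/* Optional driver: build with -DTELLDIR_SKETCH_MAIN and pass a directory. */
int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    int rc = check_telldir_cookies(path);

    printf("%s: %d mismatch(es)\n", path, rc);
    return rc == 0 ? 0 : 1;
}
#endif
```

            Pointing a check like this at a directory on the affected mount, while nothing else modifies that directory, might help show whether the cookie at offset 0 is replayed consistently. Whether it reproduces this particular failure is not known.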

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)