[LU-18276] sanity-pfl test_16b: 'setstripe f16b.sanity-pfl.copy failed': No space left on device

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/7665dc3e-903c-4597-b509-2e8b5cdfb0eb

      test_16b failed with the following error:

      striped dir -i0 -c2 -H fnv_1a_64 /mnt/lustre/d16b.sanity-pfl
      1. PFL file
      getstripe --yaml /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl
      setstripe --yaml=/mnt/lustre/d16b.sanity-pfl/template /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl.copy
      compare
      2. plain file
      getstripe --yaml /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl
      setstripe --yaml=/mnt/lustre/d16b.sanity-pfl/template /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl.copy
      lfs setstripe: cannot create composite file '/mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl.copy': No space left on device
       sanity-pfl test_16b: @@@@@@ FAIL: setstripe /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl.copy failed 
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master-next/799 - 4.18.0-513.24.1.el8_9.x86_64
      servers: https://build.whamcloud.com/job/lustre-master-next/799 - 4.18.0-513.24.1.el8_lustre.x86_64

      The first such failure appears to have been on 2024-06-21, in a "full" test run (not a patch), and the failure has recurred at a low rate since then (1-2 times per week).

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-pfl test_16b - setstripe /mnt/lustre/d16b.sanity-pfl/f16b.sanity-pfl.copy failed

          Activity

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/59416
            Subject: LU-18276 tests: add debugging to sanity-pfl/16b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ccbcb81f3bae2d8f907a259c557a8370c9590d3d

            adilger Andreas Dilger added a comment - +1 on master: https://testing.whamcloud.com/test_sets/bc10bbba-f8d2-4d23-86c0-b518d6e628eb

            adilger Andreas Dilger added a comment -

            This is failing about 3x per week (11 failures in 4 weeks), plus 6 other failures of this subtest with different error messages that might have the same root cause.

            adilger Andreas Dilger added a comment -

            This error is coming from llapi_layout_file_open() after it calls fsetxattr() on the file:

            int llapi_layout_file_open(const char *path, int open_flags, mode_t mode,
                                       const struct llapi_layout *layout)
            {
                    :
                    rc = fsetxattr(fd, XATTR_LUSTRE_LOV, lum, lum_size, 0);
                    if (rc < 0) {
                            tmp = errno;
                            close(fd);
                            errno = tmp;
                            fprintf(stderr, "Cannot set layout EA: %s\n", strerror(errno));
                            fd = -1;
                    }

            It would be useful to improve this error message to include the EA size (lum_size), to see whether it is abnormally large and to help determine why and where the ENOSPC error is being returned.
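
            As a rough sketch of the more detailed message suggested above, the error branch could report the path and lum_size along with the errno value. The helper name format_layout_ea_error() and the message format are hypothetical and not part of liblustreapi; only lum_size, path, and the saved errno come from the excerpt above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not existing liblustreapi code):
 * format a layout EA error message that includes the file path and the
 * size of the EA that was passed to fsetxattr().
 *
 * Returns the snprintf() result: the number of characters that the full
 * message needs, excluding the terminating NUL.
 */
int format_layout_ea_error(char *buf, size_t buflen, const char *path,
                           size_t lum_size, int err)
{
        return snprintf(buf, buflen,
                        "Cannot set layout EA on '%s' (lum_size=%zu): %s (errno %d)",
                        path, lum_size, strerror(err), err);
}
```

            In llapi_layout_file_open() this would presumably be called with the saved errno value (tmp) after close(fd), and the resulting buffer printed to stderr in place of the current message.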
            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.whamcloud.com/test_sets/7c2a4ab3-b88e-42de-8051-9030bc0e8976

            People

              wc-triage WC Triage
              maloo Maloo