[LU-16682] sanity-pfl test_1c: comp4 stripe count != 2000

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4808a017-c473-4019-91e0-c790c3c97661

      test_1c failed with the following error:

      comp4 stripe count 1397 != 2000
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/93380 - 4.18.0-372.32.1.el8_6.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/93380 - 4.18.0-372.32.1.el8_lustre.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-pfl test_1c - comp4 stripe count 1397 != 2000

          Activity

            gerrit Gerrit Updater added a comment (edited):

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50531
            Subject: LU-16682 tests: verify OSTSEQWIDTH fix
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7197e2c295dbb32bb17903b476463c57ec79dd78


            adilger Andreas Dilger added a comment:

            It looks like there is already a patch in LU-11912.

            adilger Andreas Dilger added a comment:

            Hmm, it looks like it has failed for several different patches starting on 2023-03-29, and not in the 4 weeks before that, so this is likely a regression due to a patch. The failures are for patches that haven't landed yet, so it is unlikely that they are the cause.

            Patches that landed on that day:

            a7222127c7 LU-16642 tests: improve sanity-sec test_61
            8f40a3d711 LU-16639 misc: cleanup concole messages
            e998d21caf LU-16589 tests: add sanity/31l to test ln command
            17bbf5bdd6 LU-930 docs: fix whatis output
            36cbba150b LU-16632 tests: more margin of error for sanity/56xh
            91a3726f31 LU-16633 obdclass: fix rpc slot leakage
            12c3465199 LU-14291 batch: don't include lustre_update.h for client only builds
            d5b26443a3 LU-16615 utils: add messages in l_getidentity
            b30f825232 LU-16601 kernel: update SLES15 SP4 [5.14.21-150400.24.46.1]
            8f004bc53b LU-16599 obdclass: job_stats can parse escaped jobid string
            fc7a0d6013 LU-14668 lnet: add 'lock_prim_nid" lnet module parameter
            f5293fb66e LU-16598 osp: cleanup comment in osp_sync.c
            5e24b374f7 LU-16595 test: save one second in wait_destroy_complete()
            da230373bd LU-16563 lnet: use discovered ni status to set initial health
            0366422cfd LU-16221 kernel: update RHEL 9.1 [5.14.0-162.18.1.el9_1]
            2d40d96b4e LU-15053 tests: reset quota if ENABLE_QUOTA=1
            7e893c7095 LU-16382 build: udev files in /usr/lib
            b33808d3ae LU-16338 readahead: clip readahead with kms
            ccee6b92ec LU-13107 utils: remove duplicate lctl erase/fork_lcfg
            2471d35c0e LU-16217 iokit: Add lst.sh wrapper and lst-survey
            bdbc7f9f42 LU-12805 tests: disable replay-single/36
            73ee638813 LU-16604 kfilnd: kfilnd_peer ref leak on send
            6fab1fe4a5 LU-9680 lnet: handle multi-rail setups
            0ecb2a167c LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH
            c97d4cdf4d LU-16629 osd: refill the existing env

            At first glance I would suspect LU-11912 as the culprit, since it is changing OST object allocation to stress test that code in autotest (this would be much less of an issue in production).
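The narrowing step in the comment above (failures begin on 2023-03-29, none in the preceding weeks, so look at what landed in between) can be sketched as a simple date filter. This is only an illustration: the `suspects` helper, the `last_clean` date, and the third entry in `landings` are hypothetical and not taken from the ticket; the two 2023-03-29 entries are copied from the list above.

```python
from datetime import date

def suspects(landings, last_clean, first_failure):
    """Return commits that landed after the last clean run and no later
    than the first observed failure."""
    return [c for c in landings if last_clean < c[2] <= first_failure]

# (hash, subject, landing date)
landings = [
    ("0ecb2a167c", "LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH", date(2023, 3, 29)),
    ("c97d4cdf4d", "LU-16629 osd: refill the existing env", date(2023, 3, 29)),
    # Hypothetical placeholder for a commit that landed well before the failures.
    ("0000000000", "hypothetical earlier commit", date(2023, 3, 1)),
]

# Assumption: the last clean run was the day before the first failure.
candidates = suspects(landings,
                      last_clean=date(2023, 3, 28),
                      first_failure=date(2023, 3, 29))
for commit_hash, subject, _ in candidates:
    print(commit_hash, subject)
```

A filter like this only produces a candidate list; picking LU-11912 out of it still relies on knowing that it touches OST object allocation.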

            adilger Andreas Dilger added a comment:

            Sebastien, did this happen on a filesystem with fscrypt enabled? I'm wondering if it failed due to a lack of xattr space for holding 2000 stripes, or if it is just a race condition? I haven't seen it before.
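To put the xattr-space question in rough numbers, the sketch below estimates the size of a plain striped layout for the observed and expected stripe counts. The byte sizes are assumptions based on the `lov_mds_md_v1` and `lov_ost_data_v1` structures (a 32-byte header plus 24 bytes per stripe) and should be checked against `lustre_user.h` for the build in question. The estimate ignores the composite (PFL) header, the other components of the test file, and any space taken by other xattrs such as an encryption context.

```python
# Assumed sizes; verify against lustre_user.h before relying on them.
LOV_MDS_MD_V1_HEADER = 32  # magic, pattern, object id, stripe size/count, layout gen
LOV_OST_DATA_V1 = 24       # per stripe: object id (16) + generation (4) + OST index (4)

def lov_v1_xattr_size(stripe_count: int) -> int:
    """Approximate byte size of a plain LOV v1 layout with stripe_count stripes."""
    return LOV_MDS_MD_V1_HEADER + stripe_count * LOV_OST_DATA_V1

for count in (1397, 2000):
    print(f"{count} stripes: {lov_v1_xattr_size(count)} bytes")
```

Under these assumptions a 2000-stripe component needs about 48 KB of layout data, so anything else competing for the same xattr space reduces the stripe count that fits.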

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo