[LU-16682] sanity-pfl test_1c: comp4 stripe count != 2000 Created: 30/Mar/23  Updated: 05/Apr/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11912 reduce number of OST objects created ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4808a017-c473-4019-91e0-c790c3c97661

test_1c failed with the following error:

comp4 stripe count 1397 != 2000

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/93380 - 4.18.0-372.32.1.el8_6.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/93380 - 4.18.0-372.32.1.el8_lustre.x86_64

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-pfl test_1c - comp4 stripe count 1397 != 2000



 Comments   
Comment by Andreas Dilger [ 31/Mar/23 ]

Sebastien, did this happen on a filesystem with fscrypt enabled? I'm wondering if it failed due to a lack of xattr space for holding 2000 stripes, or if it is just a race condition? I haven't seen it before.
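As a rough way to size the xattr-space question, the sketch below estimates how many bytes a single plain layout with a given stripe count would need. It is not taken from the Lustre source. The field widths are assumptions modelled on the lov_mds_md_v1 header and lov_ost_data_v1 per-object entry, and the helper name plain_layout_bytes is hypothetical. The extra composite-layout (PFL) headers and any fscrypt context stored alongside are not modelled, so treat the result as a lower bound.

```python
import struct

# ASSUMPTION: field widths modelled on lov_mds_md_v1 and lov_ost_data_v1.
# "<" disables alignment padding, so calcsize() is the plain sum of widths.
# Header: magic, pattern, object id (2 x u64), stripe_size, stripe_count, layout_gen
LOV_HEADER_FMT = "<IIQQIHH"
# Per-stripe entry: object id (2 x u64), ost_gen, ost_idx
LOV_OBJECT_FMT = "<QQII"


def plain_layout_bytes(stripe_count: int) -> int:
    """Estimate the bytes needed for one plain (non-composite) layout.

    Hypothetical helper for illustration only; it ignores pool names,
    composite (PFL) headers and any other xattrs on the inode.
    """
    if stripe_count < 0:
        raise ValueError("stripe_count must be non-negative")
    header = struct.calcsize(LOV_HEADER_FMT)
    per_stripe = struct.calcsize(LOV_OBJECT_FMT)
    return header + stripe_count * per_stripe


if __name__ == "__main__":
    # Compare the stripe count the test got with the one it expected.
    for count in (1397, 2000):
        print(f"{count} stripes -> at least {plain_layout_bytes(count)} bytes")
```

Comparing the estimate for the observed and expected stripe counts against the xattr limit of the backing MDT filesystem would indicate whether space alone could explain the truncation.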

Comment by Andreas Dilger [ 31/Mar/23 ]

Hmm, it looks like it has failed for several different patches starting on 2023-03-29, and not in the 4 weeks before that, so it is likely a regression from a patch that landed around then. The failing runs are for patches that haven't landed yet, so those patches themselves are unlikely to be the cause.

Patches landed on this day:

a7222127c7 LU-16642 tests: improve sanity-sec test_61
8f40a3d711 LU-16639 misc: cleanup concole messages
e998d21caf LU-16589 tests: add sanity/31l to test ln command
17bbf5bdd6 LU-930 docs: fix whatis output
36cbba150b LU-16632 tests: more margin of error for sanity/56xh
91a3726f31 LU-16633 obdclass: fix rpc slot leakage
12c3465199 LU-14291 batch: don't include lustre_update.h for client only builds
d5b26443a3 LU-16615 utils: add messages in l_getidentity
b30f825232 LU-16601 kernel: update SLES15 SP4 [5.14.21-150400.24.46.1]
8f004bc53b LU-16599 obdclass: job_stats can parse escaped jobid string
fc7a0d6013 LU-14668 lnet: add 'lock_prim_nid" lnet module parameter
f5293fb66e LU-16598 osp: cleanup comment in osp_sync.c
5e24b374f7 LU-16595 test: save one second in wait_destroy_complete()
da230373bd LU-16563 lnet: use discovered ni status to set initial health
0366422cfd LU-16221 kernel: update RHEL 9.1 [5.14.0-162.18.1.el9_1]
2d40d96b4e LU-15053 tests: reset quota if ENABLE_QUOTA=1
7e893c7095 LU-16382 build: udev files in /usr/lib
b33808d3ae LU-16338 readahead: clip readahead with kms
ccee6b92ec LU-13107 utils: remove duplicate lctl erase/fork_lcfg
2471d35c0e LU-16217 iokit: Add lst.sh wrapper and lst-survey
bdbc7f9f42 LU-12805 tests: disable replay-single/36
73ee638813 LU-16604 kfilnd: kfilnd_peer ref leak on send
6fab1fe4a5 LU-9680 lnet: handle multi-rail setups
0ecb2a167c LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH
c97d4cdf4d LU-16629 osd: refill the existing env

At first glance I would suspect LU-11912 as the culprit, since it is changing OST object allocation to stress test that code in autotest (this would be much less of an issue in production).
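One way to skim a landing list like the one above is to group the entries by their subsystem tag, so that server-side changes (ofd, osd, osp) stand out from test, build and LNet changes. The sketch below is only an illustration: the regular expression assumes the "hash ticket subsystem: subject" shape seen in the list, and the function name group_by_subsystem is hypothetical. Grouping narrows where to look first; it does not identify the culprit by itself.

```python
import re
from collections import defaultdict

# ASSUMPTION: each entry looks like "<10-hex-digit hash> LU-<n> <subsystem>: <subject>".
COMMIT_RE = re.compile(
    r"^(?P<sha>[0-9a-f]{10}) (?P<ticket>LU-\d+) (?P<subsystem>[\w-]+): (?P<subject>.+)$"
)


def group_by_subsystem(lines):
    """Group commit summary lines by subsystem tag; skip lines that do not parse."""
    groups = defaultdict(list)
    for line in lines:
        match = COMMIT_RE.match(line.strip())
        if match:
            groups[match["subsystem"]].append(
                (match["sha"], match["ticket"], match["subject"])
            )
    return dict(groups)


# A few entries copied from the list above.
SAMPLE = [
    "0ecb2a167c LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH",
    "c97d4cdf4d LU-16629 osd: refill the existing env",
    "91a3726f31 LU-16633 obdclass: fix rpc slot leakage",
    "8f004bc53b LU-16599 obdclass: job_stats can parse escaped jobid string",
]

if __name__ == "__main__":
    for subsystem, commits in sorted(group_by_subsystem(SAMPLE).items()):
        print(subsystem, [ticket for _, ticket, _ in commits])
```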

Comment by Andreas Dilger [ 31/Mar/23 ]

It looks like there is already a patch in LU-11912.

Comment by Gerrit Updater [ 04/Apr/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50531
Subject: LU-16682 tests: verify OSTSEQWIDTH fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7197e2c295dbb32bb17903b476463c57ec79dd78
