[LU-16442] obdfilter-survey test_3a: Error: 'set mdt quota type failed' Created: 04/Jan/23 Updated: 08/Mar/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Vitaliy Kuznetsov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | topfail | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a6ea0c49-fb20-45e0-97f9-7055b311ef6a

test_3a failed with the following error:

    set mdt quota type failed

    onyx-80vm4: losetup: /dev/mapper/mds1_flakey: failed to set up loop device: No such file or directory
    CMD: onyx-80vm4 test -b /dev/mapper/mds1_flakey
    pdsh@onyx-80vm1: onyx-80vm4: ssh exited with exit code 1
    CMD: onyx-80vm4 e2label /dev/mapper/mds1_flakey
    onyx-80vm4: e2label: No such file or directory while trying to open /dev/mapper/mds1_flakey
    onyx-80vm4: Couldn't find valid filesystem superblock.
    pdsh@onyx-80vm1: onyx-80vm4: ssh exited with exit code 1
    Starting mds1: -o localrecov,loop /dev/mapper/mds1_flakey /mnt/lustre-mds1
    CMD: onyx-80vm4 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov,loop /dev/mapper/mds1_flakey /mnt/lustre-mds1
    onyx-80vm4: mount: /mnt/lustre-mds1: failed to setup loop device for /dev/mapper/mds1_flakey.
    pdsh@onyx-80vm1: onyx-80vm4: ssh exited with exit code 32
    Start of /dev/mapper/mds1_flakey on mds1 failed 32
    CMD: onyx-80vm3 mkdir -p /mnt/lustre-ost1
    CMD: onyx-80vm3 dmsetup status /dev/mapper/ost1_flakey >/dev/null 2>&1
    pdsh@onyx-80vm1: onyx-80vm3: ssh exited with exit code 1
    CMD: onyx-80vm3 test -b /dev/mapper/ost1_flakey
    pdsh@onyx-80vm1: onyx-80vm3: ssh exited with exit code 1
    CMD: onyx-80vm3 loop_dev=\$(losetup -j /dev/mapper/ost1_flakey | cut -d : -f 1);

First occurred: https://testing.whamcloud.com/sub_tests/11e3ecf2-6ba6-4222-8c36-301ce21f203f

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Minh Diep [ 04/Jan/23 ] |
|
failing ~30% of the time |
| Comment by Andreas Dilger [ 04/Jan/23 ] |
|
It looks like the "set mdt quota type failed" error started on 2022-12-13. Patches landed around that date:

    $ git log --oneline --after 2022-12-10 --before 2022-12-14 --color=never
    dce487f53a6f LU-16366 build: Add LCME_FL_PARITY to wirecheck
    dbedb9a5f0bd LU-16364 llite: Move d_u.d_alias compat define
    fedf1e8bd70c LU-16363 build: fiemap flexible array
    bb951b90268b LU-16359 build: RHEL use Module.symvers during find-provides
    b455b42fa875 LU-13705 utils: fix llstat -n option
    c8a33e5322b0 LU-16353 config: enable_foo variables mustn't contains space
    30c5421ad567 LU-16346 utils: fix lctl stack smashing
    d56ea0c80a95 LU-14992 tests: add more mkdir_on_mdt0 calls
    6e66cbdb5c8c LU-15816 tests: use correct ost host to manage failure
    51851705e936 LU-16334 llite: update statx size/ctime for fallocate
    624e78ae80cd LU-930 docs: add lfs-rm_entry.8 man page
    d1dbf26afd66 LU-16291 build: make kobj_type constant
    6f74bb60ff6c LU-16205 sec: reserve flag for fid2path for encrypted files
    b054fcd7852f LU-16159 lod: cancel update llogs upon recovery abort
    1819f6006ff5 LU-15801 ldiskfs: Server support for RHEL9
    88bccc4fa4dd LU-16114 build: Update security_dentry_init_security args
    c13eccf71dde LU-16112 build: ki_complete removed unused argument
    99d1f12c7c5e LU-15581 utils: add check_iam util
    c95973fea184 LU-6142 lustre: fix minor typos in comments
    6b69d22e4cb7 LU-15707 lod: force creation of a component without a pool
    e42efe35eec7 LU-16231 misc: fix stats snapshot_time to use wallclock
    e96cb6ff1fea LU-16110 lprocfs: make job_stats and rename_stats valid YAML |
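A list like the one above can be narrowed by subsystem when hunting for the patch that introduced a failure. The following is only a sketch: `filter_by_component` is a hypothetical helper, not part of the Lustre tree, and it assumes the `<hash> LU-NNNN component: summary` subject format seen in the log output. Which component is actually responsible is not established in this ticket.

```shell
#!/bin/bash
# Hypothetical helper: read `git log --oneline` output on stdin and keep
# only entries whose component field (third word, e.g. "tests:") matches
# the name given as $1. Assumes subjects of the form
#   <hash> LU-NNNN component: summary
# and tolerates a stray colon after the ticket number ("LU-16322:").
filter_by_component() {
	local component=$1
	awk -v comp="$component" \
		'$2 ~ /^LU-[0-9]+:?$/ && $3 == (comp ":")'
}
```

For example, `git log --oneline --after 2022-12-10 --before 2022-12-14 | filter_by_component tests` would keep only the `tests:` entries from the range quoted above.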
| Comment by Colin Faber [ 07/Mar/23 ] |
|
Hi mdiep, is this still failing regularly? |
| Comment by Andreas Dilger [ 08/Mar/23 ] |
|
Colin, you can check this easily in Maloo by doing a subtest search for obdfilter-survey test_3a (this is automatically generated by clicking on the subtest number in any Maloo failure report, and then expanding the "Within" date range as needed).

It looks like the last reported failure like this was on 2023-01-05. Patches landed on 2023-01-06 are:

    $ git log --oneline --after 2023-01-05 --before 2023-01-07 master
    5b06ba9d46 LU-16439 socklnd: clarify error message on timeout
    557bb0004d LU-16438 llite: remove false outdated comment
    4f0273b3bc LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x
    374f12ba11 LU-14409 ldiskfs: remove stray tracing code
    41bed753b3 LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster
    25d6e3ca63 LU-15626 tests: Fix shellcheck warning for acceptance-small
    d5fe41a02a LU-16335 test: add fail_abort_cleanup()
    d622b26d8d LU-16322: build: Add client build support for openEuler
    44e2f44f29 LU-16279 lnet: improve error reporting in LUTF
    34556ca18a LU-16268 mdd: set effective changelog mask correctly
    445f85de2b LU-16117 build: Avoid excessive modpost warnings
    61e83a6f13 LU-16113 build: Fix configure tests for lock_page_memcg
    009faf132d LU-16116 build: Configure tests for rhltable, bitmap_alloc...
    d54e8e95de LU-16118 build: Use pde_data() when available
    14cdcd6198 LU-13642 lnet: Allow IP specification
    18b4e28f18 LU-15288 lnet: increase transaction timeout
    5cd5a49c72 LU-16321 osd: Allow fiemap on kernel buffers
    4b9a39d3ed LU-14645 tests: test lfs setdirstripe with '/$'

That said, I was almost going to say this could be closed with "Cannot Reproduce", but looking at the test output it isn't clear whether this test is working correctly or not, even for tests that report PASS, because it is printing a ton of errors:

    + NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=network rslt_loc=/tmp targets="10.240.43.246" /usr/bin/obdfilter-survey
    Tue Mar 7 00:40:54 UTC 2023 Obdfilter-survey for case=network from trevis-97vm1.trevis.whamcloud.com
    ost 1 sz 1048576K rsz 1024K obj 1 thr 2 write 115686.24 ERROR rewrite 114559.96 ERROR read 113192.26 ERROR
    ost 1 sz 1048576K rsz 1024K obj 1 thr 4 write 108180.12 ERROR rewrite 106411.16 ERROR read 103443.34 ERROR
    done!

    =======================> ost 1 sz 1048576K rsz 1024K obj 1 thr 2
    =============> Create 1 on localhost:echotmp_ecc
    create: 1 objects
    create: #1 is object id 0x10000001
    =============> write localhost:echotmp_ecc
    Print status every 1 seconds
    --threads: starting 2 threads on device 1 running test_brw 512 wx q 256 2t268435457 g256
    error: test_brw-2: #1 - Invalid argument on write
    error: test_brw-1: #1 - Invalid argument on write
    --threads: PID 32664 had rc=22
    --threads: PID 32663 had rc=22
    Total: total 2 threads 2 sec 0.000149 13422.818792/second
    :
    : |
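The concern above is that a run can report PASS while every result line carries an ERROR marker and the worker threads exit with rc=22 (EINVAL, matching the "Invalid argument on write" messages). A minimal sketch of a post-run check follows, assuming the summary and detail output formats quoted in this comment. `survey_error_count` and `survey_failed_threads` are hypothetical helper names, not part of obdfilter-survey or the Lustre test framework.

```shell
#!/bin/bash
# Hypothetical helper: count ERROR markers in obdfilter-survey summary
# output read from stdin. Any nonzero count means the throughput numbers
# should not be trusted, whatever status the test itself reported.
survey_error_count() {
	awk '{ for (i = 1; i <= NF; i++) if ($i == "ERROR") n++ }
	     END { print n + 0 }'
}

# Hypothetical helper: count worker threads that exited nonzero, based on
# the "--threads: PID <pid> had rc=<n>" lines in the detail output.
survey_failed_threads() {
	awk '/had rc=/ { sub(/.*had rc=/, ""); if ($1 + 0 != 0) n++ }
	     END { print n + 0 }'
}
```

A wrapper could then fail the subtest when either count is nonzero, for example `[ "$(survey_error_count < "$summary_file")" -eq 0 ] || echo "survey reported errors"`, where `$summary_file` stands for wherever the run's output was saved.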