[LU-11643] create disk images for Lustre 2.10 and 2.12 for ldiskfs Created: 08/Nov/18 Updated: 26/Oct/23 Resolved: 07/Mar/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Sarah Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LTS12 |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We need to create disk images for Lustre 2.10.0 and 2.12.0 for conf-sanity test_32 upgrade regression testing, similar to the existing lustre/tests/disk*.tar.bz2 files. These new test filesystems should include files that use the new features of those releases (for example PFL and project quotas from 2.10, and DoM and FLR from 2.12), together with tests that verify those features still work properly after the upgrade.
Also, some of the files need to be striped over multiple OSTs (e.g. "lfs setstripe -c 2") but store data on only a single object (i.e. size = 4KB). This relates to an LFSCK issue that I saw with filter_fid, and it would be good to test; a minimal sketch is shown below. |
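For the second point, a minimal sketch of creating such a file; the mount point and file name are illustrative, not from the ticket:

```bash
# stripe the file across two OSTs, but write only 4KB so all data lands
# in the first object and the second object is allocated yet stays empty
lfs setstripe -c 2 /mnt/lustre/wide_file
dd if=/dev/zero of=/mnt/lustre/wide_file bs=4k count=1
lfs getstripe /mnt/lustre/wide_file    # confirms two objects were allocated
```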
| Comments |
| Comment by Gerrit Updater [ 03/Jun/19 ] |
|
Wei Liu (sarah@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35049 |
| Comment by Gerrit Updater [ 21/Feb/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37680 |
| Comment by Gerrit Updater [ 23/Feb/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/37680/ |
| Comment by Gerrit Updater [ 14/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35049/ |
| Comment by Peter Jones [ 14/May/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 15/May/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38619 |
| Comment by Gerrit Updater [ 17/May/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/38619/ |
| Comment by Andreas Dilger [ 17/May/20 ] |
|
Patch was reverted because of test failures. |
| Comment by Sarah Liu [ 22/Jan/21 ] |
|
I have tried again with the disk2_12-ldiskfs image (2 MDTs, 2 OSTs) and found it hung in test_32a when trying to mount the client without writeconf. The client side shows the following errors:

```
[207084.287987] Lustre: DEBUG MARKER: == conf-sanity testing /usr/lib64/lustre/tests/disk2_12-ldiskfs.tar.bz2 upgrade ====================== 03:36:58 (1611286618)
[207175.053537] Lustre: DEBUG MARKER: == conf-sanity nid = 10.9.6.32@tcp mount -t lustre 10.9.6.32@tcp:/t32fs /tmp/t32/mnt/lustre ========== 03:38:29 (1611286709)
[207232.567915] LustreError: 30549:0:(mgc_request.c:1578:mgc_apply_recover_logs()) mgc: cannot find UUID by nid '10.9.6.32@tcp': rc = -2
[207232.570266] Lustre: 30549:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.9.6.32@tcp: error processing recovery log t32fs-cliir: rc = -2
[207232.572846] Lustre: 30549:0:(mgc_request.c:2149:mgc_process_log()) MGC10.9.6.32@tcp: IR log t32fs-cliir failed, not fatal: rc = -2
[207256.403162] LustreError: 30549:0:(lmv_obd.c:1263:lmv_statfs()) t32fs-MDT0000-mdc-ffff91bf366cf800: can't stat MDS #0: rc = -11
[207256.424154] Lustre: Unmounted t32fs-client
[207256.425232] LustreError: 30549:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount (-11)
[207312.563317] LustreError: 30588:0:(mgc_request.c:1578:mgc_apply_recover_logs()) mgc: cannot find UUID by nid '10.9.6.32@tcp': rc = -2
[207312.565523] LustreError: 30588:0:(mgc_request.c:1578:mgc_apply_recover_logs()) Skipped 1 previous similar message
[207312.567342] Lustre: 30588:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.9.6.32@tcp: error processing recovery log t32fs-cliir: rc = -2
[207312.569687] Lustre: 30588:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) Skipped 1 previous similar message
[207312.571612] Lustre: 30588:0:(mgc_request.c:2149:mgc_process_log()) MGC10.9.6.32@tcp: IR log t32fs-cliir failed, not fatal: rc = -2
[207312.573790] Lustre: 30588:0:(mgc_request.c:2149:mgc_process_log()) Skipped 1 previous similar message
[207322.606194] LustreError: 11-0: t32fs-MDT0000-mdc-ffff91bf793af000: operation mds_connect to node 10.9.6.32@tcp failed: rc = -11
[207402.734185] LustreError: 11-0: t32fs-MDT0000-mdc-ffff91bf793a8800: operation mds_connect to node 10.9.6.32@tcp failed: rc = -11
[207456.137501] LustreError: 30588:0:(lmv_obd.c:1263:lmv_statfs()) t32fs-MDT0000-mdc-ffff91bf793a8800: can't stat MDS #0: rc = -11
[207456.158467] Lustre: Unmounted t32fs-client
[207456.159584] LustreError: 30588:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount (-11)
```

The client mount code that was added is:

```bash
$MOUNT_CMD $nid:/$fsname $tmp/mnt/lustre || {
	error_noexit "Mounting the client"
	return 1
}

[[ $(do_facet mds1 pgrep orph_.*-MDD | wc -l) == 0 ]] ||
	error "MDD orphan cleanup thread not quit"

umount $tmp/mnt/lustre || {
	error_noexit "Unmounting the client"
	return 1
}
```
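One way to investigate the "cannot find UUID by nid" errors would be to dump the configuration logs on the MGS and compare the client UUID/NID pairs they record against the current server NID. This is only a suggested debugging step; the log name below is assumed from the fsname t32fs:

```bash
# on the MGS node: list the available config logs, then print the client
# log to see which client UUID/NID pairs it contains
lctl --device MGS llog_catlist
lctl --device MGS llog_print t32fs-client
```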
|
| Comment by Gerrit Updater [ 10/Dec/21 ] |
|
"Wei Liu <sarah@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45827 |
| Comment by Gerrit Updater [ 26/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45827/ |
| Comment by Peter Jones [ 26/Jan/22 ] |
|
Landed for 2.15 |
| Comment by Andreas Dilger [ 28/Jan/22 ] |
|
It looks like the newly landed images for 2.10 and 2.12 fail conf-sanity test_32a whenever they are run. The patch passed testing because it used:

```
Test-Parameters: fstype=ldiskfs envdefinitions=ONLY="32f 32g" testlist=conf-sanity
```

which did not run tests 32a-32e on the new images. The test consistently hangs at:

```
CMD: onyx-41vm7 /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
CMD: onyx-41vm7 /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
CMD: onyx-41vm7 /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
CMD: onyx-41vm7 /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
CMD: onyx-41vm7 /usr/sbin/lctl pool_new t32fs.interop
onyx-41vm7: Pool t32fs.interop created
```

The next line of output for a passing test is:

```
CMD: onyx-65vm11 pgrep orph_.*-MDD
```

There are still a number of patches currently passing this test, either because they have not yet been rebased to include this patch or because they are marked trivial and do not run this test; conversely, conf-sanity was failing 100% of the time on master-next. I will push a patch to temporarily disable the new images for this test. |
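For reference, a Test-Parameters line that would also run the earlier subtests on the new images might look like the following; this is hypothetical, reusing the envdefinitions syntax quoted above:

```
Test-Parameters: fstype=ldiskfs envdefinitions=ONLY="32a 32b 32c 32d 32e 32f 32g" testlist=conf-sanity
```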
| Comment by Gerrit Updater [ 28/Jan/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46353 |
| Comment by Peter Jones [ 29/Jan/22 ] |
|
Is this fixed by the landing of https://review.whamcloud.com/46354/? |
| Comment by Sarah Liu [ 01/Feb/22 ] |
|
test_32a hanging with the new disk images is a known issue; I described it in my earlier comment (https://jira.whamcloud.com/browse/LU-11643?focusedCommentId=290110&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-290110), which quotes the "mount client" code that was added. |
| Comment by Andreas Dilger [ 02/Feb/22 ] |
|
Yes, the landing of patch https://review.whamcloud.com/46354 fixed that. The current problem is in test_33b: it looks like the filesystem image is missing files in the ROOT directory that the test expects to see when there are multiple MDTs:

```
== checking sha1sums ==
CMD: onyx-71vm5 cat /tmp/t32/sha1sums /tmp/t32/mnt/lustre
--- /tmp/t32/sha1sums.orig	2022-02-01 23:27:02.120240417 +0000
+++ /tmp/t32/sha1sums	2022-02-01 23:27:02.123240424 +0000
@@ -1,10 +0,0 @@
-59ced6686342e5fdff70a29277632622ad271168 ./init.d/functions
-ff4f8d1bcd9ab4a9edcf77496e23963e5c6f6a2c ./init.d/lsvcgss
-f8f634b92b75af4112634a6f14464e562cd82454 ./init.d/lustre
-dff7d87de75271f0714c3b82921d40c96598f67a ./init.d/netconsole
-21414c2b3c89f95d3eab00dafc954d3f6cf3ba9f ./init.d/network
-f87a11aceaf7dc0e1614ea074fda14d6896ac66f ./init.d/README
-92624163580750ca250a2c1cc8bd531d0609702a ./init.d/rhnsd
-a17ecaeb91c0218092c8b01308a132698da9b81f ./pfl_dir/pfl_file
-da39a3ee5e6b4b0d3255bfef95601890afd80709 ./project_quota_dir/pj_quota_file_old
-2c72448b440f16c9fae18e287ca827c25d29a7cb ./rc.local
==** find returned files **==
```

My patch https://review.whamcloud.com/46404 adds debug messages to help diagnose this. The best solution would be to fix the test32newtarball() function to properly fill the image with what the tests expect, if that is really the problem, rather than fixing the images by hand.

It also looks like the disk2_5-ldiskfs.tar.bz2 image has a similar problem, which is causing the Janitor test failures, but it has never been tested because it was not included in the lustre-tests RPM (it was never added to lustre/tests/Makefile.am). I'm not sure whether it is worthwhile to fix that image at this point, or whether it should be removed from Git entirely (though we would lose some test coverage in that case). |
| Comment by Andreas Dilger [ 02/Feb/22 ] |
|
Looking at the Janitor testing, it also failed test_32c in the same way when checking striped_dir: the files are missing there too, so it may not be an image problem. What is needed at this point is to check with a master build whether the mounted filesystem shows the files present in the root directory and striped_dir, and then check whether this matches 2.12, or whether there really is an upgrade problem; see the sketch below. |
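A sketch of that manual check, with the NID, fsname, and mount point taken from the logs earlier in this ticket (treat them as assumptions for any other setup):

```bash
# mount the upgraded image with a master client and list the directories
# the test expects to find; paths follow the conf-sanity output above
mount -t lustre 10.9.6.32@tcp:/t32fs /tmp/t32/mnt/lustre
ls -l /tmp/t32/mnt/lustre /tmp/t32/mnt/lustre/striped_dir \
	/tmp/t32/mnt/lustre/remote_dir
umount /tmp/t32/mnt/lustre
```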
| Comment by Sarah Liu [ 02/Feb/22 ] |
|
For these two images, the sha1sums check needs to descend into the "remote_dir" directory instead of ROOT or striped_dir. I think we need to set "pfl_upgrade=yes" (or the corresponding flag) in the test_32x subtests that failed this part:

```bash
if $r test -f $tmp/sha1sums; then
	# LU-2393 - do both sorts on same node to ensure locale
	# is identical
	$r cat $tmp/sha1sums | sort -k 2 >$tmp/sha1sums.orig
	if [ "$dne_upgrade" != "no" ]; then
		pushd $tmp/mnt/lustre/striped_dir
	elif [ "$pfl_upgrade" != "no" ] ||
		[ "$flr_upgrade" != "no" ] ||
		[ "$dom_new_upgrade" != "no" ] ||
		[ "$project_quota_upgrade" != "no" ]; then
		pushd $tmp/mnt/lustre/remote_dir
	else
		pushd $tmp/mnt/lustre
	fi
```

I will try locally to verify first. |
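A minimal sketch of that change, following the way other test_32 variants pass flags into t32_test; the exact subtest and flag to use are assumptions:

```bash
# hypothetical: set the PFL upgrade flag for this image so the sha1sums
# comparison above descends into remote_dir instead of the filesystem root
pfl_upgrade=yes t32_test $tarball writeconf || rc=$?
```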
| Comment by Andreas Dilger [ 03/Feb/22 ] |
|
It looks like the test should be checking both remote_dir and striped_dir_old. A few parts of the test check for dne_upgrade, but they should always run if mds2_is_available is true and the directory exists; a sketch follows. It may be that we don't need to regenerate the images at all. Please make any changes starting from my patch https://review.whamcloud.com/46404 so that we keep the debug messages. |
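A rough sketch of that condition, using the variable names from this comment (treat mds2_is_available and the directory layout as assumptions about the test script):

```bash
# hypothetical: verify striped_dir_old whenever a second MDT is present
# and the directory exists, not only when dne_upgrade was requested
if { [ "$dne_upgrade" != "no" ] || $mds2_is_available; } &&
	[ -d $tmp/mnt/lustre/striped_dir_old ]; then
	pushd $tmp/mnt/lustre/striped_dir_old
	# ... run the existing striped_dir checks here ...
	popd
fi
```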
| Comment by Peter Jones [ 07/Mar/22 ] |
|
Fixed in 2.15 |
| Comment by Alexander Zarochentsev [ 28/Aug/22 ] |
|
sarah, |
| Comment by Sarah Liu [ 29/Aug/22 ] |
|
Hello, |