[LU-14973] ost-pools test_20 test_23b: mds1:pool add failed testpool; lustre-OST[0000-0006/3] Created: 27/Aug/21 Updated: 27/Oct/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run, which has failed about 20 times in the past 4 months with this exact error message, and another 20 with other error messages that may or may not be related:
https://testing.whamcloud.com/test_sets/f1ed827f-01f0-4992-af7e-ab7f204f6bb8

test_23b failed with the following error:

CMD: trevis-45vm1 lctl pool_new lustre.testpool
trevis-45vm1: Pool lustre.testpool created
CMD: trevis-45vm1 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool 2>/dev/null || echo foo
CMD: trevis-77vm7.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool 2>/dev/null || echo foo
CMD: trevis-45vm1 lctl pool_add lustre.testpool lustre-OST[0000-0006/3]
trevis-45vm1: OST lustre-OST0000_UUID added to pool lustre.testpool
trevis-45vm1: OST lustre-OST0003_UUID added to pool lustre.testpool
trevis-45vm1: OST lustre-OST0006_UUID added to pool lustre.testpool
CMD: trevis-45vm1 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool | sort -u | tr '\n' ' '
: ost-pools test_23b: @@@@@@ FAIL: mds1:pool add failed testpool; lustre-OST[0000-0006/3]

There are no errors when adding and removing the OSTs from testpool, so I think they are added properly. I suspect this is some kind of bug/race in ost-pools.sh::add_pool() where wait_update_facet() is checking for "$tgt" that is not specified exactly the same way that get_param is printing it (e.g. extra space(s), different OST ordering, etc). |
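The suspected mismatch between the expected "$tgt" string and what get_param prints can be illustrated with a minimal sketch that compares the two as normalized sets instead of as raw strings. This is not the code in ost-pools.sh or test-framework.sh: the helper names expand_ost_range, normalize_members and pool_members_match are hypothetical, and the sketch assumes that the range bounds are hexadecimal OST indices, that the step is decimal, and that pool members are listed with a _UUID suffix as in the log above.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of the Lustre test suite.

# Expand a spec such as "lustre-OST[0000-0006/3]" into one "<name>_UUID" per line.
# Assumption: bounds are hexadecimal indices, the optional step is decimal.
expand_ost_range() {
    local spec=$1
    local re='^(.*)\[([0-9a-fA-F]+)-([0-9a-fA-F]+)(/([0-9]+))?\]$'
    if [[ $spec =~ $re ]]; then
        local prefix=${BASH_REMATCH[1]}
        local first=$((16#${BASH_REMATCH[2]}))
        local last=$((16#${BASH_REMATCH[3]}))
        local step=${BASH_REMATCH[5]:-1}
        local i
        ((step > 0)) || return 1        # a zero step would never terminate
        for ((i = first; i <= last; i += step)); do
            printf '%s%04x_UUID\n' "$prefix" "$i"
        done
    else
        # Not a range: treat it as a single target name.
        printf '%s_UUID\n' "${spec%_UUID}"
    fi
}

# Split on any whitespace, drop empty entries, and sort uniquely so that
# extra spaces and member ordering no longer affect the comparison.
normalize_members() {
    tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
}

# Succeed if the members listed in $2 are exactly the set described by spec $1.
pool_members_match() {
    local spec=$1 actual=$2
    [[ "$(expand_ost_range "$spec" | normalize_members)" == \
       "$(printf '%s\n' "$actual" | normalize_members)" ]]
}
```

With these helpers, a check such as pool_members_match 'lustre-OST[0000-0006/3]' "$(lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool)" would not depend on trailing spaces or on the order in which the OSTs are printed. It does not address a genuine race, where the pool contents are simply not yet updated when they are read.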
| Comments |
| Comment by Andreas Dilger [ 27/Aug/21 ] |
|
It appears that the test_23b failure is a follow-on from test_20 failing with a similar problem:

CMD: trevis-27vm6 lctl pool_add lustre.testpool2 lustre-OST[0001-0006/2]
trevis-27vm6: OST lustre-OST0001_UUID added to pool lustre.testpool2
trevis-27vm6: OST lustre-OST0003_UUID added to pool lustre.testpool2
trevis-27vm6: OST lustre-OST0005_UUID added to pool lustre.testpool2
CMD: trevis-27vm6 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool2 | sort -u | tr '\n' ' '
: ost-pools test_20: @@@@@@ FAIL: mds1:pool add failed testpool2; lustre-OST[0001-0006/2] |