Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Affects Version: Lustre 2.15.0
Severity: 3
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run, which has failed about 20 times in the past 4 months with this exact error message, and another 20 with other error messages that may or may not be related:
https://testing.whamcloud.com/test_sets/f1ed827f-01f0-4992-af7e-ab7f204f6bb8
test_23b failed with the following error:
CMD: trevis-45vm1 lctl pool_new lustre.testpool
trevis-45vm1: Pool lustre.testpool created
CMD: trevis-45vm1 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool 2>/dev/null || echo foo
CMD: trevis-77vm7.trevis.whamcloud.com lctl get_param -n lov.lustre-*.pools.testpool 2>/dev/null || echo foo
CMD: trevis-45vm1 lctl pool_add lustre.testpool lustre-OST[0000-0006/3]
trevis-45vm1: OST lustre-OST0000_UUID added to pool lustre.testpool
trevis-45vm1: OST lustre-OST0003_UUID added to pool lustre.testpool
trevis-45vm1: OST lustre-OST0006_UUID added to pool lustre.testpool
CMD: trevis-45vm1 lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.testpool | sort -u | tr '\n' ' '
: ost-pools test_23b: @@@@@@ FAIL: mds1:pool add failed testpool; lustre-OST[0000-0006/3]
There are no errors when adding and removing the OSTs from testpool, so I think they are added properly.
I suspect this is a bug or race in ost-pools.sh::add_pool(), where wait_update_facet() waits for a "$tgt" string that is not formatted exactly the way get_param prints the pool members (e.g. extra space(s), different OST ordering, etc).
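To illustrate the kind of mismatch suspected above, here is a minimal sketch of how an exact string comparison can fail even though the same OSTs are present. It is not taken from ost-pools.sh: normalize_list is a hypothetical helper, and the expected/actual strings are made-up values whose spacing and ordering are assumptions, not output captured from the failing run.

```shell
#!/bin/sh
# Hypothetical helper (not part of ost-pools.sh): collapse whitespace and
# sort the entries so that two target lists compare equal regardless of
# spacing or ordering.
normalize_list() {
	# Unquoted $* is split on whitespace; printf emits one entry per line.
	printf '%s\n' $* | sort -u | tr '\n' ' ' | sed 's/ $//'
}

# Illustrative values only: same three OSTs, but with a trailing space on
# one side and a different order plus a doubled space on the other.
expected="lustre-OST0000_UUID lustre-OST0003_UUID lustre-OST0006_UUID "
actual="lustre-OST0003_UUID  lustre-OST0000_UUID lustre-OST0006_UUID"

# A raw string comparison is sensitive to spacing and ordering.
if [ "$expected" = "$actual" ]; then
	echo "raw: match"
else
	echo "raw: mismatch"
fi

# Comparing the normalized forms ignores those differences.
if [ "$(normalize_list "$expected")" = "$(normalize_list "$actual")" ]; then
	echo "normalized: match"
else
	echo "normalized: mismatch"
fi
```

If the failure is of this kind, normalizing both sides before the comparison would make the check pass. If it is a real race, with the parameter not yet updated when it is read, normalization alone would not help.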
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_23b - mds1:pool add failed testpool; lustre-OST[0000-0006/3]