Details
Type: Bug
Resolution: Fixed
Priority: Minor
Versions: Lustre 2.13.0, Lustre 2.14.0, Lustre 2.16.0, Lustre 2.15.6
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d66394cc-abc3-11e9-a0be-52540065bddc
test_20b failed with the following error:
CMD: trevis-70vm4 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: trevis-70vm4 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
Delete is not completed in 29 seconds
CMD: trevis-70vm4 /usr/sbin/lctl get_param osc.*MDT*.sync_*
osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=1
osc.lustre-OST0000-osc-MDT0002.sync_changes=0
osc.lustre-OST0000-osc-MDT0002.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0002.sync_in_progress=0
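For triage, the final get_param output above can be reduced to the devices that still report outstanding sync work. The following is only a rough sketch: `stuck_sync_devices` is a hypothetical helper, not part of the Lustre test framework, and it assumes that any non-zero `sync_*` counter means the device has not finished. The sample text is copied from the log in this ticket.

```python
def stuck_sync_devices(get_param_output):
    """Return {device: {counter: value}} for devices with any non-zero sync_* counter.

    Hypothetical helper for reading `lctl get_param osc.*MDT*.sync_*` output;
    treating non-zero as "still pending" is an assumption, not Lustre's definition.
    """
    counters = {}
    # split() on whitespace handles both one-entry-per-line and flattened logs.
    for token in get_param_output.split():
        name, sep, value = token.partition("=")
        if not sep or not value.isdigit():
            continue  # skip anything that is not name=<integer>
        device, _, counter = name.rpartition(".")
        if not counter.startswith("sync_"):
            continue
        counters.setdefault(device, {})[counter] = int(value)
    return {dev: vals for dev, vals in counters.items() if any(vals.values())}


sample = """
osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=1
osc.lustre-OST0000-osc-MDT0002.sync_changes=0
osc.lustre-OST0000-osc-MDT0002.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0002.sync_in_progress=0
"""

for device, values in stuck_sync_devices(sample).items():
    print(device, values)
```

On this sample only `osc.lustre-OST0000-osc-MDT0000` is reported, because its `sync_in_progress` is 1 while every counter on the MDT0002 device is 0.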
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-pfl test_20b - test_20b returned 1
Note that, in contrast to the earlier failures James highlighted, the most recent failure shows no errors from the force_sync set_param. That seems highly relevant: such an error would explain getting stuck, but it is absent here. I suspect the earlier case was an issue with the connection coming back up after the failover (so we failed to set sync, and therefore failed this check), whereas this one looks more like a ZFS/storage issue.