[LU-11909] sanity-lfsck test 18a fails with '(6.1) Expect 3 fixed on mds1, but got: 0' Created: 31/Jan/19  Updated: 31/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9631 sanity-lfsck test_18a: Expect 3 fixed... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-lfsck test_18a fails with '(6.1) Expect 3 fixed on mds1, but got: 0', test_18b fails with '(4.1) Expect 3 fixed on mds1, but got: 0', and test_18c fails with '(4) Expect 3 fixed on mds1, but got: 0'. We are only seeing this on b2_10, and these tests started failing on January 19, 2019. Looking at master failures, we haven’t seen these tests fail on master since at least June 2018.

In the logs for the failure at https://testing.whamcloud.com/test_sets/3ea3a8f2-2522-11e9-830a-52540065bddc, the last output in the client suite_log for test_18a is:

Inject failure, to make the MDT-object lost its layout EA
CMD: trevis-24vm4 /usr/sbin/lctl set_param fail_loc=0x1615
fail_loc=0x1615
CMD: trevis-24vm4 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
The file size should be incorrect since layout EA is lost
Trigger layout LFSCK on all devices to find out orphan OST-object
CMD: trevis-24vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t layout -r -o
Started LFSCK on the device lustre-MDT0000: scrub layout
CMD: trevis-24vm4 /usr/sbin/lctl get_param -n 			mdd.lustre-MDT0000.lfsck_layout |
			awk '/^status/ { print \$2 }'
CMD: trevis-24vm4 /usr/sbin/lctl get_param -n 			mdd.lustre-MDT0000.lfsck_layout |
			awk '/^status/ { print \$2 }'
CMD: trevis-24vm3 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0000.lfsck_layout
CMD: trevis-24vm3 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0001.lfsck_layout
CMD: trevis-24vm3 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0002.lfsck_layout
CMD: trevis-24vm3 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0003.lfsck_layout
CMD: trevis-24vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout
 sanity-lfsck test_18a: @@@@@@ FAIL: (6.1) Expect 3 fixed on mds1, but got: 0

There is no additional information in the dmesg or console logs.
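The failing check in the suite_log excerpt above can be sketched in Python as follows. This is a minimal sketch, not the code in sanity-lfsck.sh. It assumes that `lctl get_param -n mdd.lustre-MDT0000.lfsck_layout` prints one `key: value` pair per line (as the `awk '/^status/ { print $2 }'` filter in the log suggests) and that the compared count is reported on a line keyed `repaired_orphan`. The helpers `lfsck_field` and `check_repaired` are hypothetical.

```python
"""Sketch of the check behind 'Expect 3 fixed on mds1, but got: 0'.

Assumptions: lfsck_layout output is one "key: value" pair per line, and the
repaired count sits on a line keyed "repaired_orphan". Helper names are
hypothetical.
"""
from typing import Optional


def lfsck_field(output: str, key: str) -> Optional[str]:
    """Return the value of the first line whose first token is exactly "key:".

    Roughly what awk '/^status/ { print $2 }' does in the log, except that
    this matches the whole key instead of a line prefix and stops at the
    first matching line.
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == key + ":":
            return parts[1]
    return None


def check_repaired(output: str, expected: int, facet: str = "mds1") -> None:
    """Raise AssertionError unless LFSCK completed with `expected` repairs."""
    status = lfsck_field(output, "status")
    if status != "completed":
        raise AssertionError(f"{facet} status is {status!r}, not 'completed'")
    value = lfsck_field(output, "repaired_orphan")  # assumed key name
    if value is None:
        raise AssertionError(f"no repaired_orphan line in {facet} output")
    repaired = int(value)
    if repaired != expected:
        raise AssertionError(
            f"Expect {expected} fixed on {facet}, but got: {repaired}")


if __name__ == "__main__":
    # Hypothetical two-line excerpt for illustration, not real lctl output.
    sample = "status: completed\nrepaired_orphan: 0\n"
    try:
        check_repaired(sample, expected=3)
    except AssertionError as err:
        print(err)  # prints: Expect 3 fixed on mds1, but got: 0
```

The status check is kept separate from the count check so that a run that never reached `completed` is reported differently from one that completed with 0 repairs.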

More test_18* failures can be found at:
https://testing.whamcloud.com/test_sets/b954607e-253f-11e9-b54c-52540065bddc
https://testing.whamcloud.com/test_sets/413e02a2-252b-11e9-9e22-52540065bddc


Generated at Sat Feb 10 02:47:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.