[LU-4879] sanity-lfsck test 17 failure: test_17 failed with 3
Created: 10/Apr/14  Updated: 07/Jul/14  Resolved: 18/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Blocker
Reporter: James Nunez (Inactive)
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: lfsck
Environment:

Toro Autotest


Issue Links:
Related
is related to LU-5301 Test failure sanity-lfsck test_17: (3... Closed
Severity: 3
Rank (Obsolete): 13501

 Description   

Sanity-lfsck test 17 has failed with error message "test_17 failed with 3" a few times over the past month. Results from one failure are at https://maloo.whamcloud.com/sub_tests/2c265c8e-c040-11e3-8c28-52540035b04c .

Test 17 creates an MDT-OST inconsistency using fail_loc=0x1614 (OBD_FAIL_LFSCK_MULTIPLE_REF), meaning that two MDT objects reference the same OST object. Layout LFSCK is then started and the test waits for a status of "completed", but LFSCK finished with a status of "partial". From the client test_log:

Started LFSCK on the device lustre-MDT0000: layout.
CMD: client-30vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
Waiting 3 secs for update
CMD: client-30vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n 		mdd.lustre-MDT0000.lfsck_layout |
		awk '/^status/ { print \$2 }'
Update not seen after 3s: wanted 'completed' got 'partial'
 sanity-lfsck test_17: @@@@@@ FAIL: test_17 failed with 3 
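
The polling pattern visible in the log above can be sketched roughly as follows. This is only an illustration, not the test framework's actual helper: the function names wait_status and lfsck_status are hypothetical, the one-second poll interval is an assumption, and the return value 3 is chosen only to mirror the "failed with 3" message in the log.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the Lustre test-framework implementation.
# Poll a status command until it prints the wanted value or time runs out.
# usage: wait_status WANTED MAX_SECS CMD [ARGS...]
wait_status() {
    wanted=$1
    max_secs=$2
    shift 2
    waited=0
    while :; do
        got=$("$@")
        if [ "$got" = "$wanted" ]; then
            return 0
        fi
        if [ "$waited" -ge "$max_secs" ]; then
            echo "Update not seen after ${max_secs}s: wanted '$wanted' got '$got'"
            # 3 is an assumption that mirrors "test_17 failed with 3".
            return 3
        fi
        sleep 1   # poll interval is an assumption
        waited=$((waited + 1))
    done
}

# Extract the status field from stdin, using the same awk expression
# as the logged command.
lfsck_status() {
    awk '/^status/ { print $2 }'
}

# On a real MDS this might be driven as (requires a Lustre node):
#   get_layout_status() {
#       lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | lfsck_status
#   }
#   wait_status completed 3 get_layout_status
```

With a status source that keeps reporting "partial", the helper prints the same kind of "Update not seen" message as in the log and returns non-zero, which is how a "partial" LFSCK run turns into a test failure.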

On the MDS console, after LFSCK is started, we see:

13:27:59:LustreError: 30778:0:(lfsck_layout.c:1422:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify OST 7 for layout start: rc = -95
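
For triage, the OST index and return code can be pulled out of a console line like the one above with a small filter. This is a minimal sketch: the function names parse_notify_error and errno_name are hypothetical, and the lookup table covers only the value seen in this ticket. On Linux, errno 95 is EOPNOTSUPP ("Operation not supported"); the sketch decodes the number and does not say why the OST returned it.

```shell
#!/bin/sh
# Hypothetical helpers for reading the console message; not part of Lustre.

# Print "<OST index> <rc>" for each matching line on stdin.
parse_notify_error() {
    sed -n 's/.*fail to notify OST \([0-9][0-9]*\) for layout start: rc = \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}

# Map a (possibly negative) return code to its Linux errno name.
# Only the value from this ticket is listed; extend as needed.
errno_name() {
    case "${1#-}" in
        95) echo "EOPNOTSUPP" ;;
        *)  echo "unknown" ;;
    esac
}
```

Feeding the MDS console line through parse_notify_error gives the pair "7 -95", and errno_name -95 prints EOPNOTSUPP.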


 Comments   
Comment by nasf (Inactive) [ 10/Apr/14 ]

I also found that. I will investigate it.

Comment by nasf (Inactive) [ 14/Apr/14 ]

Here is the patch:
http://review.whamcloud.com/#/c/9945/

Comment by nasf (Inactive) [ 14/Apr/14 ]

Increasing the priority because this causes many test failures.

Comment by nasf (Inactive) [ 18/Apr/14 ]

The patch has landed on master.
