[LU-4879] sanity-lfsck test 17 failure: test_17 failed with 3 Created: 10/Apr/14 Updated: 07/Jul/14 Resolved: 18/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lfsck | ||
| Environment: |
Toro Autotest |
||
| Severity: | 3 |
| Rank (Obsolete): | 13501 |
| Description |
|
Sanity-lfsck test 17 has failed with the error message "test_17 failed with 3" a few times over the past month. Results from one failure: https://maloo.whamcloud.com/sub_tests/2c265c8e-c040-11e3-8c28-52540035b04c
Test 17 uses fail_loc=0x1614 (OBD_FAIL_LFSCK_MULTIPLE_REF) to create an MDT-OST inconsistency in which two MDT objects reference the same OST object. The test then starts the layout LFSCK and waits for its status to become "completed", but LFSCK finished with a status of "partial". From the client test_log:
Started LFSCK on the device lustre-MDT0000: layout.
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
Waiting 3 secs for update
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
Update not seen after 3s: wanted 'completed' got 'partial'
sanity-lfsck test_17: @@@@@@ FAIL: test_17 failed with 3
On the MDS console, after LFSCK is started, we see the following (rc = -95 is -EOPNOTSUPP, "Operation not supported", on Linux):
13:27:59:LustreError: 30778:0:(lfsck_layout.c:1422:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify OST 7 for layout start: rc = -95 |
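The wait in the client log is a polling loop: the test repeatedly reads the "status" field of mdd.lustre-MDT0000.lfsck_layout (the CMD lines above) until it reads "completed" or gives up. A minimal bash sketch of that loop follows. `poll_status` and `lfsck_layout_status` are hypothetical helper names, not functions from the Lustre test framework, and the one-second polling interval is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the Lustre test-framework's own wait function):
# run a status command once per second until it prints the expected value
# or max_wait seconds have elapsed. The 1s interval is an assumption.
#
# Usage: poll_status EXPECTED MAX_WAIT_SECS COMMAND [ARGS...]
poll_status() {
    local expected=$1 max_wait=$2
    shift 2
    local waited=0 got
    while :; do
        got=$("$@")
        if [ "$got" = "$expected" ]; then
            echo "Updated after ${waited}s: wanted '$expected' got '$got'"
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            echo "Update not seen after ${max_wait}s: wanted '$expected' got '$got'" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Hypothetical status source built from the command in the test log:
# print the "status" field of the layout LFSCK on lustre-MDT0000.
# Must run on the MDS, where lctl is available.
lfsck_layout_status() {
    lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
        awk '/^status/ { print $2 }'
}

# Example (on the MDS): poll_status completed 32 lfsck_layout_status
```

If the status source keeps returning "partial", as it did in this failure, the sketch never sees "completed", prints an "Update not seen" message and returns non-zero, which mirrors the failure above.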
| Comments |
| Comment by nasf (Inactive) [ 10/Apr/14 ] |
|
I also found that. I will investigate it. |
| Comment by nasf (Inactive) [ 14/Apr/14 ] |
|
Here is the patch: |
| Comment by nasf (Inactive) [ 14/Apr/14 ] |
|
Increasing its priority because it causes many test failures. |
| Comment by nasf (Inactive) [ 18/Apr/14 ] |
|
The patch has been landed to master. |