Details
Bug
Resolution: Fixed
Blocker
Lustre 2.6.0
Toro Autotest
3
13501
Description
Sanity-lfsck test 17 has failed with the error message "test_17 failed with 3" several times over the past month. Results from one failure are at: https://maloo.whamcloud.com/sub_tests/2c265c8e-c040-11e3-8c28-52540035b04c
Test 17 creates an MDT-OST inconsistency using fail_loc=0x1614 (OBD_FAIL_LFSCK_MULTIPLE_REF), so that two MDT objects reference the same OST object. LFSCK layout is then started and the test waits for its status to become "completed", but LFSCK finishes with the status "partial". From the client test_log:
Started LFSCK on the device lustre-MDT0000: layout.
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | awk '/^status/ { print \$2 }'
Waiting 3 secs for update
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | awk '/^status/ { print \$2 }'
CMD: client-30vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | awk '/^status/ { print \$2 }'
Update not seen after 3s: wanted 'completed' got 'partial'
sanity-lfsck test_17: @@@@@@ FAIL: test_17 failed with 3
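The client log above shows the test framework repeatedly reading the `status` field of `lfsck_layout` on the MDS and giving up when it has not become "completed" within the timeout. The following is a minimal Python sketch of that poll-until-timeout logic, not the actual test-framework code. It assumes the `lctl get_param -n ...lfsck_layout` output contains a line of the form `status: <value>`. The names `parse_status`, `wait_for_status`, `FakeClock` and `make_reader` are hypothetical, and the reader and clock are injected so the example runs without a Lustre system.

```python
import time
from typing import Callable, Optional, Tuple


def parse_status(lfsck_output: str) -> Optional[str]:
    """Mimic awk '/^status/ { print $2 }' for the first matching line."""
    for line in lfsck_output.splitlines():
        if line.startswith("status"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def wait_for_status(
    read_output: Callable[[], str],
    wanted: str,
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, Optional[str]]:
    """Poll until the status equals `wanted` or `timeout` seconds pass.

    Returns (True, status) on success, (False, last_status) on timeout.
    """
    deadline = clock() + timeout
    status = parse_status(read_output())
    while status != wanted:
        if clock() >= deadline:
            return False, status
        sleep(interval)
        status = parse_status(read_output())
    return True, status


class FakeClock:
    """Deterministic clock so the example does not really sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def make_reader(samples):
    """Return successive samples, then keep repeating the last one."""
    calls = [0]

    def read() -> str:
        i = min(calls[0], len(samples) - 1)
        calls[0] += 1
        return samples[i]

    return read


if __name__ == "__main__":
    # Illustrative samples only: LFSCK settles on "partial" as in this failure.
    clock = FakeClock()
    reader = make_reader(["status: scanning-phase1", "status: partial"])
    ok, status = wait_for_status(reader, "completed", timeout=3,
                                 clock=clock.time, sleep=clock.sleep)
    print(ok, status)  # False partial
```

In this failure the poll never sees "completed" because LFSCK legitimately stops at "partial". Lengthening the timeout would therefore not help; the open question is why the scan ends up partial.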
On the MDS console, after LFSCK is started, we see:
13:27:59:LustreError: 30778:0:(lfsck_layout.c:1422:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify OST 7 for layout start: rc = -95
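Lustre reports failures as negative Linux errno values, so the `rc = -95` in the console message can be decoded with the standard library. This is a small sketch that assumes a Linux host, because errno numbers are platform-specific. On Linux, 95 is `EOPNOTSUPP` ("Operation not supported"), which suggests the layout-start notification to OST 7 returned "operation not supported" rather than failing with, for example, a timeout.

```python
import errno
import os

rc = -95  # value from the MDS console message (negative errno on failure)
code = -rc

# On Linux, EOPNOTSUPP (and its alias ENOTSUP) has the value 95.
print(code == errno.EOPNOTSUPP, os.strerror(code))
```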
Issue Links
is related to: LU-5301 Test failure sanity-lfsck test_17: (3) unexpected status (Closed)