Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.6.0
Toro Autotest
3
13520
Description
sanity-lfsck test 18d has failed a few times during autotesting. One such failure is at https://maloo.whamcloud.com/test_sets/01266ea2-c171-11e3-9451-52540035b04c .
The test expected LFSCK to end with status "completed", but it ended with status "failed". From the client test_log:
Update not seen after 32s: wanted 'completed' got 'failed'
sanity-lfsck test_18d: @@@@@@ FAIL: (3) MDS1 is not the expected 'completed'
From the client console, we see:
02:01:39:LustreError: 11-0: lustre-OST0000-osc-ffff88007a5d8c00: Communicating with 10.10.4.191@tcp, operation ldlm_enqueue failed with -12.
The console message from MDS1 has:
02:00:14:LustreError: 17569:0:(lfsck_layout.c:1422:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify OST 7 for layout start: rc = -95
02:00:14:LustreError: 17569:0:(lfsck_layout.c:1551:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify MDT 7 for layout phase1 done: rc = -95
02:00:14:LustreError: 17569:0:(lfsck_layout.c:1343:lfsck_layout_master_query_others()) lustre-MDT0000-osd: fail to query MDT 7 for layout: rc = -95
02:00:14:LustreError: 17569:0:(lfsck_layout.c:1504:lfsck_layout_master_notify_others()) lustre-MDT0000-osd: fail to notify MDT 7 for layout stop/phase2: rc = -95
This failure looks a lot like LU-4879, except that there LFSCK ended with status "partial", whereas in this case it ended with status "failed".
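For anyone triaging similar logs, here is a minimal sketch that pulls the return codes out of the console lines quoted above and names them. It assumes the codes are negated Linux errno values, as is conventional for kernel code, so that -12 is ENOMEM and -95 is EOPNOTSUPP. The helper `decode_rc` and the lookup table are hypothetical names made up for this illustration, not part of Lustre or its test framework.

```python
import re

# Linux errno names for the two return codes seen in the logs above.
# Assumption: the nodes run Linux, where 12 is ENOMEM and 95 is EOPNOTSUPP.
ERRNO_NAMES = {12: "ENOMEM", 95: "EOPNOTSUPP"}

# Matches "rc = -95" (MDS console) and "failed with -12" (client console).
RC_PATTERN = re.compile(r"(?:rc = |failed with )-(\d+)")


def decode_rc(line):
    """Return (code, name) for the first negative return code in a log line, or None."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, ERRNO_NAMES.get(code, "unknown")


mds_line = (
    "02:00:14:LustreError: 17569:0:(lfsck_layout.c:1422:lfsck_layout_master_notify_others()) "
    "lustre-MDT0000-osd: fail to notify OST 7 for layout start: rc = -95"
)
client_line = (
    "02:01:39:LustreError: 11-0: lustre-OST0000-osc-ffff88007a5d8c00: "
    "Communicating with 10.10.4.191@tcp, operation ldlm_enqueue failed with -12."
)

print(decode_rc(mds_line))
print(decode_rc(client_line))
```

Read this way, the MDS messages report an unsupported operation when notifying or querying target 7, while the client message reports an out-of-memory condition on the lock enqueue.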