[LU-8886] LFSCK failed to resume from the last checkpoint Created: 01/Dec/16 Updated: 23/Dec/16 Resolved: 23/Dec/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
LFSCK was found to fail just after resuming from the last checkpoint. The log is as follows:

00100000:10000000:8.0:1480550565.633225:0:8251:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = -61
00100000:10000000:8.0:1480550565.633247:0:8254:0:(lfsck_engine.c:1781:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant unknown status: rc = 0
00100000:10000000:8.0:1480550565.633252:0:8254:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
00100000:10000000:25.0:1480550565.633853:0:8253:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
00100000:10000000:8.0:1480550565.636178:0:8254:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
00100000:10000000:8.0:1480550565.636190:0:8254:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 0
00100000:10000000:25.0:1480550565.650715:0:8253:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = 0
00100000:10000000:25.0:1480550565.650725:0:8253:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = -61
00100000:10000000:24.0:1480550565.650761:0:8251:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_layout post, rc = -61
00100000:10000000:24.0:1480550565.650810:0:8251:0:(lfsck_layout.c:4680:lfsck_layout_master_post()) soaked-MDT0000-osd: layout LFSCK master post done: rc = 0
00100000:10000000:24.0:1480550565.650813:0:8251:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = -61
00100000:10000000:24.0:1480550565.650815:0:8251:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_namespace post, rc = -61 |
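For triaging a log like the one in the description, the records with a non-zero rc can be pulled out mechanically. The following is a minimal Python sketch, not part of Lustre: the regular expression is inferred only from the lines quoted in this ticket, the field names (subsys, mask, cpu, time, pid) are assumed labels, and parse_record and failing_records are hypothetical helper names. For reference, errno 61 on Linux is ENODATA.

```python
import re

# Assumed record layout, inferred from the log lines quoted in this ticket:
#   subsys:mask:cpu:timestamp:<n>:pid:<n>:(file:line:function()) message
# The field names are illustrative labels, not official Lustre terminology.
RECORD = re.compile(
    r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>[\d.]+):"
    r"(?P<time>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)$"
)
RC = re.compile(r"rc = (-?\d+)")


def parse_record(text):
    """Return a dict of fields for one log record, or None if it does not match."""
    match = RECORD.match(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    rc = RC.search(record["msg"])
    # Records without an "rc = N" suffix get rc = None.
    record["rc"] = int(rc.group(1)) if rc else None
    return record


def failing_records(lines):
    """Keep only the records that report a non-zero return code."""
    parsed = (parse_record(line) for line in lines)
    return [r for r in parsed if r and r["rc"] not in (None, 0)]


SAMPLE = (
    "00100000:10000000:25.0:1480550565.650725:0:8253:0:"
    "(lfsck_engine.c:1838:lfsck_assistant_engine()) "
    "soaked-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = -61"
)

if __name__ == "__main__":
    for rec in failing_records([SAMPLE]):
        print(rec["file"], rec["line"], rec["func"], rec["rc"])
```

Run over the log above, this kind of filter would show the lfsck_layout assistant thread as the first component to exit with rc = -61, with lfsck_post_generic() then reporting the same code.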
| Comments |
| Comment by Gerrit Updater [ 01/Dec/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/24056 |
| Comment by Cliff White (Inactive) [ 02/Dec/16 ] |
|
Tried the patch; it does not appear to be working. The first time LFSCK started, the node crashed:

Dec 2 09:56:17 lola-8 kernel: LustreError: 14454:0:(lfsck_namespace.c:4492:lfsck_namespace_double_scan()) ASSERTION( list_empty(&lad->lad_req_list) ) failed:
Dec 2 09:56:17 lola-8 kernel: LustreError: 14454:0:(lfsck_namespace.c:4492:lfsck_namespace_double_scan()) LBUG
Dec 2 09:56:17 lola-8 kernel: Pid: 14454, comm: lfsck
Dec 2 09:56:17 lola-8 kernel:
Dec 2 09:56:17 lola-8 kernel: Call Trace:
Dec 2 09:56:17 lola-8 kernel: [<ffffffffa081e875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Dec 2 09:56:17 lola-8 kernel: [<ffffffffa081e9bf>] lbug_with_loc+0x3f/0x90 [libcfs]
Dec 2 09:56:17 lola-8 kernel: [<ffffffffa11d1c2e>] lfsck_namespace_double_scan+0xee/0x120 [lfsck]
Dec 2 09:56:17 lola-8 kernel: [<ffffffffa11cee40>] lfsck_master_engine+0x590/0x1460 [lfsck]
Dec 2 09:56:17 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
Dec 2 09:56:17 lola-8 kernel: [<ffffffffa11ce8b0>] ? lfsck_master_engine+0x0/0x1460 [lfsck]
Dec 2 09:56:17 lola-8 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
Dec 2 09:56:17 lola-8 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Dec 2 09:56:17 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
Dec 2 09:56:17 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 |
| Comment by nasf (Inactive) [ 03/Dec/16 ] |
|
The new failure looks like |
| Comment by Cliff White (Inactive) [ 05/Dec/16 ] |
|
No, has that patch landed on master? If not, why not? |
| Comment by James Nunez (Inactive) [ 05/Dec/16 ] |
|
|
| Comment by nasf (Inactive) [ 06/Dec/16 ] |
Cliff, as James mentioned, that patch is already in master. However, I am not sure whether the build you are testing on Lola contains it. |
| Comment by Gerrit Updater [ 23/Dec/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24056/ |