[LU-959] kuc channels not reestablished after MDS crash Created: 04/Jan/12 Updated: 11/Apr/12 Resolved: 11/Apr/12 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0, Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Thomas LEIBOVICI - CEA (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 6499 |
| Description |
|
It seems the kuc channels are not reestablished after an MDS crash. It would probably need an action in mdc_import_event() to reregister the kuc listeners: case IMP_EVENT_ACTIVE: { |
| Comments |
| Comment by Peter Jones [ 04/Jan/12 ] |
|
Niu, could you please look into this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 05/Jan/12 ] |
|
Hi Thomas, I don't quite follow your description of this ticket, and I couldn't find mdc_kuc_reregister() either. Could you elaborate on this ticket a little more, or post the patch on Gerrit for review? Thank you. |
| Comment by Thomas LEIBOVICI - CEA (Inactive) [ 05/Jan/12 ] |
|
OK, I'll try to explain the issue in more detail. To receive MDT changelogs from a client, llapi_changelog_start() is called by the user space program (like lfs): The problem is that there is no recovery mechanism for KUC channels when the MDS restarts: This is what I suggested: an mdc_kuc_reregister() should be implemented and called in mdc_import_event(). Do you have a better understanding of the issue now? |
| Comment by Niu Yawei (Inactive) [ 05/Jan/12 ] |
|
Thanks a lot for the details, Thomas. I think I have a much better understanding now, but I still don't see why the client was blocked in llapi_changelog_recv() after the MDS restarted: mdc_ioc_changelog_send() just uses the llog APIs to read the changelog on the MDS and then puts it in the pipe, so when the MDS restarts, even if the client's llog processing breaks early because of an RPC error, the CL_EOF will always be written, and llapi_changelog_recv() should receive this EOF record and stop reading. Do you have the debug log and stack trace from when the process was stuck in llapi_changelog_recv()? |
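The reasoning in this comment can be illustrated with a small user-space sketch. It is not Lustre code: `sketch_rec`, `sketch_send()`, `sketch_recv_all()` and `sketch_round()` are hypothetical names invented for the illustration, and an ordinary POSIX pipe stands in for the KUC channel. The point it shows is only the pattern described above: a sender that writes an EOF record on every exit path, including the error path, lets the receiver leave its read loop instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record layout for this sketch; not a Lustre structure. */
enum sketch_type { SKETCH_DATA = 1, SKETCH_EOF = 2 };

struct sketch_rec {
	int  type;
	long index;
};

/* Write one whole record; records are far smaller than PIPE_BUF. */
static int sketch_write(int fd, const struct sketch_rec *rec)
{
	ssize_t n = write(fd, rec, sizeof(*rec));

	return n == (ssize_t)sizeof(*rec) ? 0 : -EIO;
}

/*
 * Sender: emit up to `count` data records. If fail_after >= 0, pretend the
 * server became unreachable once that many records were sent. The EOF
 * record is written on every path, which is the property discussed above.
 */
int sketch_send(int wfd, long count, long fail_after)
{
	struct sketch_rec eof = { SKETCH_EOF, -1 };
	int rc = 0, rc2;
	long i;

	for (i = 0; i < count; i++) {
		struct sketch_rec rec = { SKETCH_DATA, i };

		if (fail_after >= 0 && i >= fail_after) {
			rc = -EIO;	/* simulated RPC error */
			break;
		}
		rc = sketch_write(wfd, &rec);
		if (rc != 0)
			break;
	}
	rc2 = sketch_write(wfd, &eof);	/* always reached */
	close(wfd);
	return rc != 0 ? rc : rc2;
}

/*
 * Receiver: read until the EOF record. Returns the number of data records
 * seen, or -EPIPE if the pipe ended without an EOF record.
 */
long sketch_recv_all(int rfd)
{
	struct sketch_rec rec;
	long seen = 0;

	for (;;) {
		ssize_t n = read(rfd, &rec, sizeof(rec));

		if (n != (ssize_t)sizeof(rec)) {
			seen = -EPIPE;
			break;
		}
		if (rec.type == SKETCH_EOF)
			break;
		seen++;
	}
	close(rfd);
	return seen;
}

/*
 * One send/receive round in a single process. The count is kept small so
 * that everything fits in the pipe buffer before the receiver starts.
 */
long sketch_round(long count, long fail_after)
{
	int fds[2];

	assert(count >= 0 && count < 64);
	if (pipe(fds) != 0)
		return -errno;
	sketch_send(fds[1], count, fail_after);
	return sketch_recv_all(fds[0]);
}
```

Calling `sketch_round(5, 2)` simulates a failure after two records: the receiver still returns, having counted only the records sent before the error. If the sender skipped the EOF record and left the write end open, the same receiver would block in `read()`, which is the symptom the ticket originally described.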
| Comment by Thomas LEIBOVICI - CEA (Inactive) [ 05/Jan/12 ] |
|
Right, I see what you mean. Maybe my initial understanding of the problem is wrong. |
| Comment by Niu Yawei (Inactive) [ 10/Apr/12 ] |
|
Thomas, is this still relevant? Can we close it? |
| Comment by Thomas LEIBOVICI - CEA (Inactive) [ 11/Apr/12 ] |
|
OK, let's close it. I'll reopen it if the problem occurs again. |
| Comment by Niu Yawei (Inactive) [ 11/Apr/12 ] |
|
Not reproduced; closing it for now. |