[LU-7115] fld_client_rpc() may run into deadloop Created: 08/Sep/15 Updated: 24/Jan/17 Resolved: 24/Jan/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Niu Yawei (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In fld_client_rpc():
	if (rc != 0) {
		if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) {
			/* Since LWP is not replayable, so it will keep
			 * trying unless umount happens, otherwise it would
			 * cause unnecessary failure of the application. */
			ptlrpc_req_finished(req);
			rc = 0;
			goto again;
		}
		GOTO(out_req, rc);
	}
If the connection is broken, this function will run into a dead loop. I think we should restructure the function to make it interruptible; otherwise, if the connection is never established, the caller will be stuck in this function forever. It seems fld_update_from_controller() has a similar problem. |
| Comments |
| Comment by Peter Jones [ 08/Sep/15 ] |
|
Yang Sheng, could you please look into this issue? Thanks, Peter |
| Comment by Di Wang [ 08/Sep/15 ] |
|
I thought the point here is not to fail for LWP unless it is being unmounted or deactivated. I'm not sure how making it interruptible would help here, since this is the connection between MDTs. Am I missing something? But we do need to check whether the import is for LWP, i.e. only do this "try again" for the LWP connection, not for MDC or other imports. |
| Comment by Niu Yawei (Inactive) [ 09/Sep/15 ] |
|
I mean that if the connection is broken, it will run into a dead loop; there is then no way to terminate the thread that calls this function, and we won't be able to shut down the MDT/OST in the end. I don't think it's a serious problem, and it doesn't look easy to fix. |
| Comment by Di Wang [ 09/Sep/15 ] |
|
Oh, it can break the loop; see:
if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) {
	/* Since LWP is not replayable, so it will keep
	 * trying unless umount happens, otherwise it would
	 * cause unnecessary failure of the application. */
	ptlrpc_req_finished(req);
	rc = 0;
	goto again;
}
It checks the import state here. Also, the point right now is that we do not break the connection between MDTs until an umount happens or an admin steps in, so this implementation actually fits here. |
| Comment by Andreas Dilger [ 15/Sep/15 ] |
|
Is this really a bug or could this be closed? |
| Comment by Niu Yawei (Inactive) [ 16/Sep/15 ] |
|
As Di mentioned, the thread can be terminated by unmounting the target, so that's fine with me; we can just leave it as it is. However, this function is also called by the client, so we may need to check whether it is called by a client (not from LWP but from an MDC device) and break the loop in the non-LWP case. |
| Comment by Gerrit Updater [ 04/Nov/15 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/17041 |
| Comment by Gerrit Updater [ 24/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/17041/ |
| Comment by Peter Jones [ 24/Jan/17 ] |
|
Landed for 2.10 |