Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
None
-
Any Lustre code, TCP network (o2ib should also be affected).
-
3
-
14005
Description
LNet commits an MD immediately after LNetGet or LNetPut is called, but LNet may not have an active connection to the destination node. In that case ptlrpc may expire the request before LNet releases the TX descriptor, and the request can't be freed because LNet still holds a reference to the request buffer.
I found this (or a similar issue) in 2008, and maxim tried to fix it in commit
commit f6cd596982ed4380e5547181022ad81e4c6d3512
Author: maxim <maxim>
Date: Fri Sep 5 14:58:15 2008 +0000

    b=16308 i=isaac i=liang
    Conf-sanity test_32a couldn't stop ost and mds because it tried to access non-existent peer and tcp connect took quite long before timing out. The patch flushes txs pinned to a peer even if it's still in "connecting" state.
but that fix isn't complete, and LNet is still unable to release the TX.
So we need some support in the ptlrpc layer to solve this issue by unlinking asynchronously.
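The reference problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: every name in it (`toy_request`, `toy_lnet_send`, `toy_expire`, `toy_unlink_event`) is hypothetical and is not part of the real LNet or ptlrpc API. It only shows why an expired request cannot be freed while the MD still pins its buffer, and how an asynchronous unlink lets the expiry path return without waiting for the TX to be released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a request; NOT the real struct ptlrpc_request. */
struct toy_request {
    int  refcount;       /* 1 for the caller, +1 while the MD is attached */
    bool md_attached;    /* buffer is still pinned by the (toy) MD */
    bool unlink_pending; /* expiry asked for an unlink, event not seen yet */
    bool expired;        /* request timed out in the (toy) ptlrpc layer */
    bool freed;          /* set when the last reference is dropped */
};

static void toy_req_init(struct toy_request *req)
{
    req->refcount = 1;   /* the caller's own reference */
    req->md_attached = false;
    req->unlink_pending = false;
    req->expired = false;
    req->freed = false;
}

static void toy_req_put(struct toy_request *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        req->freed = true;   /* real code would release the memory here */
}

/* Models the send call: the MD is committed at once, even if no
 * connection to the peer exists yet, so the buffer gains a reference. */
static void toy_lnet_send(struct toy_request *req)
{
    req->md_attached = true;
    req->refcount++;
}

/* Models request expiry with an asynchronous unlink: it only records
 * that an unlink is wanted and returns without waiting for the TX. */
static void toy_expire(struct toy_request *req)
{
    req->expired = true;
    if (req->md_attached)
        req->unlink_pending = true;
}

/* Models the later unlink event, delivered once the TX is released. */
static void toy_unlink_event(struct toy_request *req)
{
    if (!req->md_attached)
        return;
    req->md_attached = false;
    req->unlink_pending = false;
    toy_req_put(req);        /* drop the reference held for the buffer */
}
```

In this model, calling `toy_lnet_send`, then `toy_expire`, then dropping the caller's reference with `toy_req_put` leaves the request alive; it is only marked freed after `toy_unlink_event` runs. The model also shows where a request would stay pinned indefinitely if that event never arrived.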