[LU-5073] Lustre is unable to unload modules in conf-sanity 31. Created: 16/May/14  Updated: 20/Jul/14  Resolved: 11/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alexey Lyashkov Assignee: Cliff White (Inactive)
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Any Lustre code, TCP network (o2ib should also be affected).


Issue Links:
Related
is related to LU-5259 request gets stuck in UNREGISTERING p... Resolved
is related to LU-5341 Intermittent hangs waiting for RPCs i... Closed
Severity: 3
Rank (Obsolete): 14005

 Description   

LNet commits an MD immediately after LNetGet or LNetPut is called, but LNet may not have an active connection to the destination node. In that case ptlrpc may expire the request before LNet releases the TX descriptor, and the request cannot be freed because LNet still holds a reference to the request buffer.
I found this (or a similar issue) in 2008, and maxim tried to fix it in this commit:

commit f6cd596982ed4380e5547181022ad81e4c6d3512
Author: maxim <maxim>
Date:   Fri Sep 5 14:58:15 2008 +0000

    b=16308
    i=isaac
    i=liang
    Conf-sanity test_32a couldn't stop ost and mds because it
    tried to access non-existent peer and tcp connect took
    quite long before timing out. The patch flushes txs pinned
    to a peer even if it's still in "connecting" state.

However, that fix is incomplete, and LNet is still unable to release the TX.
So we need some support in the ptlrpc layer to solve this issue by unlinking asynchronously.
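
To illustrate the asynchronous-unlink idea described above, here is a minimal toy model in C. It is only a sketch and is not the code from the patches: every name in it (toy_request, toy_net_send, toy_req_expire, toy_net_unlink_done, toy_demo) is hypothetical and does not correspond to any Lustre or LNet symbol. It assumes a plain reference count in which the network layer pins the request buffer while a TX is queued, and the timeout path requests an unlink without waiting for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are real Lustre or LNet symbols. */
struct toy_request {
    int  refcount;   /* 1 for the owner, +1 while the network layer pins the buffer */
    bool unlinking;  /* an asynchronous unlink has been requested */
    bool freed;      /* set when the last reference is dropped */
};

/* Drop one reference; the request is "freed" when the count reaches zero. */
static void toy_req_put(struct toy_request *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0)
        req->freed = true;  /* a real implementation would release memory here */
}

/* Network layer queues a TX for the buffer and pins the request. */
static void toy_net_send(struct toy_request *req)
{
    req->refcount++;
}

/* Owner's timeout path: ask for an unlink, drop the owner's reference,
 * and return without waiting for the network layer. */
static void toy_req_expire(struct toy_request *req)
{
    req->unlinking = true;
    toy_req_put(req);
}

/* Completion callback: the network layer has released the TX,
 * so its pin on the request buffer is dropped. */
static void toy_net_unlink_done(struct toy_request *req)
{
    assert(req->unlinking);
    toy_req_put(req);
}

/* Walk through the scenario; returns 0 if the request is freed
 * only after the unlink completion arrives. */
int toy_demo(void)
{
    struct toy_request req = { .refcount = 1, .unlinking = false, .freed = false };

    toy_net_send(&req);         /* refcount is now 2 */
    toy_req_expire(&req);       /* refcount is now 1, still pinned */
    if (req.freed)
        return 1;               /* freed too early: buffer was still pinned */

    toy_net_unlink_done(&req);  /* refcount reaches 0 */
    return req.freed ? 0 : 2;
}
```

In this model the expire path never blocks on the network layer; the request simply stays allocated until the completion callback drops the last reference.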



 Comments   
Comment by Alexey Lyashkov [ 16/May/14 ]

Fix: http://review.whamcloud.com/10353

Xyratex bug MRP-1848

Comment by Cliff White (Inactive) [ 19/May/14 ]

Thank you, will monitor

Comment by Andreas Dilger [ 26/Jun/14 ]

Also needs http://review.whamcloud.com/10846 to fix a related problem.

Comment by Cliff White (Inactive) [ 11/Jul/14 ]

Both patches have been merged, so I am closing this issue. Please reopen if there are any concerns.
