[LU-10669] Potential race condition when unlinking MD Created: 14/Feb/18 Updated: 05/Dec/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A potential race condition could cause an MD to be unlinked twice. The first unlink decrements the reference counter, and the second unlink then triggers an assertion on the reference counter. Two unlink paths can be hit at the same time: one when a request expires and one when a transfer finishes. These code paths need to be investigated in more detail. |
| Comments |
| Comment by Amir Shehata (Inactive) [ 15/Feb/18 ] |
|
After more investigation, I don't see a possible scenario where the MD reference count can be decremented because of an RPC expiry. When an RPC expires, the actual cleanup happens in ptlrpc_expire_one_request(). Two functions are called: ptlrpc_unregister_reply() and ptlrpc_unregister_bulk(). Both of these functions end up calling LNetMDUnlink(). LNetMDUnlink() doesn't free the MD unless there are no more references on it. All of this processing is done under the resource lock. lnet_finalize() is the only path where the refcount is decremented. Therefore, for the MD refcount to be < 0, lnet_finalize() must have been called twice on the same msg/md pair. Since this issue has only been seen on OPA, I suspect there could be a scenario where the OPA driver notifies the LND twice about the same message. I'm adding a patch that does not assert in this scenario, but instead prints some information to verify that we're hitting this case. |
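The reasoning above can be illustrated with a small single-threaded model. This is only a sketch, assuming a simplified refcount scheme: `struct toy_md`, `toy_md_unlink()`, `toy_msg_finalize()` and `toy_demo()` are hypothetical names invented for illustration and are not the LNet structures or functions, and the resource lock that serializes the real paths is omitted. It shows that repeated unlinks cannot drive the count below zero, while a second finalize on the same message can, and that the duplicate can be reported instead of asserted on.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory descriptor; not the real LNet struct. */
struct toy_md {
    int  refcount;        /* in-flight messages still referencing the MD */
    bool unlink_pending;  /* unlink was requested while references were held */
    bool freed;           /* models the MD having been released */
};

/*
 * Models the unlink path: the MD is released only when no references
 * remain; otherwise the release is deferred. The refcount is never
 * touched here, so calling this twice is harmless.
 */
static void toy_md_unlink(struct toy_md *md)
{
    if (md->freed)
        return;
    if (md->refcount > 0) {
        md->unlink_pending = true;
        return;
    }
    md->freed = true;
}

/*
 * Models the finalize path, the only place the refcount is decremented.
 * Returns 0 on success, or -1 if the count is already zero, which would
 * indicate a duplicate completion for the same message. The duplicate is
 * logged rather than asserted on.
 */
static int toy_msg_finalize(struct toy_md *md)
{
    if (md->refcount <= 0) {
        fprintf(stderr, "duplicate finalize: refcount=%d freed=%d\n",
                md->refcount, (int)md->freed);
        return -1;
    }
    md->refcount--;
    if (md->refcount == 0 && md->unlink_pending)
        md->freed = true;
    return 0;
}

/* Walks through the sequence; returns 0 if the model behaves as described. */
static int toy_demo(void)
{
    struct toy_md md = { .refcount = 1, .unlink_pending = false, .freed = false };

    toy_md_unlink(&md);                  /* expiry path: release is deferred */
    toy_md_unlink(&md);                  /* a second unlink changes nothing */
    int first  = toy_msg_finalize(&md);  /* transfer completes, MD released */
    int second = toy_msg_finalize(&md);  /* duplicate notification detected */

    return (first == 0 && second == -1 && md.freed && md.refcount == 0) ? 0 : 1;
}
```

In this model only the second call to `toy_msg_finalize()` reaches the error branch, which matches the suspicion that a duplicate completion notification, rather than the expiry path, would be needed to underflow the count.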
| Comment by Gerrit Updater [ 15/Feb/18 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/31313 |