[LU-9971] MR: ABA problem in lnet_discover_peer_locked Created: 11/Sep/17  Updated: 22/Sep/20  Resolved: 10/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.6

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10281 conf-sanity: test_54a hung at lnet_di... Open
is related to LU-13652 [1575337.260035] LNetError: 8719:0:(p... Resolved
is related to LU-12519 sanity-sec test 31 crashes with ASSER... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In lnet_discover_peer_locked(), when we drop and re-acquire the net lock, the lpni might have been relinked to a different peer in the meantime, opening the window for an ABA problem.



 Comments   
Comment by Olaf Weber [ 11/Sep/17 ]

Not so much an ABA problem as a use-after-free: the lp pointer might point to a different lnet_peer at the same address. By reshuffling the addref and decref calls on lp a bit, we can easily ensure that even if lpni is now linked to a different peer, that peer must at least have a different address.
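
The idea in the comment above can be illustrated with a small user-space C sketch. It is not the landed patch and does not use the LNet API: struct peer, struct peer_ni, peer_alloc(), peer_addref(), peer_decref(), relink() and discover_sketch() are hypothetical stand-ins, and the net lock is reduced to comments. The sketch assumes the fix amounts to holding a reference on lp across the unlock/relock window, so that lp's memory cannot be freed and reused while the lock is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for lnet_peer and lnet_peer_ni. */
struct peer {
	int refcount;
};

struct peer_ni {
	struct peer *peer;	/* peer this NI is currently linked to */
};

static struct peer *peer_alloc(void)
{
	struct peer *lp = calloc(1, sizeof(*lp));

	if (lp != NULL)
		lp->refcount = 1;	/* reference owned by the peer_ni link */
	return lp;
}

static void peer_addref(struct peer *lp)
{
	lp->refcount++;
}

/* Drop one reference and free the peer when the last one goes away. */
static void peer_decref(struct peer *lp)
{
	assert(lp->refcount > 0);
	if (--lp->refcount == 0)
		free(lp);
}

/* Simulates another thread relinking lpni while the lock is dropped. */
static void relink(struct peer_ni *lpni, struct peer *new_lp)
{
	struct peer *old = lpni->peer;

	lpni->peer = new_lp;
	peer_decref(old);	/* the link's reference on the old peer */
}

/*
 * Returns 1 if lpni is still linked to the same peer after the
 * unlocked window, 0 if it was relinked, -1 on allocation failure.
 */
static int discover_sketch(struct peer_ni *lpni, int simulate_relink)
{
	struct peer *lp = lpni->peer;
	int same;

	peer_addref(lp);	/* pin lp BEFORE dropping the lock */
	/* net lock would be dropped here */
	if (simulate_relink) {
		struct peer *new_lp = peer_alloc();

		if (new_lp == NULL) {
			peer_decref(lp);
			return -1;
		}
		relink(lpni, new_lp);
	}
	/* net lock would be re-acquired here */

	/*
	 * lp is still allocated, so any peer now linked to lpni must
	 * live at a different address: the comparison is reliable.
	 */
	same = (lpni->peer == lp);
	peer_decref(lp);	/* unpin only after the comparison */
	return same;
}
```

If the decref happened before the lock was dropped instead, the old peer could be freed during the window and a newly allocated peer could land at the same address, making the pointer comparison succeed wrongly.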

Comment by Gerrit Updater [ 12/Sep/17 ]

Olaf Weber (olaf.weber@hpe.com) uploaded a new patch: https://review.whamcloud.com/28944
Subject: LU-9971 lnet: use after free in lnet_discover_peer_locked()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0f1814c07e9e9107dc6b92584ab9ed999621fee6

Comment by Gerrit Updater [ 07/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28944/
Subject: LU-9971 lnet: use after free in lnet_discover_peer_locked()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2b5b551b15d96588f8f309b5a08c11cab203efeb

Comment by Andriy Skulysh [ 08/Jul/19 ]

The patch has a defect caused by landing of LU-11299.

Comment by Gerrit Updater [ 08/Jul/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35446
Subject: LU-9971 lnet: fix peer ref counting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 086739cf3d0424da664cd11c7798f7c93153b95b

Comment by Gerrit Updater [ 10/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35446/
Subject: LU-9971 lnet: fix peer ref counting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dbcddb4824f08153f145327e2bfe1bf4a2becc4f

Comment by Peter Jones [ 10/Jul/19 ]

Both patches landed for 2.13

Comment by Gerrit Updater [ 10/Jun/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38891
Subject: LU-9971 lnet: use after free in lnet_discover_peer_locked()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: da45d5fe3b64f38391f25511eccbacf250bac28b

Comment by Gerrit Updater [ 10/Jun/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38892
Subject: LU-9971 lnet: fix peer ref counting
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9312614ca535a6b7c4d4c81756237a9192322460

Comment by Gerrit Updater [ 01/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38891/
Subject: LU-9971 lnet: use after free in lnet_discover_peer_locked()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 3f3c839f97298f6e65e2e053fc2ece59c39931dc
