[LU-5639] Message is hashed to invalid match-table of LNet request portal Created: 18/Sep/14  Updated: 06/Oct/18  Resolved: 31/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Blocker
Reporter: Liang Zhen (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-5962 OSS crash on lnet_ptl_match_md() due ... Resolved
Related
Severity: 3
Rank (Obsolete): 15789

 Description   

When a new message arrives at an LNet request portal and there is no receiving buffer (ME/MD) in the match-table of the partition the message arrived in, either because the service has not posted any buffer in that partition or because the server is under memory pressure, LNet increments an integer (the rotor) and then hashes the message to another match-table, in a different partition, that does have receiving buffers. This rotor should be unsigned when hashing; otherwise the integer can become negative after it overflows, and we will refer to an illegal address.

This issue should be rare because Lustre services normally post enough request buffers.
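
The effect of the rotor's signedness can be illustrated with a minimal sketch. This is not the LNet code: `NPARTS`, `pick_partition_signed`, `pick_partition_unsigned` and `next_partition` are hypothetical names, and the sketch assumes the rotor is reduced modulo the number of partitions to choose a match-table. It only shows why a signed counter that has gone negative gives an out-of-range index, while an unsigned counter wraps safely.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the number of partitions (match-tables). */
#define NPARTS 4

/*
 * Signed rotor: in C99 and later, % truncates toward zero, so a negative
 * rotor gives a negative remainder. Using that as an array index would
 * address memory before the start of the table.
 */
int pick_partition_signed(int rotor)
{
    return rotor % NPARTS;
}

/* Unsigned rotor: the remainder is always in [0, NPARTS). */
unsigned int pick_partition_unsigned(unsigned int rotor)
{
    return rotor % (unsigned int)NPARTS;
}

/*
 * Advance the rotor and return the next partition index. Unsigned
 * arithmetic wraps from UINT_MAX to 0 with defined behaviour, so the
 * index stays valid across the wrap.
 */
unsigned int next_partition(unsigned int *rotor)
{
    unsigned int idx;

    *rotor += 1;
    idx = pick_partition_unsigned(*rotor);
    assert(idx < (unsigned int)NPARTS);
    return idx;
}
```

With these definitions, `pick_partition_signed(-1)` returns -1, which is not a valid index, whereas `pick_partition_unsigned(UINT_MAX)` returns 3 and a following `next_partition()` call wraps the rotor to 0 and returns 0.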



 Comments   
Comment by Liang Zhen (Inactive) [ 18/Sep/14 ]

patch is here: http://review.whamcloud.com/#/c/11936/

Comment by Liang Zhen (Inactive) [ 31/Oct/14 ]

The patch has landed for 2.7, but we also need it for 2.5:

http://review.whamcloud.com/#/c/11999/

Comment by Peter Jones [ 31/Oct/14 ]

Landed for 2.7

Comment by Gerrit Updater [ 01/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11999/
Subject: LU-5639 lnet: portal spreading rotor should be unsigned
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: 14e2be44141bcec05202a83493eb34fe602dadc9
