[LU-11064] o2iblnd fast reg gaps case is determined incompletely Created: 30/May/18  Updated: 09/Jul/19  Resolved: 18/Jul/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Major
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by LU-11308 LustreError: 2742:0:(events.c:199:cl... Resolved
is duplicated by LU-11105 Seeing "Using FastReg with no GAPS su... Resolved
Related
is related to LU-11350 Review of NASA LNet patches Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We're allowed to start at a non-page-aligned offset in the first fragment and to end at a non-page-aligned offset in the last fragment.

Currently the first fragment is not handled as a special case. So a buffer that starts at a non-aligned page offset can be treated as having a gap in the transmit buffer, which is incorrect.



 Comments   
Comment by Gerrit Updater [ 30/May/18 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/32586
Subject: LU-11064 lnd: determine gaps correctly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d2105fede63b081a61fb8c77683d79d7991693f5

Comment by Gerrit Updater [ 18/Jul/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32586/
Subject: LU-11064 lnd: determine gaps correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e40ea6fd4494b86d26a9d459f7bf620d816af4e1

Comment by Peter Jones [ 18/Jul/18 ]

Landed for 2.12

Comment by Mahmoud Hanafi [ 05/Sep/18 ]

Does this LU affect 2.10.x?

Comment by Amir Shehata (Inactive) [ 06/Sep/18 ]

Plain 2.10.x doesn't include any of the gaps changes. These landed in 2.11.

However, if you applied these patches on your own branch based on 2.10.x, then you'll need that patch as well.

The patches I'm talking about are:

LU-9943 lnd: correct WR fast reg accounting
LU-9810 lnd: use less CQ entries for each connection
LU-10129 lnd: rework map_on_demand behavior
LU-10129 lnd: set device capabilities

Comment by Jay Lan (Inactive) [ 06/Sep/18 ]

We have cherry-picked "LU-9810 lnd: use less CQ entries for each connection", but not the other three.

Comment by Mahmoud Hanafi [ 06/Sep/18 ]

FYI, We cherry-picked LU-9810 into 2.11.

Comment by Peter Jones [ 06/Sep/18 ]

Mahmoud

My recommendation is to open a ticket to ask whether it is OK before including any patch in your distribution. Sometimes there are non-obvious prerequisites.

Peter

Comment by Amir Shehata (Inactive) [ 06/Sep/18 ]

I think 9810 should be OK. There's just been a lot of rework in that area regarding map_on_demand and the use of gaps in 2.11. So I wouldn't recommend pulling that into a tree based on 2.10.x unless you pull in the entire set of changes.

Based on LU-11308, I think you guys might be using trees based on 2.10.X as well as 2.11. That's why I was making sure you guys are aware of the dependencies there.

Comment by Jay Lan (Inactive) [ 06/Sep/18 ]

Well, we actually have
LU-9810: lnd: use less CQ entries for each connection, and
LU-9810 lnet: fix build with M-OFED 4.1
in our 2.10.3 and 2.10.5 branches.

LU-9810 was included in Whamcloud's 2.11.0 release. I cherry-picked LU-11064 into our 2.11.0.

Mahmoud has decided to run 2.10.x. So what do you recommend we do? I do not recall why I cherry-picked LU-9810; do you, Mahmoud? Probably to fix a problem we encountered?

Comment by Mahmoud Hanafi [ 06/Sep/18 ]

LU-9810 added mofed4.x support. I don't think we cherry-picked that one.

Comment by Amir Shehata (Inactive) [ 06/Sep/18 ]

Would it be possible to point me to the 2.10.X git repo that you guys are using now, so I can take a look at the commits. This way I'm clear on which patches you have.

Comment by Peter Jones [ 06/Sep/18 ]

How about we transfer this discussion to a new ticket so we can identify it as a NASA support issue?

Comment by Jay Lan (Inactive) [ 06/Sep/18 ]

Amir,
https://github.com/jlan/lustre-nas

Generated at Sat Feb 10 02:40:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.