Adapt ko2iblnd to latest RDMA changes (LU-8874)

[LU-8875] Change to new RDMA done callback mechanism Created: 30/Nov/16  Updated: 01/Oct/20

Status: In Progress
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Upstream
Fix Version/s: Upstream

Type: Technical task Priority: Critical
Reporter: Doug Oucharek (Inactive) Assignee: James A Simmons
Resolution: Unresolved Votes: 0
Labels: lnet

Issue Links:
Blocker

 Description   

The new done callback can be invoked from one of three polling contexts:

1- Direct (no polling)
2- softirq (our callback gets called by the IRQ_POLL_SOFTIRQ mechanism)
3- WorkQueues

It is very tempting to replace the kiblnd_scheduler() and its use of wait queues with the WorkQueue approach. However, this has two major problems:

1- There is no way to bind the WorkQueue to a specific CPT without submitting a change to the RDMA code base. I'm not interested in doing this.
2- It is unclear how the kernel threads for WorkQueues are created/destroyed. If not done efficiently, this will cause a performance degradation to LNet.

So, my recommendation is to bind our current kiblnd_cq_completion() to the softirq callback (with the necessary semantic changes). The main loop of the scheduler, kiblnd_scheduler(), will need to be updated so that it no longer polls the CQ, since the new callback mechanism does that for us. All of o2iblnd needs to be scanned for any remaining CQ polling, and that polling needs to be turned off.
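To illustrate the shape of this change, here is a minimal userspace sketch of the done-callback pattern, not the actual patch. In the kernel, a work request carries a pointer to a struct ib_cqe whose done member is called by the RDMA core, and the polling context (IB_POLL_DIRECT, IB_POLL_SOFTIRQ or IB_POLL_WORKQUEUE) is chosen when the CQ is allocated with ib_alloc_cq(). Everything prefixed model_ below is a hypothetical stand-in invented for this example, and model_process_completions() only imitates what the core would do from softirq context.

```c
#include <stddef.h>
#include <stdio.h>

struct model_cqe;

/* Stand-in for a work completion: which request finished, and how. */
struct model_wc {
	struct model_cqe *wr_cqe;	/* completion entry attached to the request */
	int status;			/* 0 means success in this model */
};

/* Stand-in for a completion entry: it only carries the done callback. */
struct model_cqe {
	void (*done)(struct model_wc *wc);
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical transmit descriptor with an embedded completion entry. */
struct model_tx {
	int id;
	int completed;
	int status;
	struct model_cqe cqe;
};

/* Done callback: recover the descriptor from the embedded entry. */
static void model_tx_done(struct model_wc *wc)
{
	struct model_tx *tx = model_container_of(wc->wr_cqe, struct model_tx, cqe);

	tx->completed = 1;
	tx->status = wc->status;
	printf("tx %d done, status %d\n", tx->id, tx->status);
}

/*
 * Stand-in for the core's completion processing. The caller (the LND in
 * the real case) never polls; it only supplies done callbacks.
 * Returns the number of completions dispatched.
 */
static int model_process_completions(struct model_wc *wcs, int n)
{
	int handled = 0;

	for (int i = 0; i < n; i++) {
		struct model_cqe *cqe = wcs[i].wr_cqe;

		if (cqe != NULL && cqe->done != NULL) {
			cqe->done(&wcs[i]);
			handled++;
		}
	}
	return handled;
}
```

To try it, set tx.cqe.done = model_tx_done on a model_tx, point a model_wc at &tx.cqe, and pass it to model_process_completions(). The point of the sketch is that the per-request callback replaces the explicit poll loop, which is why kiblnd_scheduler() would no longer need to poll the CQ itself.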



 Comments   
Comment by Gerrit Updater [ 09/Jan/17 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/24771
Subject: LU-8875 lnet: Change to new RDMA done callback mechanism
Project: fs/linux-staging
Branch: staging-testing
Current Patch Set: 1
Commit: 85708880bb6ced72a7e5abbe68014864873cc870

Comment by James A Simmons [ 09/Jan/17 ]

You are my hero. Thanks for picking this up. I just haven't been able to get to it with my other projects going on. Details about creating and submitting a patch upstream can be read at

http://wiki.lustre.org/Upstream_contributing

I will grab the patch and try it out.

Comment by James A Simmons [ 19/Jan/17 ]

The patch you submitted does too much to be allowed to land upstream. You will need to break it up into one patch per individual change. We can rebase this on top of the LU-9026 patch so we can test the changes out. Thanks for doing this.

Comment by Gerrit Updater [ 01/Mar/17 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/25704
Subject: LU-8875 lnet: Change to new RDMA done callback mechanism
Project: fs/linux-staging
Branch: staging-testing
Current Patch Set: 1
Commit: b444ed14e18f1e8a0f1936e0e5fe71a81093cf67

Comment by Gerrit Updater [ 01/Mar/17 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/25709
Subject: LU-8875 lnet: Change to new RDMA done callback mechanism
Project: fs/linux-staging
Branch: staging-testing
Current Patch Set: 1
Commit: 3757a29b76cfd1497bd5a3cd9f3d51c07c06670c

Comment by Gerrit Updater [ 10/May/17 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: https://review.whamcloud.com/27028
Subject: LU-8875 lnet: Change to new RDMA done callback mechanism
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d881b245e690b4bc0d9cb6cd65e6fcf822f2af82

Comment by James A Simmons [ 01/Oct/20 ]

Looking at this work again: now that we support workqueues bound to CPT sets, we should reconsider using a work queue. Any opinions?
