[LU-14488] Support rdma_connect_locked() Created: 04/Mar/21 Updated: 27/Apr/21 Resolved: 09/Mar/21 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.12.7, Lustre 2.15.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Sergey Gorenko | Assignee: | Sergey Gorenko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LTS12 | ||
| Environment: |
MOFED-5.2-2.2.0.0 |
||
| Issue Links: |
|
||||||||
| Epic/Theme: | lnet | ||||||||
| Description |
|
Hi, I'm testing the Lustre master branch with MOFED-5.2-2.2.0.0. I get the following error when mounting Lustre on the client:
[Thu Mar 4 11:15:48 2021] INFO: task kworker/u8:2:10042 blocked for more than 120 seconds.
[Thu Mar 4 11:15:48 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 4 11:15:48 2021] kworker/u8:2 D ffff8895368e0000 0 10042 2 0x00000080
[Thu Mar 4 11:15:48 2021] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Thu Mar 4 11:15:48 2021] Call Trace:
[Thu Mar 4 11:15:48 2021] [<ffffffff86786ca9>] schedule_preempt_disabled+0x29/0x70
[Thu Mar 4 11:15:48 2021] [<ffffffff86784c37>] __mutex_lock_slowpath+0xc7/0x1d0
[Thu Mar 4 11:15:48 2021] [<ffffffff8678400f>] mutex_lock+0x1f/0x2f
[Thu Mar 4 11:15:48 2021] [<ffffffffc054e5d3>] rdma_connect+0x23/0x50 [rdma_cm]
[Thu Mar 4 11:15:48 2021] [<ffffffffc0971105>] kiblnd_cm_callback+0x1575/0x23d0 [ko2iblnd]
[Thu Mar 4 11:15:48 2021] [<ffffffffc054ebd1>] cma_work_handler+0xa1/0xe0 [rdma_cm]
[Thu Mar 4 11:15:48 2021] [<ffffffff860be6bf>] process_one_work+0x17f/0x440
[Thu Mar 4 11:15:48 2021] [<ffffffff860bf7d6>] worker_thread+0x126/0x3c0
[Thu Mar 4 11:15:48 2021] [<ffffffff860bf6b0>] ? manage_workers.isra.26+0x2a0/0x2a0
[Thu Mar 4 11:15:48 2021] [<ffffffff860c6691>] kthread+0xd1/0xe0
[Thu Mar 4 11:15:48 2021] [<ffffffff860c65c0>] ? insert_kthread_work+0x40/0x40
[Thu Mar 4 11:15:48 2021] [<ffffffff86792d37>] ret_from_fork_nospec_begin+0x21/0x21
[Thu Mar 4 11:15:48 2021] [<ffffffff860c65c0>] ? insert_kthread_work+0x40/0x40
I investigated and found that the issue is caused by a change that came into MOFED from the upstream 5.10 kernel: https://www.spinics.net/lists/linux-rdma/msg96986.html
After that change, calling rdma_connect() from the RDMA_CM_EVENT_ROUTE_RESOLVED handler is no longer allowed; rdma_connect_locked() must be used instead. The trace above shows the symptom: rdma_connect(), called from kiblnd_cm_callback(), blocks in mutex_lock(). I'm testing a patch for the issue and will push it for review soon. |
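To illustrate why the call hangs, here is a small userspace model of the locking pattern suggested by the trace. It is a sketch only, not kernel or Lustre code: every toy_* name is hypothetical, the "mutex" is a plain flag that returns -EDEADLK where a real mutex would block forever, and the model assumes that the CM work handler invokes the event callback with the per-ID handler mutex already held and that rdma_connect() now takes that same mutex itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the per-ID handler mutex (hypothetical, not struct mutex). */
struct toy_mutex {
	bool held;
};

/* Hypothetical model of a CM ID: just the lock and a "connect issued" flag. */
struct toy_cm_id {
	struct toy_mutex handler_mutex;
	bool connecting;
};

/* Returns -EDEADLK where a real non-recursive mutex would block forever. */
static int toy_mutex_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;
	m->held = true;
	return 0;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/* Models rdma_connect_locked(): the caller must already hold the mutex. */
static int toy_connect_locked(struct toy_cm_id *id)
{
	if (!id->handler_mutex.held)
		return -EINVAL;
	id->connecting = true;
	return 0;
}

/* Models rdma_connect() after the change: takes the mutex itself. */
static int toy_connect(struct toy_cm_id *id)
{
	int rc = toy_mutex_lock(&id->handler_mutex);

	if (rc)
		return rc;
	rc = toy_connect_locked(id);
	toy_mutex_unlock(&id->handler_mutex);
	return rc;
}

typedef int (*toy_event_cb)(struct toy_cm_id *id);

/* Models the work handler delivering ROUTE_RESOLVED with the mutex held. */
static int toy_dispatch_route_resolved(struct toy_cm_id *id, toy_event_cb cb)
{
	int rc = toy_mutex_lock(&id->handler_mutex);

	if (rc)
		return rc;
	rc = cb(id);
	toy_mutex_unlock(&id->handler_mutex);
	return rc;
}
```

In this model, toy_dispatch_route_resolved(&id, toy_connect) fails with -EDEADLK, which stands in for the hung kworker, while toy_dispatch_route_resolved(&id, toy_connect_locked) succeeds. Calling toy_connect() outside the handler still works, since nothing else holds the lock at that point.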
| Comments |
| Comment by Gerrit Updater [ 04/Mar/21 ] |
|
Sergey Gorenko (sergeygo@nvidia.com) uploaded a new patch: https://review.whamcloud.com/41887 |
| Comment by Gerrit Updater [ 09/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41887/ |
| Comment by Peter Jones [ 09/Mar/21 ] |
|
Landed for 2.15 |
| Comment by Gerrit Updater [ 10/Mar/21 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41977 |
| Comment by Gerrit Updater [ 22/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41977/ |