Details
-
Improvement
-
Resolution: Fixed
-
Major
-
None
-
MOFED-5.2-2.2.0.0
Description
Hi,
I'm testing the Lustre master branch with MOFED-5.2-2.2.0.0. Mounting Lustre on the client fails with the following hung-task error:
[Thu Mar 4 11:15:48 2021] INFO: task kworker/u8:2:10042 blocked for more than 120 seconds.
[Thu Mar 4 11:15:48 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 4 11:15:48 2021] kworker/u8:2 D ffff8895368e0000 0 10042 2 0x00000080
[Thu Mar 4 11:15:48 2021] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[Thu Mar 4 11:15:48 2021] Call Trace:
[Thu Mar 4 11:15:48 2021] [<ffffffff86786ca9>] schedule_preempt_disabled+0x29/0x70
[Thu Mar 4 11:15:48 2021] [<ffffffff86784c37>] __mutex_lock_slowpath+0xc7/0x1d0
[Thu Mar 4 11:15:48 2021] [<ffffffff8678400f>] mutex_lock+0x1f/0x2f
[Thu Mar 4 11:15:48 2021] [<ffffffffc054e5d3>] rdma_connect+0x23/0x50 [rdma_cm]
[Thu Mar 4 11:15:48 2021] [<ffffffffc0971105>] kiblnd_cm_callback+0x1575/0x23d0 [ko2iblnd]
[Thu Mar 4 11:15:48 2021] [<ffffffffc054ebd1>] cma_work_handler+0xa1/0xe0 [rdma_cm]
[Thu Mar 4 11:15:48 2021] [<ffffffff860be6bf>] process_one_work+0x17f/0x440
[Thu Mar 4 11:15:48 2021] [<ffffffff860bf7d6>] worker_thread+0x126/0x3c0
[Thu Mar 4 11:15:48 2021] [<ffffffff860bf6b0>] ? manage_workers.isra.26+0x2a0/0x2a0
[Thu Mar 4 11:15:48 2021] [<ffffffff860c6691>] kthread+0xd1/0xe0
[Thu Mar 4 11:15:48 2021] [<ffffffff860c65c0>] ? insert_kthread_work+0x40/0x40
[Thu Mar 4 11:15:48 2021] [<ffffffff86792d37>] ret_from_fork_nospec_begin+0x21/0x21
[Thu Mar 4 11:15:48 2021] [<ffffffff860c65c0>] ? insert_kthread_work+0x40/0x40
I investigated and found that the issue is caused by a change that MOFED picked up from the upstream 5.10 kernel:
https://www.spinics.net/lists/linux-rdma/msg96986.html
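After that patch, rdma_connect() can no longer be called from the RDMA_CM_EVENT_ROUTE_RESOLVED handler: it now takes the handler mutex itself, and that mutex is already held while the callback runs, which is the mutex_lock() under rdma_connect() in the trace above. rdma_connect_locked() must be used in the handler instead.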
I'm testing a patch for the issue. I'm going to push it for review soon.
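To illustrate the deadlock described above, here is a minimal user-space sketch. It is not kernel or ko2iblnd code: every name prefixed with model_ is hypothetical, the handler mutex is reduced to a boolean flag, and the model returns -EDEADLK where the real rdma_connect() would block forever in mutex_lock().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CM id; not the kernel's struct rdma_cm_id. */
struct model_cm_id {
	bool handler_mutex_held;	/* models the per-id handler mutex */
	bool connected;
};

/* Models rdma_connect_locked(): the caller must already hold the mutex. */
static int model_connect_locked(struct model_cm_id *id)
{
	assert(id->handler_mutex_held);
	id->connected = true;
	return 0;
}

/*
 * Models rdma_connect() after the upstream change: it takes the handler
 * mutex itself. A real mutex would block here; the model reports
 * -EDEADLK so the failure is observable instead of hanging.
 */
static int model_connect(struct model_cm_id *id)
{
	int rc;

	if (id->handler_mutex_held)
		return -EDEADLK;
	id->handler_mutex_held = true;
	rc = model_connect_locked(id);
	id->handler_mutex_held = false;
	return rc;
}

/* Models the CM core running an event callback with the mutex held. */
static int model_run_handler(struct model_cm_id *id,
			     int (*handler)(struct model_cm_id *))
{
	int rc;

	id->handler_mutex_held = true;
	rc = handler(id);
	id->handler_mutex_held = false;
	return rc;
}
```

Passing model_connect to model_run_handler() reproduces the self-deadlock from the trace, while model_connect_locked succeeds in the same context. A real fix would presumably pick rdma_connect_locked() at build time when the installed OFED or kernel provides it; that selection is an assumption here, not something shown in this ticket.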
Issue Links
- is related to: LU-14588 LNet: make config script aware of the ofed symbols (Resolved)