Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.12.4
None
3
9223372036854775807
Description
conf-sanity test_98 hangs in review-dne-zfs-part-3 testing of patch https://review.whamcloud.com/37445/ for LU-12593. So far, we have seen this test hang with the errors below only once.
Looking at the hung test session at https://testing.whamcloud.com/test_sets/469de552-4869-11ea-b58e-52540065bddc, the client1 console log shows LNet problems:
[19629.856282] Lustre: Unmounted lustre-client
[19659.244701] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' || true
[19659.384715] Key type lgssc unregistered
[19661.272912] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19665.275764] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19673.280446] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19689.288850] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19721.304636] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19785.335214] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19809.073214] INFO: task socknal_sd00_01:4386 blocked for more than 120 seconds.
[19809.074558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19809.075866] socknal_sd00_01 D ffff8e17ffd1ac80 0 4386 2 0x00000080
[19809.077194] Call Trace:
[19809.077665] [<ffffffffb0581929>] schedule_preempt_disabled+0x29/0x70
[19809.078735] [<ffffffffb057f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[19809.079762] [<ffffffffb057ec8f>] mutex_lock+0x1f/0x2f
[19809.080741] [<ffffffffc0891e31>] lnet_nid2peerni_locked+0x71/0x150 [lnet]
[19809.082031] [<ffffffffc087ed01>] lnet_parse+0x791/0x11f0 [lnet]
[19809.083044] [<ffffffffc0916838>] ksocknal_process_receive+0x498/0xde0 [ksocklnd]
[19809.084277] [<ffffffffc0917626>] ksocknal_scheduler+0x206/0xd50 [ksocklnd]
[19809.085499] [<ffffffffafec72e0>] ? wake_up_atomic_t+0x30/0x30
[19809.086567] [<ffffffffc0917420>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[19809.087734] [<ffffffffafec61f1>] kthread+0xd1/0xe0
[19809.088612] [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
[19809.089668] [<ffffffffb058dd37>] ret_from_fork_nospec_begin+0x21/0x21
[19809.090850] [<ffffffffafec6120>] ? insert_kthread_work+0x40/0x40
[19913.395403] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19929.146713] INFO: task socknal_sd00_01:4386 blocked for more than 120 seconds.
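The timing in this log is easier to see with the timestamps lined up. Below is a minimal Python triage sketch (a hypothetical helper, not part of the Lustre test framework) that pulls the ksocknal_shutdown() wait messages and the hung-task reports out of the console text. Its regular expressions assume exactly the message formats shown above. On these lines, the gaps between successive "waiting for 1 peers to disconnect" messages roughly double (about 4, 8, 16, 32, 64 and 128 seconds). Meanwhile, socknal_sd00_01 is reported blocked twice, both times in mutex_lock() under lnet_nid2peerni_locked().

```python
import re

# Lines copied verbatim from the client1 console log above (subset).
LOG = """\
[19661.272912] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19665.275764] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19673.280446] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19689.288850] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19721.304636] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19785.335214] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19809.073214] INFO: task socknal_sd00_01:4386 blocked for more than 120 seconds.
[19913.395403] LNet: 5951:0:(socklnd.c:2550:ksocknal_shutdown()) waiting for 1 peers to disconnect
[19929.146713] INFO: task socknal_sd00_01:4386 blocked for more than 120 seconds.
"""

# Assumed formats, matching the messages in this ticket only.
SHUTDOWN_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] LNet: .*ksocknal_shutdown\(\)\) "
    r"waiting for (\d+) peers to disconnect"
)
HUNG_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] INFO: task ([^:\s]+):(\d+) "
    r"blocked for more than (\d+) seconds"
)


def shutdown_waits(log):
    """Return (timestamp, peer_count) for each ksocknal_shutdown() wait message."""
    waits = []
    for line in log.splitlines():
        m = SHUTDOWN_RE.match(line)
        if m:
            waits.append((float(m.group(1)), int(m.group(2))))
    return waits


def hung_tasks(log):
    """Return (timestamp, task_name, pid, timeout_s) for each hung-task report."""
    tasks = []
    for line in log.splitlines():
        m = HUNG_RE.match(line)
        if m:
            tasks.append((float(m.group(1)), m.group(2),
                          int(m.group(3)), int(m.group(4))))
    return tasks


def gaps(times):
    """Whole-second intervals between consecutive timestamps."""
    return [round(b - a) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    waits = shutdown_waits(LOG)
    print("shutdown wait gaps (s):", gaps([t for t, _ in waits]))
    for ts, name, pid, timeout in hung_tasks(LOG):
        print(f"{ts}: {name} (pid {pid}) blocked > {timeout}s")
```

Pointing the same functions at the full console log from the test session should show whether the backoff continues after the last message quoted here.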