Details
- Bug
- Resolution: Fixed
- Minor
Description
This issue was created by maloo for Chris Horn <chris.horn@hpe.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/01cd6eae-ac03-48f6-980d-3977224dfaad
test_226 failed with the following error:
Timeout occurred after 112 minutes, last suite running was sanity-lnet
LNetNIFini() and the discovery thread appear to have deadlocked:
[Thu Mar 3 19:42:41 2022] INFO: task lnet_discovery:424118 blocked for more than 120 seconds.
[Thu Mar 3 19:42:41 2022] Tainted: G OE --------- - - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[Thu Mar 3 19:42:41 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 3 19:42:41 2022] lnet_discovery D 0 424118 2 0x80004080
[Thu Mar 3 19:42:41 2022] Call Trace:
[Thu Mar 3 19:42:41 2022] __schedule+0x2c4/0x700
[Thu Mar 3 19:42:41 2022] schedule+0x38/0xa0
[Thu Mar 3 19:42:41 2022] schedule_preempt_disabled+0xa/0x10
[Thu Mar 3 19:42:41 2022] __mutex_lock.isra.5+0x2d0/0x4a0
[Thu Mar 3 19:42:41 2022] lnet_peer_discovery+0x929/0x16c0 [lnet]
[Thu Mar 3 19:42:41 2022] ? finish_wait+0x80/0x80
[Thu Mar 3 19:42:41 2022] ? lnet_peer_merge_data+0xff0/0xff0 [lnet]
[Thu Mar 3 19:42:41 2022] kthread+0x112/0x130
[Thu Mar 3 19:42:41 2022] ? kthread_flush_work_fn+0x10/0x10
[Thu Mar 3 19:42:41 2022] ret_from_fork+0x35/0x40
[Thu Mar 3 19:42:41 2022] INFO: task lnetctl:428295 blocked for more than 120 seconds.
[Thu Mar 3 19:42:41 2022] Tainted: G OE --------- - - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[Thu Mar 3 19:42:41 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Mar 3 19:42:41 2022] lnetctl D 0 428295 428283 0x00004080
[Thu Mar 3 19:42:41 2022] Call Trace:
[Thu Mar 3 19:42:41 2022] __schedule+0x2c4/0x700
[Thu Mar 3 19:42:41 2022] ? __wake_up_common_lock+0x89/0xc0
[Thu Mar 3 19:42:41 2022] schedule+0x38/0xa0
[Thu Mar 3 19:42:41 2022] lnet_peer_discovery_stop+0x112/0x260 [lnet]
[Thu Mar 3 19:42:41 2022] ? finish_wait+0x80/0x80
[Thu Mar 3 19:42:41 2022] LNetNIFini+0x5e/0x100 [lnet]
[Thu Mar 3 19:42:41 2022] lnet_ioctl+0x220/0x260 [lnet]
[Thu Mar 3 19:42:41 2022] notifier_call_chain+0x47/0x70
[Thu Mar 3 19:42:41 2022] blocking_notifier_call_chain+0x3e/0x60
[Thu Mar 3 19:42:41 2022] libcfs_psdev_ioctl+0x346/0x590 [libcfs]
[Thu Mar 3 19:42:41 2022] do_vfs_ioctl+0xa4/0x640
[Thu Mar 3 19:42:41 2022] ? syscall_trace_enter+0x1d3/0x2c0
[Thu Mar 3 19:42:41 2022] ksys_ioctl+0x60/0x90
[Thu Mar 3 19:42:41 2022] __x64_sys_ioctl+0x16/0x20
[Thu Mar 3 19:42:41 2022] do_syscall_64+0x5b/0x1a0
[Thu Mar 3 19:42:41 2022] entry_SYSCALL_64_after_hwframe+0x65/0xca
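LNetNIFini() holds the ln_api_mutex and is waiting in lnet_peer_discovery_stop() for the discovery thread to stop. The discovery thread is blocked in lnet_peer_discovery() trying to take that same mutex, so neither task can make progress.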
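For illustration only, the lock cycle described above can be modelled in user space. The sketch below is plain Python threading, not LNet code: `api_mutex` stands in for ln_api_mutex, and `discovery_worker` and `shutdown_completes` are hypothetical names. A join timeout is used so the model reports the hang instead of actually hanging. Releasing the lock before waiting is shown only as one generic way to break such a cycle; it is not necessarily what the fix for this ticket does.

```python
import threading


def shutdown_completes(release_before_wait: bool, timeout: float = 0.5) -> bool:
    """Model a shutdown path that waits for a worker thread to stop.

    Returns True if the worker stopped within `timeout` seconds,
    False if the shutdown path would have hung (modelled deadlock).
    """
    api_mutex = threading.Lock()        # hypothetical stand-in for ln_api_mutex
    stop_requested = threading.Event()  # "please shut down" flag

    def discovery_worker() -> None:
        while True:
            # The worker needs the mutex to make any progress.
            with api_mutex:
                pass  # one unit of discovery work would go here
            if stop_requested.is_set():
                return

    # Shutdown path: take the mutex first, as the stack trace suggests.
    api_mutex.acquire()
    worker = threading.Thread(target=discovery_worker, daemon=True)
    worker.start()
    stop_requested.set()

    holding = True
    if release_before_wait:
        # Generic mitigation: drop the lock before waiting on the worker.
        api_mutex.release()
        holding = False

    # Wait for the worker; the timeout only exists so the model terminates.
    worker.join(timeout)
    stopped = not worker.is_alive()

    # Clean up so the worker can always exit after the measurement.
    if holding:
        api_mutex.release()
    worker.join()
    return stopped


if __name__ == "__main__":
    print("wait while holding mutex:", shutdown_completes(False))
    print("release mutex, then wait:", shutdown_completes(True))
```

In this model the first call times out because the worker is stuck acquiring the lock that the shutdown path still holds, which mirrors the two blocked tasks in the trace; the second call completes because the worker can take the lock, see the stop flag, and exit.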
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lnet test_226 - Timeout occurred after 112 minutes, last suite running was sanity-lnet
Issue Links
- is duplicated by
  - LU-15650 sanity-lnet: test_102 Timeout occurred, last suite running was sanity-lnet (Resolved)
  - LU-12148 conf-sanity test_64: timed out (Closed)
  - LU-13218 conf-sanity test 98 hangs in socknal_sd00_01: lnet_nid2peerni_locked (Closed)
- is related to
  - LU-15705 sanity-lnet test_103: hang in lnetctl lnet unconfigure (Resolved)