Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for Chris Horn <chris.horn@hpe.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3f6fa410-0bcb-4668-9c54-750275ee2044
test_244b failed with the following error:
Timeout occurred after 245 minutes, last suite running was sanity
Client reports stuck threads:
[Sat Oct 22 01:26:58 2022] Lustre: DEBUG MARKER: == sanity test 244b: multi-threaded write with group lock ========================================================== 01:26:59 (1666402019)
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941148 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022] Tainted: G W OE --------- - - 4.18.0-348.7.1.el8_5.x86_64 #1
[Sat Oct 22 01:30:46 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Oct 22 01:30:46 2022] task:multiop state:D stack: 0 pid:941148 ppid:940902 flags:0x00004080
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022] __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022] ? vsnprintf+0x297/0x520
[Sat Oct 22 01:30:46 2022] schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022] schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022] __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022] ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022] vvp_io_init+0x22a/0x370 [lustre]
[Sat Oct 22 01:30:46 2022] __cl_io_init.isra.14+0x86/0x150 [obdclass]
[Sat Oct 22 01:30:46 2022] ll_file_io_generic+0x388/0xd70 [lustre]
[Sat Oct 22 01:30:46 2022] ll_file_write_iter+0x64b/0x8a0 [lustre]
[Sat Oct 22 01:30:46 2022] new_sync_write+0x112/0x160
[Sat Oct 22 01:30:46 2022] vfs_write+0xa5/0x1a0
[Sat Oct 22 01:30:46 2022] ksys_write+0x4f/0xb0
[Sat Oct 22 01:30:46 2022] do_syscall_64+0x5b/0x1a0
[Sat Oct 22 01:30:46 2022] entry_SYSCALL_64_after_hwframe+0x65/0xca
[Sat Oct 22 01:30:46 2022] RIP: 0033:0x7fb6fffd0915
[Sat Oct 22 01:30:46 2022] Code: Unable to access opcode bytes at RIP 0x7fb6fffd08eb.
[Sat Oct 22 01:30:46 2022] RSP: 002b:00007ffcc51be938 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[Sat Oct 22 01:30:46 2022] RAX: ffffffffffffffda RBX: 00007ffcc51bfe6e RCX: 00007fb6fffd0915
[Sat Oct 22 01:30:46 2022] RDX: 0000000000000001 RSI: 0000000001280000 RDI: 0000000000000003
[Sat Oct 22 01:30:46 2022] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000003
[Sat Oct 22 01:30:46 2022] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000403348
[Sat Oct 22 01:30:46 2022] R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000604a60
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941150 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022] Tainted: G W OE --------- - - 4.18.0-348.7.1.el8_5.x86_64 #1
[Sat Oct 22 01:30:46 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Oct 22 01:30:46 2022] task:multiop state:D stack: 0 pid:941150 ppid:940902 flags:0x00004080
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022] __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022] ? lprocfs_counter_add+0xd2/0x140 [obdclass]
[Sat Oct 22 01:30:46 2022] schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022] schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022] __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022] ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022] vvp_io_init+0x22a/0x370 [lustre]
[Sat Oct 22 01:30:46 2022] __cl_io_init.isra.14+0x86/0x150 [obdclass]
[Sat Oct 22 01:30:46 2022] cl_get_grouplock+0xd2/0x200 [lustre]
[Sat Oct 22 01:30:46 2022] ll_get_grouplock+0x378/0x700 [lustre]
[Sat Oct 22 01:30:46 2022] ll_file_ioctl+0x7f4/0x4480 [lustre]
[Sat Oct 22 01:30:46 2022] ? __handle_mm_fault+0x4d5/0x820
[Sat Oct 22 01:30:46 2022] do_vfs_ioctl+0xa4/0x680
[Sat Oct 22 01:30:46 2022] ? handle_mm_fault+0xbe/0x1e0
[Sat Oct 22 01:30:46 2022] ? syscall_trace_enter+0x1d3/0x2c0
[Sat Oct 22 01:30:46 2022] ksys_ioctl+0x60/0x90
[Sat Oct 22 01:30:46 2022] __x64_sys_ioctl+0x16/0x20
[Sat Oct 22 01:30:46 2022] do_syscall_64+0x5b/0x1a0
[Sat Oct 22 01:30:46 2022] entry_SYSCALL_64_after_hwframe+0x65/0xca
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_244b - Timeout occurred after 245 minutes, last suite running was sanity