[LU-16261] sanity test_244b: Timeout occurred after 245 minutes, last suite running was sanity
Created: 24/Oct/22  Updated: 24/Oct/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3

Description

This issue was created by maloo for Chris Horn <chris.horn@hpe.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3f6fa410-0bcb-4668-9c54-750275ee2044

test_244b failed with the following error:

Timeout occurred after 245 minutes, last suite running was sanity

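The client console log shows two multiop threads (PIDs 941148 and 941150) blocked in state D for more than 120 seconds, both waiting on a mutex in ll_layout_refresh: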

[Sat Oct 22 01:26:58 2022] Lustre: DEBUG MARKER: == sanity test 244b: multi-threaded write with group lock ========================================================== 01:26:59 (1666402019)
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941148 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022]       Tainted: G        W  OE    --------- -  - 4.18.0-348.7.1.el8_5.x86_64 #1
[Sat Oct 22 01:30:46 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Oct 22 01:30:46 2022] task:multiop         state:D stack:    0 pid:941148 ppid:940902 flags:0x00004080
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022]  __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022]  ? vsnprintf+0x297/0x520
[Sat Oct 22 01:30:46 2022]  schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022]  schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022]  __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022]  ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022]  vvp_io_init+0x22a/0x370 [lustre]
[Sat Oct 22 01:30:46 2022]  __cl_io_init.isra.14+0x86/0x150 [obdclass]
[Sat Oct 22 01:30:46 2022]  ll_file_io_generic+0x388/0xd70 [lustre]
[Sat Oct 22 01:30:46 2022]  ll_file_write_iter+0x64b/0x8a0 [lustre]
[Sat Oct 22 01:30:46 2022]  new_sync_write+0x112/0x160
[Sat Oct 22 01:30:46 2022]  vfs_write+0xa5/0x1a0
[Sat Oct 22 01:30:46 2022]  ksys_write+0x4f/0xb0
[Sat Oct 22 01:30:46 2022]  do_syscall_64+0x5b/0x1a0
[Sat Oct 22 01:30:46 2022]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[Sat Oct 22 01:30:46 2022] RIP: 0033:0x7fb6fffd0915
[Sat Oct 22 01:30:46 2022] Code: Unable to access opcode bytes at RIP 0x7fb6fffd08eb.
[Sat Oct 22 01:30:46 2022] RSP: 002b:00007ffcc51be938 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[Sat Oct 22 01:30:46 2022] RAX: ffffffffffffffda RBX: 00007ffcc51bfe6e RCX: 00007fb6fffd0915
[Sat Oct 22 01:30:46 2022] RDX: 0000000000000001 RSI: 0000000001280000 RDI: 0000000000000003
[Sat Oct 22 01:30:46 2022] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000003
[Sat Oct 22 01:30:46 2022] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000403348
[Sat Oct 22 01:30:46 2022] R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000604a60
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941150 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022]       Tainted: G        W  OE    --------- -  - 4.18.0-348.7.1.el8_5.x86_64 #1
[Sat Oct 22 01:30:46 2022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Oct 22 01:30:46 2022] task:multiop         state:D stack:    0 pid:941150 ppid:940902 flags:0x00004080
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022]  __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022]  ? lprocfs_counter_add+0xd2/0x140 [obdclass]
[Sat Oct 22 01:30:46 2022]  schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022]  schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022]  __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022]  ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022]  vvp_io_init+0x22a/0x370 [lustre]
[Sat Oct 22 01:30:46 2022]  __cl_io_init.isra.14+0x86/0x150 [obdclass]
[Sat Oct 22 01:30:46 2022]  cl_get_grouplock+0xd2/0x200 [lustre]
[Sat Oct 22 01:30:46 2022]  ll_get_grouplock+0x378/0x700 [lustre]
[Sat Oct 22 01:30:46 2022]  ll_file_ioctl+0x7f4/0x4480 [lustre]
[Sat Oct 22 01:30:46 2022]  ? __handle_mm_fault+0x4d5/0x820
[Sat Oct 22 01:30:46 2022]  do_vfs_ioctl+0xa4/0x680
[Sat Oct 22 01:30:46 2022]  ? handle_mm_fault+0xbe/0x1e0
[Sat Oct 22 01:30:46 2022]  ? syscall_trace_enter+0x1d3/0x2c0
[Sat Oct 22 01:30:46 2022]  ksys_ioctl+0x60/0x90
[Sat Oct 22 01:30:46 2022]  __x64_sys_ioctl+0x16/0x20
[Sat Oct 22 01:30:46 2022]  do_syscall_64+0x5b/0x1a0
[Sat Oct 22 01:30:46 2022]  entry_SYSCALL_64_after_hwframe+0x65/0xca
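
For triage, the hung-task reports above can be summarized mechanically. The following is a minimal sketch, not part of Maloo or the Lustre test framework: the function names (summarize_hung_tasks, mutex_caller) and the regular expressions are assumptions written against the log format shown in this ticket, and SAMPLE is a trimmed excerpt of the traces above. It collects the reliable frames for each blocked task (skipping the "?" frames) and reports which function called into __mutex_lock.

```python
import re

# Matches the "INFO: task <comm>:<pid> blocked for more than N seconds." line.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Matches one stack frame, e.g. "]  ? name+0x1/0x2 [module]"; group 1 is the "?" marker.
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[\w+\])?\s*$"
)

# Trimmed excerpt of the console log in this ticket (illustrative input only).
SAMPLE = """\
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941148 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022]  __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022]  ? vsnprintf+0x297/0x520
[Sat Oct 22 01:30:46 2022]  schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022]  schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022]  __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022]  ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022]  vvp_io_init+0x22a/0x370 [lustre]
[Sat Oct 22 01:30:46 2022] RIP: 0033:0x7fb6fffd0915
[Sat Oct 22 01:30:46 2022] INFO: task multiop:941150 blocked for more than 120 seconds.
[Sat Oct 22 01:30:46 2022] Call Trace:
[Sat Oct 22 01:30:46 2022]  __schedule+0x2bd/0x760
[Sat Oct 22 01:30:46 2022]  schedule+0x37/0xa0
[Sat Oct 22 01:30:46 2022]  schedule_preempt_disabled+0xa/0x10
[Sat Oct 22 01:30:46 2022]  __mutex_lock.isra.6+0x2b5/0x4a0
[Sat Oct 22 01:30:46 2022]  ll_layout_refresh+0x19b/0x330 [lustre]
[Sat Oct 22 01:30:46 2022]  cl_get_grouplock+0xd2/0x200 [lustre]
"""


def summarize_hung_tasks(log_text):
    """Return {(comm, pid): [frame names]} for each hung-task report."""
    tasks = {}
    current = None
    in_trace = False
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = (hung.group(1), int(hung.group(2)))
            tasks[current] = []
            in_trace = False
            continue
        if current is None:
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                in_trace = False  # register dump or other non-frame line ends the trace
            elif not frame.group(1):  # skip unreliable "?" frames
                tasks[current].append(frame.group(2))
    return tasks


def mutex_caller(frames):
    """Return the frame directly above __mutex_lock*, or None if absent."""
    for i, name in enumerate(frames[:-1]):
        if name.startswith("__mutex_lock"):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    for (comm, pid), frames in summarize_hung_tasks(SAMPLE).items():
        print(comm, pid, "waits on a mutex taken in", mutex_caller(frames))
```

To run it against a full console log, pass the file contents to summarize_hung_tasks in place of SAMPLE. The sketch only identifies where each task waits; it does not identify which thread holds the mutex.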

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_244b - Timeout occurred after 245 minutes, last suite running was sanity

