Details
- Bug
- Resolution: Duplicate
- Minor
Description
This issue was created by maloo for emoly <emoly@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/08f63f4f-5ac0-4fae-9b9a-f63dfe10d72a
test_27ab failed with the following error:
[21815.303862] LNetError: Unexpected error -2 connecting to 10.240.23.49@tcp at host 10.240.23.49:7988
[21815.305786] LNetError: Skipped 1 previous similar message
[21818.375690] nfs: server 10.240.16.204 not responding, timed out
[21818.375742] nfs: server 10.240.16.204 not responding, still trying
[21818.377043] nfs: server 10.240.16.204 not responding, still trying
...
[25124.265111] nfs: server 10.240.16.204 not responding, timed out
[25133.480875] nfs: server 10.240.16.204 not responding, timed out
...
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/111814 - 5.15.0-94-generic
servers: https://build.whamcloud.com/job/lustre-reviews/111814 - 4.18.0-553.40.1.el8_lustre.x86_64
Another timeout case showed the following call trace (from https://testing.whamcloud.com/test_logs/f5c5ffa4-f1d4-49a6-a3ae-322f7a78884f/show_text):
[21749.376971] INFO: task tee:900701 blocked for more than 120 seconds.
[21749.380450] Tainted: G OE 5.15.0-94-generic #104-Ubuntu
[21749.381879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21749.383394] task:tee state:D stack: 0 pid:900701 ppid: 3172 flags:0x00000002
[21749.385045] Call Trace:
[21749.385622] <TASK>
[21749.386143] __schedule+0x24e/0x590
[21749.386928] schedule+0x69/0x110
[21749.387625] io_schedule+0x46/0x80
[21749.388341] wait_on_page_bit_common+0x10c/0x3d0
[21749.389319] ? filemap_invalidate_unlock_two+0x50/0x50
[21749.390346] wait_on_page_bit+0x3f/0x50
[21749.391139] wait_on_page_writeback+0x26/0x80
[21749.392030] wait_for_stable_page+0x32/0x40
[21749.392900] grab_cache_page_write_begin+0x31/0x40
[21749.393880] nfs_write_begin+0x61/0x300 [nfs]
[21749.394871] generic_perform_write+0xc9/0x200
[21749.395754] ? sched_clock+0x9/0x10
[21749.396505] ? __cond_resched+0x1a/0x50
[21749.397316] nfs_file_write+0x1a7/0x2c0 [nfs]
[21749.398227] new_sync_write+0x114/0x1a0
[21749.399047] vfs_write+0x1d5/0x270
[21749.399770] ksys_write+0x67/0xf0
[21749.400466] __x64_sys_write+0x19/0x20
[21749.401269] do_syscall_64+0x5c/0xc0
[21749.402041] ? syscall_exit_to_user_mode+0x35/0x50
[21749.402995] ? __x64_sys_read+0x19/0x20
[21749.403815] ? do_syscall_64+0x69/0xc0
[21749.404594] ? do_syscall_64+0x69/0xc0
[21749.405380] ? do_syscall_64+0x69/0xc0
[21749.406251] ? do_syscall_64+0x69/0xc0
[21749.407104] entry_SYSCALL_64_after_hwframe+0x62/0xcc
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-sec test_27ab - Timeout occurred after 450 minutes, last suite running was sanity-sec
Issue Links
- is duplicated by: LU-18569 sanity-sec: timeout - clients lost connection to MGS (Open)