[LU-17324] sanity test_398q: timeout Created: 30/Nov/23  Updated: 30/Nov/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/b261ea50-f00e-4dcf-83df-34b1a8c21d9a

test_398q failed with the following error:

Timeout occurred after 374 minutes, last suite running was sanity

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/100517 - 5.14.21-150400.24.92-default
servers: https://build.whamcloud.com/job/lustre-reviews/100517 - 4.18.0-477.27.1.el8_lustre.x86_64

test log:

== sanity test 398q: race dio with buffered i/o ========== 18:06:49 (1701108409)
1+0 records in
1+0 records out
26214400 bytes (26 MB, 25 MiB) copied, 0.571818 s, 45.8 MB/s
bs: 4096, file_size 26214400
3200+0 records in
3200+0 records out
26214400 bytes (26 MB, 25 MiB) copied, 25.0115 s, 1.0 MB/s

Nothing in the dmesg.

The stack trace seems to indicate that lustre is waiting for the journal to commit:
client:

[Mon Nov 27 19:37:00 2023] task:dd              state:I stack:    0 pid: 6854 ppid:  6721 flags:0x00000000
[Mon Nov 27 19:37:00 2023] Call Trace:
[Mon Nov 27 19:37:00 2023]  <TASK>
[Mon Nov 27 19:37:00 2023]  __schedule+0x2d1/0x1150
[Mon Nov 27 19:37:00 2023]  ? __SCT__cond_resched+0x8/0x8
[Mon Nov 27 19:37:00 2023]  schedule+0x64/0xe0
[Mon Nov 27 19:37:00 2023]  cl_sync_io_wait+0x18f/0x360 [obdclass bb912653934ba2f8282cd62f2f26e68b4c4e6050]
[Mon Nov 27 19:37:00 2023]  ? wait_woken+0x70/0x70
[Mon Nov 27 19:37:00 2023]  cl_sync_io_wait_recycle+0x27/0x40 [obdclass bb912653934ba2f8282cd62f2f26e68b4c4e6050]
[Mon Nov 27 19:37:00 2023]  ll_file_io_generic+0x69e/0xe60 [lustre 5294ccb855009b7823aa6fa7626a9d4aa8816337]
[Mon Nov 27 19:37:00 2023]  ll_file_write_iter+0x4b2/0x8c0 [lustre 5294ccb855009b7823aa6fa7626a9d4aa8816337]
[Mon Nov 27 19:37:00 2023]  ? vfs_read+0x16f/0x190
[Mon Nov 27 19:37:00 2023]  new_sync_write+0x11f/0x1c0
[Mon Nov 27 19:37:00 2023]  vfs_write+0x220/0x280
[Mon Nov 27 19:37:00 2023]  ksys_write+0xa5/0xe0
[Mon Nov 27 19:37:00 2023]  do_syscall_64+0x5b/0x80
[Mon Nov 27 19:37:00 2023]  ? exit_to_user_mode_prepare+0x1dc/0x230
[Mon Nov 27 19:37:00 2023]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[Mon Nov 27 19:37:00 2023] RIP: 0033:0x7f122ba2d9d3
[Mon Nov 27 19:37:00 2023] RSP: 002b:00007ffde9ef96b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[Mon Nov 27 19:37:00 2023] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f122ba2d9d3
[Mon Nov 27 19:37:00 2023] RDX: 0000000000002000 RSI: 00005595505a0000 RDI: 0000000000000001
[Mon Nov 27 19:37:00 2023] RBP: 0000000000002000 R08: 000055955059f3b0 R09: 0000000000000003
[Mon Nov 27 19:37:00 2023] R10: 00007f122b92e468 R11: 0000000000000246 R12: 00005595505a0000
[Mon Nov 27 19:37:00 2023] R13: 0000000000000200 R14: 0000000000000000 R15: 00005595505a0000
[Mon Nov 27 19:37:00 2023]  </TASK>

oss:

[Mon Nov 27 19:37:17 2023] task:jbd2/dm-12-8    state:D stack:    0 pid:489834 ppid:     2 flags:0x80004080
[Mon Nov 27 19:37:17 2023] Call Trace:
[Mon Nov 27 19:37:17 2023]  __schedule+0x2d1/0x870
[Mon Nov 27 19:37:17 2023]  ? bit_wait_timeout+0xa0/0xa0
[Mon Nov 27 19:37:17 2023]  schedule+0x55/0xf0
[Mon Nov 27 19:37:17 2023]  io_schedule+0x12/0x40
[Mon Nov 27 19:37:17 2023]  bit_wait_io+0xd/0x50
[Mon Nov 27 19:37:17 2023]  __wait_on_bit+0x2d/0x90
[Mon Nov 27 19:37:17 2023]  out_of_line_wait_on_bit+0x91/0xb0
[Mon Nov 27 19:37:17 2023]  ? var_wake_function+0x30/0x30
[Mon Nov 27 19:37:17 2023]  jbd2_journal_commit_transaction+0x14db/0x19f0 [jbd2]
[Mon Nov 27 19:37:17 2023]  ? finish_task_switch+0x86/0x2e0
[Mon Nov 27 19:37:17 2023]  kjournald2+0xbd/0x270 [jbd2]
[Mon Nov 27 19:37:17 2023]  ? finish_wait+0x80/0x80
[Mon Nov 27 19:37:17 2023]  ? commit_timeout+0x10/0x10 [jbd2]
[Mon Nov 27 19:37:17 2023]  kthread+0x134/0x150
[Mon Nov 27 19:37:17 2023]  ? set_kthread_struct+0x50/0x50
[Mon Nov 27 19:37:17 2023]  ret_from_fork+0x35/0x40

[Mon Nov 27 19:37:17 2023] task:ll_ost_io00_007 state:D stack:    0 pid:644296 ppid:     2 flags:0x80004080
[Mon Nov 27 19:37:17 2023] Call Trace:
[Mon Nov 27 19:37:17 2023]  __schedule+0x2d1/0x870
[Mon Nov 27 19:37:17 2023]  ? __wake_up_common_lock+0x89/0xc0
[Mon Nov 27 19:37:17 2023]  schedule+0x55/0xf0
[Mon Nov 27 19:37:17 2023]  jbd2_log_wait_commit+0xac/0x120 [jbd2]
[Mon Nov 27 19:37:17 2023]  ? finish_wait+0x80/0x80
[Mon Nov 27 19:37:17 2023]  jbd2_journal_stop+0x2d6/0x330 [jbd2]
[Mon Nov 27 19:37:17 2023]  ? tgt_last_rcvd_update+0x32a/0xae0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  __ldiskfs_journal_stop+0x36/0xb0 [ldiskfs]
[Mon Nov 27 19:37:17 2023]  osd_trans_stop+0x23d/0x9f0 [osd_ldiskfs]
[Mon Nov 27 19:37:17 2023]  ofd_commitrw_write+0x551/0x1a80 [ofd]
[Mon Nov 27 19:37:17 2023]  ofd_commitrw+0x5d4/0xd60 [ofd]
[Mon Nov 27 19:37:17 2023]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  obd_commitrw+0x1b6/0x370 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  tgt_brw_write+0xef7/0x1fb0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  ? tgt_request_preprocess.isra.30+0xd3/0x8d0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  ? tgt_request_preprocess.isra.30+0xd3/0x8d0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  tgt_request_handle+0x378/0x19d0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  ptlrpc_server_handle_request+0x3d1/0xc30 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  ? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
[Mon Nov 27 19:37:17 2023]  ptlrpc_main+0xc61/0x15e0 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  ? __schedule+0x2d9/0x870
[Mon Nov 27 19:37:17 2023]  ? ptlrpc_wait_event+0x560/0x560 [ptlrpc]
[Mon Nov 27 19:37:17 2023]  kthread+0x134/0x150
[Mon Nov 27 19:37:17 2023]  ? set_kthread_struct+0x50/0x50
[Mon Nov 27 19:37:17 2023]  ret_from_fork+0x35/0x40

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_398q - Timeout occurred after 374 minutes, last suite running was sanity


Generated at Sat Feb 10 03:34:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.