[LU-15473] sanity test_230d: Timeout waiting for IOs on all nodes Created: 21/Jan/22  Updated: 22/Feb/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6c537d23-38e5-4825-a422-bbca89bdc908

test_230d failed with the following error:

Timeout occurred after 265 mins, last suite running was sanity

This seems to be hardware related. All nodes (even the clients) seem to be waiting for I/O; the hung tasks shown below are blocked in jbd2 and ext4 on vda1:

*client1:*

...
[12960.292893] INFO: task jbd2/vda1-8:268 blocked for more than 120 seconds.
[12960.294060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12960.295280] jbd2/vda1-8     D ffffa025761447e0     0   268      2 0x00000000
[12960.296455] Call Trace:
[12960.297713]  [<ffffffffa2789179>] schedule+0x29/0x70
[12960.298496]  [<ffffffffa2786e41>] schedule_timeout+0x221/0x2d0
[12960.303047]  [<ffffffffa2788a2d>] io_schedule_timeout+0xad/0x130
[12960.303979]  [<ffffffffa2788ac8>] io_schedule+0x18/0x20
[12960.304789]  [<ffffffffa2787491>] bit_wait_io+0x11/0x50
[12960.305605]  [<ffffffffa2786fb7>] __wait_on_bit+0x67/0x90
[12960.307245]  [<ffffffffa2787121>] out_of_line_wait_on_bit+0x81/0xb0
[12960.309171]  [<ffffffffa228723a>] __wait_on_buffer+0x2a/0x30
[12960.310124]  [<ffffffffc03dc871>] jbd2_journal_commit_transaction+0x1771/0x19c0 [jbd2]
[12960.312219]  [<ffffffffc03e1f89>] kjournald2+0xc9/0x260 [jbd2]
[12960.315005]  [<ffffffffa20c5e61>] kthread+0xd1/0xe0
[12960.318681] INFO: task 0anacron:4661 blocked for more than 120 seconds.
[12960.319697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12960.320887] 0anacron        D ffffa025dead68e0     0  4661   4657 0x00000080
[12960.322029] Call Trace:
[12960.323221]  [<ffffffffa2789179>] schedule+0x29/0x70
[12960.323998]  [<ffffffffa2786e41>] schedule_timeout+0x221/0x2d0
[12960.327541]  [<ffffffffa2788a2d>] io_schedule_timeout+0xad/0x130
[12960.328470]  [<ffffffffa2788ac8>] io_schedule+0x18/0x20
[12960.329276]  [<ffffffffa2787491>] bit_wait_io+0x11/0x50
[12960.330091]  [<ffffffffa2786fb7>] __wait_on_bit+0x67/0x90
[12960.331722]  [<ffffffffa2787121>] out_of_line_wait_on_bit+0x81/0xb0
[12960.333597]  [<ffffffffa228723a>] __wait_on_buffer+0x2a/0x30
[12960.334518]  [<ffffffffc03fc217>] __ext4_get_inode_loc+0x197/0x3c0 [ext4]
[12960.335572]  [<ffffffffc03feb36>] ext4_iget+0x96/0xbd0 [ext4]
[12960.336470]  [<ffffffffc03ff6a5>] ext4_iget_normal+0x35/0x40 [ext4]
[12960.337446]  [<ffffffffc0409c52>] ext4_lookup+0xc2/0x160 [ext4]
[12960.338368]  [<ffffffffa22591d3>] lookup_real+0x23/0x60
[12960.339179]  [<ffffffffa2259bf2>] __lookup_hash+0x42/0x60
[12960.340033]  [<ffffffffa27800e5>] lookup_slow+0x42/0xa7
[12960.340842]  [<ffffffffa225cdbf>] link_path_walk+0x80f/0x8b0
[12960.341719]  [<ffffffffa225cfca>] path_lookupat+0x7a/0x8d0
[12960.346235]  [<ffffffffa225d84b>] filename_lookup+0x2b/0xc0
[12960.347098]  [<ffffffffa2261557>] user_path_at_empty+0x67/0xc0
[12960.349786]  [<ffffffffa22615c1>] user_path_at+0x11/0x20
[12960.350612]  [<ffffffffa224c902>] SyS_faccessat+0xb2/0x230
[12960.351469]  [<ffffffffa2795f92>] system_call_fastpath+0x25/0x2a
...

*MDS:*

...
[12720.294647] INFO: task jbd2/vda1-8:268 blocked for more than 120 seconds.
[12720.297276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12720.298494] jbd2/vda1-8     D ffff9fd5f63bd860     0   268      2 0x00000000
[12720.299642] Call Trace:
[12720.300874]  [<ffffffff92589179>] schedule+0x29/0x70
[12720.301645]  [<ffffffff92586e41>] schedule_timeout+0x221/0x2d0
[12720.306109]  [<ffffffff92588a2d>] io_schedule_timeout+0xad/0x130
[12720.307026]  [<ffffffff92588ac8>] io_schedule+0x18/0x20
[12720.307831]  [<ffffffff92587491>] bit_wait_io+0x11/0x50
[12720.308634]  [<ffffffff92586fb7>] __wait_on_bit+0x67/0x90
[12720.310235]  [<ffffffff92587121>] out_of_line_wait_on_bit+0x81/0xb0
[12720.312135]  [<ffffffff9208724a>] __wait_on_buffer+0x2a/0x30
[12720.313143]  [<ffffffffc049c871>] jbd2_journal_commit_transaction+0x1771/0x19c0 [jbd2]
[12720.315211]  [<ffffffffc04a1f89>] kjournald2+0xc9/0x260 [jbd2]
[12720.317980]  [<ffffffff91ec5e61>] kthread+0xd1/0xe0
...

*OST:*

...
[12840.193849] INFO: task jbd2/vda1-8:267 blocked for more than 120 seconds.
[12840.195003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12840.196213] jbd2/vda1-8     D ffff937175945860     0   267      2 0x00000000
[12840.197371] Call Trace:
[12840.198599]  [<ffffffff88589179>] schedule+0x29/0x70
[12840.199389]  [<ffffffff88586e41>] schedule_timeout+0x221/0x2d0
[12840.203947]  [<ffffffff88588a2d>] io_schedule_timeout+0xad/0x130
[12840.204877]  [<ffffffff88588ac8>] io_schedule+0x18/0x20
[12840.205687]  [<ffffffff88587491>] bit_wait_io+0x11/0x50
[12840.206489]  [<ffffffff88586fb7>] __wait_on_bit+0x67/0x90
[12840.208134]  [<ffffffff88587121>] out_of_line_wait_on_bit+0x81/0xb0
[12840.210022]  [<ffffffff8808724a>] __wait_on_buffer+0x2a/0x30
[12840.210938]  [<ffffffffc033ff72>] jbd2_journal_commit_transaction+0xe72/0x19c0 [jbd2]
[12840.213018]  [<ffffffffc0345f89>] kjournald2+0xc9/0x260 [jbd2]
[12840.215786]  [<ffffffff87ec5e61>] kthread+0xd1/0xe0

...
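
To check whether other runs show the same pattern, the hung-task reports can be pulled out of each node's console log and grouped by task name. The following is only a minimal sketch: the helper names (find_hung_tasks, summarize) and the way logs are passed in as strings are assumptions made for illustration and are not part of Maloo or the Lustre test framework. The regular expression is based solely on the "blocked for more than N seconds" lines quoted above.

```python
import re
from collections import Counter

# Matches the kernel hung-task watchdog line as it appears in the traces above, e.g.
# "[12960.292893] INFO: task jbd2/vda1-8:268 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(console_log):
    """Return a list of (timestamp, task name, pid) for each hung-task report."""
    hits = []
    for line in console_log.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("comm"), int(m.group("pid"))))
    return hits


def summarize(logs_by_node):
    """Map each node name to a Counter of blocked task names (hypothetical helper)."""
    return {
        node: Counter(comm for _, comm, _ in find_hung_tasks(text))
        for node, text in logs_by_node.items()
    }


if __name__ == "__main__":
    # Sample input copied from the client1 and MDS excerpts in this ticket.
    sample = {
        "client1": (
            "[12960.292893] INFO: task jbd2/vda1-8:268 blocked for more than 120 seconds.\n"
            "[12960.318681] INFO: task 0anacron:4661 blocked for more than 120 seconds.\n"
        ),
        "MDS": "[12720.294647] INFO: task jbd2/vda1-8:268 blocked for more than 120 seconds.\n",
    }
    for node, counts in summarize(sample).items():
        print(node, dict(counts))
```

If every node in a session reports jbd2/vda1-8 as blocked within the same few minutes, as in the excerpts above, that would be consistent with a stall in the storage backing the test nodes rather than a problem in test_230d itself.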

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_230d - Timeout occurred after 265 mins, last suite running was sanity


Generated at Sat Feb 10 03:18:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.