[LU-3329] Failure on test suite sanity test_18: test failed to respond and timed out Created: 13/May/13  Updated: 17/Apr/17  Resolved: 17/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

tag-2.3.65
client is running SLES11 SP2


Severity: 3
Rank (Obsolete): 8211

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/68dcb62c-ba7f-11e2-b1a3-52540035b04c.

The sub-test test_18 failed with the following error:

test failed to respond and timed out

Found some processes stuck in D (uninterruptible sleep) state on the OST side; a sketch for listing such tasks follows the traces:

kjournald     D 0000000000000000     0   354      2 0x00000000
 ffff880037bf3c30 0000000000000046 00000010ffffffff 00000d55bc99ffca
 ffff88000002bb08 ffff88007969a9f0 00000000000877bc ffffffffae72b5ca
 ffff880037b5faf8 ffff880037bf3fd8 000000000000fb88 ffff880037b5faf8
Call Trace:
 [<ffffffff810a1ac9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff811b60c0>] ? sync_buffer+0x0/0x50
 [<ffffffff8150e723>] io_schedule+0x73/0xc0
 [<ffffffff811b6100>] sync_buffer+0x40/0x50
 [<ffffffff8150f0df>] __wait_on_bit+0x5f/0x90
 [<ffffffff811b60c0>] ? sync_buffer+0x0/0x50
 [<ffffffff8150f188>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81096ce0>] ? wake_bit_function+0x0/0x50
 [<ffffffff811b60b6>] __wait_on_buffer+0x26/0x30
 [<ffffffff811b70d1>] __sync_dirty_buffer+0x71/0xf0
 [<ffffffffa00683c5>] journal_commit_transaction+0xe35/0x1310 [jbd]
 [<ffffffff81080fcc>] ? lock_timer_base+0x3c/0x70
 [<ffffffff81081a5b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa006d768>] kjournald+0xe8/0x250 [jbd]
 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa006d680>] ? kjournald+0x0/0x250 [jbd]
 [<ffffffff81096936>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
auditd        D 0000000000000000     0  1104      1 0x00000000
 ffff88007db6be08 0000000000000086 ffff88007db6bd88 ffff880037bf3e80
 0000000000000000 ffff8800374208d0 0000000000000000 0000000000000000
 ffff880037f6dab8 ffff88007db6bfd8 000000000000fb88 ffff880037f6dab8
Call Trace:
 [<ffffffff81096f8e>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa006d545>] log_wait_commit+0xc5/0x140 [jbd]
 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00855e6>] ext3_sync_file+0x126/0x1a0 [ext3]
 [<ffffffff8111a618>] ? filemap_write_and_wait_range+0x78/0x90
 [<ffffffff811b1b11>] vfs_fsync_range+0xa1/0xe0
 [<ffffffff811b1bbd>] vfs_fsync+0x1d/0x20
 [<ffffffff811b1bfe>] do_fsync+0x3e/0x60
 [<ffffffff811b1c50>] sys_fsync+0x10/0x20
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
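
For reference, a minimal hypothetical sketch (not from this ticket) of how such D-state tasks can be enumerated programmatically on the OST node. Full stack traces like the ones above are typically obtained with sysrq (echo w > /proc/sysrq-trigger dumps blocked tasks); this only lists the tasks.

#!/usr/bin/env python3
# Hypothetical helper (assumption, not part of this ticket): list tasks
# in D (uninterruptible sleep) state by scanning /proc. Run as root on
# the node under investigation.
import os

def d_state_tasks():
    """Yield (pid, comm) for every task currently in state 'D'."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # task exited while we were scanning
        # Format is "pid (comm) state ..."; comm may itself contain
        # spaces or parentheses, so anchor on the last ')'.
        comm = stat[stat.index("(") + 1:stat.rindex(")")]
        state = stat[stat.rindex(")") + 2]
        if state == "D":
            yield int(pid), comm

if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(f"{comm:<16} D  pid={pid}")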


 Comments   
Comment by Peter Jones [ 14/May/13 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 15/May/13 ]

The MDS syslog shows that, for some unknown reason, its ntpd stepped the clock forward by around 40 minutes:

May 10 12:49:57 client-19vm3 ntpd[1681]: synchronized to 10.10.0.1, stratum 3
May 10 13:00:35 client-19vm3 ntpd[1681]: kernel time sync status change 2001
May 10 13:42:31 client-19vm3 xinetd[1673]: START: shell pid=26547 from=::ffff:10.10.4.220
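
As a rough illustration (an assumption, not part of the original analysis), a forward clock step like this can be spotted by scanning consecutive syslog timestamps for abnormally large gaps. The script, its year default, and the 10-minute threshold are hypothetical; run it as, e.g., python3 find_gaps.py < /var/log/messages.

#!/usr/bin/env python3
# Hypothetical sketch: flag large forward gaps between consecutive
# syslog timestamps, the symptom seen in the MDS log above. Assumes
# the classic "May 10 12:49:57 host ..." syslog prefix.
from datetime import datetime
import sys

THRESHOLD_SECONDS = 600  # flag gaps longer than 10 minutes (assumed)

def parse_ts(line, year=2013):
    """Return the leading 'Mon DD HH:MM:SS' timestamp, or None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def find_gaps(lines):
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev is not None and (ts - prev).total_seconds() > THRESHOLD_SECONDS:
            yield (ts - prev).total_seconds(), line.rstrip()
        prev = ts

if __name__ == "__main__":
    # A quiet log looks the same as a clock step, so hits still need to
    # be correlated with ntpd messages such as the ones quoted above.
    for gap, line in find_gaps(sys.stdin):
        print(f"{gap / 60:5.1f} min gap before: {line}")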

Comment by Sarah Liu [ 14/Jun/13 ]

Another instance:
https://maloo.whamcloud.com/test_sets/2171fbce-d511-11e2-b13b-52540035b04c

Comment by Andreas Dilger [ 17/Apr/17 ]

Close old issue.
