[LU-9970] sanity-hsm test_40: test failed to respond and timed out Created: 10/Sep/17  Updated: 12/Nov/17  Resolved: 12/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/7f44a5c8-9635-11e7-b760-5254006e85c2.

The sub-test test_40 failed with the following error:

test failed to respond and timed out

From the following stack trace, it looks like the client hung during NFS access:

13:23:32:[28440.201797] INFO: task tee:29977 blocked for more than 120 seconds.
13:23:32:[28440.202454] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:23:32:[28440.203232] tee             D ffffffff8168a850     0 29977   1530 0x00000080
13:23:32:[28440.204007]  ffff880020db39b0 0000000000000086 ffff88007b4ade20 ffff880020db3fd8
13:23:32:[28440.204854]  ffff880020db3fd8 ffff880020db3fd8 ffff88007b4ade20 ffff88007fd16c40
13:23:32:[28440.205666]  0000000000000000 7fffffffffffffff ffff88007ff5d710 ffffffff8168a850
13:23:32:[28440.206481] Call Trace:
13:23:32:[28440.206731]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
13:23:32:[28440.207234]  [<ffffffff8168c7f9>] schedule+0x29/0x70
13:23:32:[28440.207799]  [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
13:23:32:[28440.208358]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
13:23:32:[28440.208965]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
13:23:32:[28440.209517]  [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
13:23:32:[28440.210101]  [<ffffffff8168be38>] io_schedule+0x18/0x20
13:23:32:[28440.210647]  [<ffffffff8168a861>] bit_wait_io+0x11/0x50
13:23:32:[28440.211158]  [<ffffffff8168a385>] __wait_on_bit+0x65/0x90
13:23:32:[28440.211722]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
13:23:32:[28440.212225]  [<ffffffff8168a431>] out_of_line_wait_on_bit+0x81/0xb0
13:23:32:[28440.212875]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
13:23:32:[28440.213491]  [<ffffffffa04d6a53>] nfs_wait_on_request+0x33/0x40 [nfs]
13:23:32:[28440.214121]  [<ffffffffa04db991>] nfs_updatepage+0x151/0x8d0 [nfs]
13:23:32:[28440.214755]  [<ffffffffa04cb171>] nfs_write_end+0x121/0x350 [nfs]
13:23:32:[28440.215351]  [<ffffffff81181c29>] generic_file_buffered_write+0x189/0x2a0
13:23:32:[28440.216066]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
13:23:32:[28440.216697]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
13:23:32:[28440.217309]  [<ffffffff811831a2>] __generic_file_aio_write+0x1e2/0x400
13:23:32:[28440.217994]  [<ffffffff81183419>] generic_file_aio_write+0x59/0xa0
13:23:32:[28440.218626]  [<ffffffffa04ca15b>] nfs_file_write+0xbb/0x1e0 [nfs]
13:23:32:[28440.219221]  [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0
13:23:32:[28440.219796]  [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
13:23:32:[28440.220286]  [<ffffffff811fe8c7>] ? vfs_read+0xf7/0x170
13:23:32:[28440.220808]  [<ffffffff811ff51f>] SyS_write+0x7f/0xe0
13:23:32:[28440.221338]  [<ffffffff81697809>] system_call_fastpath+0x16/0x1b
13:23:32:[28455.275792] nfs: server onyx-3 not responding, still trying

Given that the NFS server onyx-3 is reported as not responding, this TIMEOUT may well be a DCO issue rather than a Lustre issue.

Info required for matching: sanity-hsm 40



 Comments   
Comment by Bruno Faccini (Inactive) [ 11/Sep/17 ]

My 2 cents: this is a dup of DCO-7492.

Generated at Sat Feb 10 02:30:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.