[LU-9877] conf-sanity test_84: test failed to respond and timed out Created: 14/Aug/17 Updated: 13/Oct/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/575246c6-7fd1-11e7-8823-5254006e85c2.

The sub-test test_84 failed with the following error:

test failed to respond and timed out

Several instances of stack traces like this were seen in the console log of the Client:

23:10:43:[11160.267697] INFO: task tee:19197 blocked for more than 120 seconds.
23:10:43:[11160.268385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
23:10:43:[11160.269165] tee D ffffffff8168a850 0 19197 1556 0x00000080
23:10:43:[11160.270216] ffff8800798ef9b0 0000000000000082 ffff88007a966dd0 ffff8800798effd8
23:10:43:[11160.271166] ffff8800798effd8 ffff8800798effd8 ffff88007a966dd0 ffff88007fc16c40
23:10:43:[11160.272094] 0000000000000000 7fffffffffffffff ffff88007ff5d728 ffffffff8168a850
23:10:43:[11160.273036] Call Trace:
23:10:43:[11160.273305] [<ffffffff8168a850>] ? bit_wait+0x50/0x50
23:10:43:[11160.273842] [<ffffffff8168c7f9>] schedule+0x29/0x70
23:10:43:[11160.274719] [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
23:10:43:[11160.275284] [<ffffffff810d1f43>] ? find_busiest_group+0x143/0x920
23:10:43:[11160.275911] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
23:10:43:[11160.276937] [<ffffffff8168a850>] ? bit_wait+0x50/0x50
23:10:43:[11160.277441] [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
23:10:43:[11160.278064] [<ffffffff8168be38>] io_schedule+0x18/0x20
23:10:43:[11160.278925] [<ffffffff8168a861>] bit_wait_io+0x11/0x50
23:10:43:[11160.279590] [<ffffffff8168a385>] __wait_on_bit+0x65/0x90
23:10:43:[11160.280146] [<ffffffff8168a850>] ? bit_wait+0x50/0x50
23:10:43:[11160.280902] [<ffffffff8168a431>] out_of_line_wait_on_bit+0x81/0xb0
23:10:43:[11160.281646] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
23:10:43:[11160.282264] [<ffffffffa04e3a53>] nfs_wait_on_request+0x33/0x40 [nfs]
23:10:43:[11160.283196] [<ffffffffa04e8991>] nfs_updatepage+0x151/0x8d0 [nfs]
23:10:43:[11160.283988] [<ffffffffa04d8171>] nfs_write_end+0x121/0x350 [nfs]
23:10:43:[11160.284764] [<ffffffff81181c29>] generic_file_buffered_write+0x189/0x2a0
23:10:43:[11160.285545] [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
23:10:43:[11160.286181] [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
23:10:43:[11160.287045] [<ffffffff811831a2>] __generic_file_aio_write+0x1e2/0x400
23:10:43:[11160.287877] [<ffffffff81183419>] generic_file_aio_write+0x59/0xa0
23:10:43:[11160.288620] [<ffffffffa04d715b>] nfs_file_write+0xbb/0x1e0 [nfs]
23:10:43:[11160.289237] [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0
23:10:43:[11160.290039] [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
23:10:43:[11160.290708] [<ffffffff811fe8c7>] ? vfs_read+0xf7/0x170
23:10:43:[11160.291217] [<ffffffff811ff51f>] SyS_write+0x7f/0xe0
23:10:43:[11160.291743] [<ffffffff81697809>] system_call_fastpath+0x16/0x1b

The blocked 'tee' task is waiting in nfs_wait_on_request, and every module-tagged frame in the trace is tagged [nfs]. This suggests the failure is in access to NFS, not Lustre.

Info required for matching: conf-sanity 84 |
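The triage reasoning in the description (the hung task's call trace passes through [nfs] frames rather than Lustre modules) could be checked across many console logs with a small script. The following is only a sketch, not part of the test framework: the function name summarize_hung_tasks and the regular expressions are assumptions based on the log lines quoted in this ticket, and it expects one console message per line.

```python
import re
from collections import Counter

# Matches the hung-task header, e.g.
#   INFO: task tee:19197 blocked for more than 120 seconds.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Matches a call-trace frame, e.g.
#   [<ffffffffa04e3a53>] nfs_wait_on_request+0x33/0x40 [nfs]
# Group 1 is the optional "? " marker (frame the unwinder is unsure about),
# group 2 is the symbol, group 3 is the optional module tag.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report with per-module frame counts.

    Only frames without the "?" marker are counted, and only frames that
    carry a module tag such as [nfs]; core-kernel frames have no tag.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "task": header.group(1),
                "pid": int(header.group(2)),
                "timeout_secs": int(header.group(3)),
                "modules": Counter(),
            }
            reports.append(current)
            continue
        if current is None:
            continue  # frames before any hung-task header are ignored
        for frame in FRAME_RE.finditer(line):
            unsure, _symbol, module = frame.groups()
            if module and not unsure:
                current["modules"][module] += 1
    return reports
```

For the trace quoted above, the only module this would count is nfs, which matches the conclusion that the hang is outside Lustre. A report whose counts included Lustre modules instead would point the other way.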
| Comments |
| Comment by Steve Guminski (Inactive) [ 14/Aug/17 ] |
|
Another on master. My client log shows the same timeout for NFS: https://testing.hpdd.intel.com/test_sessions/1e6fc7e0-c0f3-4090-825c-e8716a417d94 |
| Comment by James Nunez (Inactive) [ 17/Aug/17 ] |
|
We're seeing this on several different test suites. ost_pools test_20 hangs with the same 'tee' hang. Logs are at https://testing.hpdd.intel.com/test_sets/fee52b1c-82ea-11e7-980b-5254006e85c2 |