[LU-9608] recovery-small test_57: test failed to respond and timed out Created: 06/Jun/17 Updated: 21/May/21 Resolved: 21/May/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
onyx, failover |
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
https://testing.hpdd.intel.com/test_sessions/31124cb2-cc25-46ca-89d1-f0f57d60054c Looked through all logs of the TIMEOUT subtest (57) and the PASS subtest that preceeded it (56). The only 21:47:04:[188400.664338] tee D ffffffff8168a1e0 0 8273 10981 0x00000080 21:47:04:[188400.666145] ffff880012c6f9b0 0000000000000086 ffff88000303ce70 ffff880012c6ffd8 21:47:04:[188400.668007] ffff880012c6ffd8 ffff880012c6ffd8 ffff88000303ce70 ffff88007fc16c40 21:47:04:[188400.669475] 0000000000000000 7fffffffffffffff ffff88007ff607a0 ffffffff8168a1e0 21:47:04:[188400.671536] Call Trace: 21:47:04:[188400.672816] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50 21:47:04:[188400.674403] [<ffffffff8168c169>] schedule+0x29/0x70 21:47:04:[188400.675681] [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0 21:47:04:[188400.677296] [<ffffffff811dbad9>] ? ___slab_alloc+0x209/0x4f0 21:47:04:[188400.678583] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30 21:47:04:[188400.680132] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50 21:47:04:[188400.681545] [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130 21:47:04:[188400.683008] [<ffffffff8168b7a8>] io_schedule+0x18/0x20 21:47:04:[188400.684242] [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50 21:47:04:[188400.685688] [<ffffffff81689d15>] __wait_on_bit+0x65/0x90 21:47:04:[188400.687050] [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50 21:47:04:[188400.688347] [<ffffffff81689dc1>] out_of_line_wait_on_bit+0x81/0xb0 21:47:04:[188400.689715] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40 21:47:04:[188400.691008] [<ffffffffa05af9f3>] nfs_wait_on_request+0x33/0x40 [nfs] 21:47:04:[188400.692582] [<ffffffffa05b4931>] nfs_updatepage+0x151/0x8d0 [nfs] 21:47:04:[188400.694007] [<ffffffffa05a4131>] nfs_write_end+0x121/0x350 [nfs] 21:47:04:[188400.695366] [<ffffffff81181b99>] generic_file_buffered_write+0x189/0x2a0 21:47:04:[188400.696806] [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20 21:47:04:[188400.698188] [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20 21:47:04:[188400.699529] [<ffffffff81183112>] __generic_file_aio_write+0x1e2/0x400 21:47:04:[188400.701007] [<ffffffff81183389>] generic_file_aio_write+0x59/0xa0 21:47:04:[188400.702458] [<ffffffffa05a311b>] nfs_file_write+0xbb/0x1e0 [nfs] 21:47:04:[188400.703874] [<ffffffff811fdf3d>] do_sync_write+0x8d/0xd0 21:47:04:[188400.705007] [<ffffffff811fe7ad>] vfs_write+0xbd/0x1e0 21:47:04:[188400.706507] [<ffffffff811fe677>] ? vfs_read+0xf7/0x170 21:47:04:[188400.707858] [<ffffffff811ff2cf>] SyS_write+0x7f/0xe0 21:47:04:[188400.709098] [<ffffffff81697189>] system_call_fastpath+0x16/0x1b and 21:47:04:[188400.710582] INFO: task lsof:32228 blocked for more than 120 seconds. 21:47:04:[188400.712009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 21:47:04:[188400.713397] lsof D ffff88001a713bc8 0 32228 32226 0x00000080 21:47:04:[188400.715028] ffff88004a9bbd90 0000000000000086 ffff880006e0de20 ffff88004a9bbfd8 21:47:04:[188400.716564] ffff88004a9bbfd8 ffff88004a9bbfd8 ffff880006e0de20 ffff88001a713bc0 21:47:04:[188400.718334] ffff88001a713bc4 ffff880006e0de20 00000000ffffffff ffff88001a713bc8 21:47:04:[188400.719925] Call Trace: 21:47:04:[188400.721209] [<ffffffff8168d259>] schedule_preempt_disabled+0x29/0x70 21:47:04:[188400.722865] [<ffffffff8168aeb5>] __mutex_lock_slowpath+0xc5/0x1c0 21:47:04:[188400.724339] [<ffffffff8168a31f>] mutex_lock+0x1f/0x2f 21:47:04:[188400.725730] [<ffffffffa05a779b>] nfs_getattr+0xfb/0x230 [nfs] 21:47:04:[188400.727107] [<ffffffff81203726>] vfs_getattr+0x46/0x80 21:47:04:[188400.728533] [<ffffffff81203855>] vfs_fstatat+0x75/0xc0 21:47:04:[188400.729630] [<ffffffff81203dae>] SYSC_newstat+0x2e/0x60 21:47:04:[188400.731011] [<ffffffff811fe677>] ? vfs_read+0xf7/0x170 21:47:04:[188400.732374] [<ffffffff8111f426>] ? __audit_syscall_exit+0x1e6/0x280 21:47:04:[188400.733819] [<ffffffff8120408e>] SyS_newstat+0xe/0x10 21:47:04:[188400.735144] [<ffffffff81697189>] system_call_fastpath+0x16/0x1b |
| Comments |
| Comment by Sarah Liu [ 07/Jun/17 ] |
|
another one found on performance-sanity on tag-2.9.58 DNE+zfs full group |
| Comment by James Casper [ 26/Sep/17 ] |
|
failover 2.10.1 |