[LU-9777] replay-single test_70f: test failed to respond and timed out Created: 14/Jul/17 Updated: 13/Oct/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a9e1439e-689b-11e7-9230-5254006e85c2. The sub-test test_70f failed with the following error: test failed to respond and timed out Please provide additional information about the failure here. Info required for matching: replay-single 70f |
| Comments |
| Comment by Sarah Liu [ 18/Sep/17 ] |
|
another one on 2.10.1 https://testing.hpdd.intel.com/test_sets/4e4e2c74-9b4d-11e7-ba27-5254006e85c2 10:01:57:[38268.794445] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70f failing OST 5 times 10:01:57:[38268.956844] Lustre: DEBUG MARKER: test_70f failing OST 5 times 10:01:57:[38280.356600] INFO: task tee:16642 blocked for more than 120 seconds. 10:01:57:[38280.357275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 10:01:57:[38280.358017] tee D ffffffff816a7610 0 16642 1505 0x00000080 10:01:57:[38280.358735] ffff88006740b9b0 0000000000000082 ffff880039d1cf10 ffff88006740bfd8 10:01:57:[38280.359563] ffff88006740bfd8 ffff88006740bfd8 ffff880039d1cf10 ffff88007fd16cc0 10:01:57:[38280.360369] 0000000000000000 7fffffffffffffff ffff88007ff5d770 ffffffff816a7610 10:01:57:[38280.361164] Call Trace: 10:01:57:[38280.361414] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:01:57:[38280.361903] [<ffffffff816a94e9>] schedule+0x29/0x70 10:01:57:[38280.362481] [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0 10:01:57:[38280.363037] [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20 10:01:57:[38280.363671] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:01:57:[38280.364184] [<ffffffff816a8b6d>] io_schedule_timeout+0xad/0x130 10:01:57:[38280.364760] [<ffffffff816a8c08>] io_schedule+0x18/0x20 10:01:57:[38280.365299] [<ffffffff816a7621>] bit_wait_io+0x11/0x50 10:01:57:[38280.365795] [<ffffffff816a7145>] __wait_on_bit+0x65/0x90 10:01:57:[38280.366349] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:01:57:[38280.366840] [<ffffffff816a71f1>] out_of_line_wait_on_bit+0x81/0xb0 10:01:57:[38280.367470] [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40 10:01:57:[38280.368041] [<ffffffffc04f5db3>] nfs_wait_on_request+0x33/0x40 [nfs] 10:01:57:[38280.368720] [<ffffffffc04faeb3>] nfs_updatepage+0x153/0x8d0 [nfs] 10:01:57:[38280.369338] [<ffffffffc04ea371>] nfs_write_end+0x141/0x350 [nfs] 10:01:57:[38280.369918] [<ffffffff81182539>] generic_file_buffered_write+0x189/0x2a0 10:01:57:[38280.370607] [<ffffffff811848f2>] __generic_file_aio_write+0x1e2/0x400 10:01:57:[38280.371266] [<ffffffff8121cf41>] ? update_time+0x81/0xd0 10:01:57:[38280.371776] [<ffffffff81184b69>] generic_file_aio_write+0x59/0xa0 10:01:57:[38280.372407] [<ffffffffc04e97db>] nfs_file_write+0xbb/0x1e0 [nfs] 10:01:57:[38280.372985] [<ffffffff8120027d>] do_sync_write+0x8d/0xd0 10:01:57:[38280.373548] [<ffffffff81200d3d>] vfs_write+0xbd/0x1e0 10:01:57:[38280.374036] [<ffffffff81201b4f>] SyS_write+0x7f/0xe0 10:01:57:[38280.374567] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b 10:01:57:[38285.330324] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u 10:01:57:[38285.797355] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-49vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid 10:01:57:[38285.989862] Lustre: DEBUG MARKER: onyx-49vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid 10:01:57:[38286.082508] Lustre: DEBUG MARKER: lctl get_param -n at_max 10:01:57:[38287.271192] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec 10:01:57:[38287.462276] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec 10:01:57:[38289.182270] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param ldlm.namespaces.*.lru_size=clear 10:01:57:[38289.906118] Lustre: DEBUG MARKER: md5sum /mnt/lustre/d70f.replay-single/f70f.replay-single.onyx-49vm4 10:01:57:[38290.090446] Lustre: DEBUG MARKER: dd bs=1M count=10 if=/tmp/f70f.replay-single of=/mnt/lustre/d70f.replay-single/f70f.replay-single.onyx-49vm3.onyx.hpdd.intel.com 10:01:57:[38290.404005] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param ldlm.namespaces.*.lru_size=clear 10:01:57:[38291.165204] Lustre: DEBUG MARKER: rm -f /tmp/f70f.replay-single 10:01:57:[38291.487042] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null 10:01:57:[38291.966735] Lustre: DEBUG MARKER: rc=0; val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ $? -eq 0 && $val -ne 0 ]]; then echo $(hostname -s): $val; rc=$val; fi; exit $rc 10:01:57:[38292.648060] Lustre: DEBUG MARKER: dmesg 10:01:57:[38400.374599] INFO: task tee:16642 blocked for more than 120 seconds. 10:01:57:[38400.375292] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 10:01:57:[38400.376074] tee D ffffffff816a7610 0 16642 1505 0x00000080 10:01:57:[38400.376907] ffff88006740b9b0 0000000000000082 ffff880039d1cf10 ffff88006740bfd8 10:01:57:[38400.377727] ffff88006740bfd8 ffff88006740bfd8 ffff880039d1cf10 ffff88007fd16cc0 10:01:57:[38400.378543] 0000000000000000 7fffffffffffffff ffff88007ff5d770 ffffffff816a7610 10:09:48:[38400.379382] Call Trace: 10:09:48:[38400.379649] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:09:48:[38400.380240] [<ffffffff816a94e9>] schedule+0x29/0x70 10:09:48:[38400.380731] [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0 10:09:48:[38400.381355] [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20 10:09:48:[38400.381962] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:09:48:[38400.382518] [<ffffffff816a8b6d>] io_schedule_timeout+0xad/0x130 10:09:48:[38400.383104] [<ffffffff816a8c08>] io_schedule+0x18/0x20 10:09:48:[38400.383706] [<ffffffff816a7621>] bit_wait_io+0x11/0x50 10:09:48:[38400.384256] [<ffffffff816a7145>] __wait_on_bit+0x65/0x90 10:09:48:[38400.384785] [<ffffffff816a7610>] ? bit_wait+0x50/0x50 10:09:48:[38400.385332] [<ffffffff816a71f1>] out_of_line_wait_on_bit+0x81/0xb0 10:09:48:[38400.385951] [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40 10:09:48:[38400.386617] [<ffffffffc04f5db3>] nfs_wait_on_request+0x33/0x40 [nfs] 10:09:48:[38400.387252] [<ffffffffc04faeb3>] nfs_updatepage+0x153/0x8d0 [nfs] 10:09:48:[38400.387865] [<ffffffffc04ea371>] nfs_write_end+0x141/0x350 [nfs] 10:09:48:[38400.388544] [<ffffffff81182539>] generic_file_buffered_write+0x189/0x2a0 10:09:48:[38400.389209] [<ffffffff811848f2>] __generic_file_aio_write+0x1e2/0x400 10:09:48:[38400.389897] [<ffffffff8121cf41>] ? update_time+0x81/0xd0 10:09:48:[38400.390467] [<ffffffff81184b69>] generic_file_aio_write+0x59/0xa0 10:09:48:[38400.391071] [<ffffffffc04e97db>] nfs_file_write+0xbb/0x1e0 [nfs] 10:09:48:[38400.391756] [<ffffffff8120027d>] do_sync_write+0x8d/0xd0 10:09:48:[38400.392323] [<ffffffff81200d3d>] vfs_write+0xbd/0x1e0 10:09:48:[38400.392825] [<ffffffff81201b4f>] SyS_write+0x7f/0xe0 10:09:48:[38400.393368] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b 10:09:48:[38520.393616] INFO: task tee:16642 blocked for more than 120 seconds. 10:09:48:[38520.394291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. |
| Comment by Emoly Liu [ 25/Sep/17 ] |
|
+1 on master: https://testing.hpdd.intel.com/test_sets/141d3872-a018-11e7-b778-5254006e85c2 |