[LU-9777] replay-single test_70f: test failed to respond and timed out Created: 14/Jul/17  Updated: 13/Oct/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10008 ost-pools test_12: test failed to res... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a9e1439e-689b-11e7-9230-5254006e85c2.

The sub-test test_70f failed with the following error:

test failed to respond and timed out

Please provide additional information about the failure here.

Info required for matching: replay-single 70f
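
For local reproduction, a single replay-single sub-test can normally be run in isolation through the standard Lustre test framework. A minimal sketch, assuming a lustre-tests install under /usr/lib64/lustre/tests (paths and environment are illustrative, not taken from this run):

# run only sub-test 70f of replay-single
cd /usr/lib64/lustre/tests
ONLY=70f sh replay-single.sh
# or, via the auster wrapper:
./auster -v replay-single --only 70f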



 Comments   
Comment by Sarah Liu [ 18/Sep/17 ]

Another one on 2.10.1: https://testing.hpdd.intel.com/test_sets/4e4e2c74-9b4d-11e7-ba27-5254006e85c2

Client console:

10:01:57:[38268.794445] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_70f failing OST 5 times
10:01:57:[38268.956844] Lustre: DEBUG MARKER: test_70f failing OST 5 times
10:01:57:[38280.356600] INFO: task tee:16642 blocked for more than 120 seconds.
10:01:57:[38280.357275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:01:57:[38280.358017] tee             D ffffffff816a7610     0 16642   1505 0x00000080
10:01:57:[38280.358735]  ffff88006740b9b0 0000000000000082 ffff880039d1cf10 ffff88006740bfd8
10:01:57:[38280.359563]  ffff88006740bfd8 ffff88006740bfd8 ffff880039d1cf10 ffff88007fd16cc0
10:01:57:[38280.360369]  0000000000000000 7fffffffffffffff ffff88007ff5d770 ffffffff816a7610
10:01:57:[38280.361164] Call Trace:
10:01:57:[38280.361414]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:01:57:[38280.361903]  [<ffffffff816a94e9>] schedule+0x29/0x70
10:01:57:[38280.362481]  [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0
10:01:57:[38280.363037]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
10:01:57:[38280.363671]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:01:57:[38280.364184]  [<ffffffff816a8b6d>] io_schedule_timeout+0xad/0x130
10:01:57:[38280.364760]  [<ffffffff816a8c08>] io_schedule+0x18/0x20
10:01:57:[38280.365299]  [<ffffffff816a7621>] bit_wait_io+0x11/0x50
10:01:57:[38280.365795]  [<ffffffff816a7145>] __wait_on_bit+0x65/0x90
10:01:57:[38280.366349]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:01:57:[38280.366840]  [<ffffffff816a71f1>] out_of_line_wait_on_bit+0x81/0xb0
10:01:57:[38280.367470]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
10:01:57:[38280.368041]  [<ffffffffc04f5db3>] nfs_wait_on_request+0x33/0x40 [nfs]
10:01:57:[38280.368720]  [<ffffffffc04faeb3>] nfs_updatepage+0x153/0x8d0 [nfs]
10:01:57:[38280.369338]  [<ffffffffc04ea371>] nfs_write_end+0x141/0x350 [nfs]
10:01:57:[38280.369918]  [<ffffffff81182539>] generic_file_buffered_write+0x189/0x2a0
10:01:57:[38280.370607]  [<ffffffff811848f2>] __generic_file_aio_write+0x1e2/0x400
10:01:57:[38280.371266]  [<ffffffff8121cf41>] ? update_time+0x81/0xd0
10:01:57:[38280.371776]  [<ffffffff81184b69>] generic_file_aio_write+0x59/0xa0
10:01:57:[38280.372407]  [<ffffffffc04e97db>] nfs_file_write+0xbb/0x1e0 [nfs]
10:01:57:[38280.372985]  [<ffffffff8120027d>] do_sync_write+0x8d/0xd0
10:01:57:[38280.373548]  [<ffffffff81200d3d>] vfs_write+0xbd/0x1e0
10:01:57:[38280.374036]  [<ffffffff81201b4f>] SyS_write+0x7f/0xe0
10:01:57:[38280.374567]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
10:01:57:[38285.330324] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
10:01:57:[38285.797355] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-49vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
10:01:57:[38285.989862] Lustre: DEBUG MARKER: onyx-49vm3.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
10:01:57:[38286.082508] Lustre: DEBUG MARKER: lctl get_param -n at_max
10:01:57:[38287.271192] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec
10:01:57:[38287.462276] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec
10:01:57:[38289.182270] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param ldlm.namespaces.*.lru_size=clear
10:01:57:[38289.906118] Lustre: DEBUG MARKER: md5sum /mnt/lustre/d70f.replay-single/f70f.replay-single.onyx-49vm4
10:01:57:[38290.090446] Lustre: DEBUG MARKER: dd bs=1M count=10 if=/tmp/f70f.replay-single of=/mnt/lustre/d70f.replay-single/f70f.replay-single.onyx-49vm3.onyx.hpdd.intel.com
10:01:57:[38290.404005] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param ldlm.namespaces.*.lru_size=clear
10:01:57:[38291.165204] Lustre: DEBUG MARKER: rm -f /tmp/f70f.replay-single
10:01:57:[38291.487042] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
10:01:57:[38291.966735] Lustre: DEBUG MARKER: rc=0; val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ $? -eq 0 && $val -ne 0 ]]; then echo $(hostname -s): $val; rc=$val; fi; exit $rc
10:01:57:[38292.648060] Lustre: DEBUG MARKER: dmesg
10:01:57:[38400.374599] INFO: task tee:16642 blocked for more than 120 seconds.
10:01:57:[38400.375292] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
10:01:57:[38400.376074] tee             D ffffffff816a7610     0 16642   1505 0x00000080
10:01:57:[38400.376907]  ffff88006740b9b0 0000000000000082 ffff880039d1cf10 ffff88006740bfd8
10:01:57:[38400.377727]  ffff88006740bfd8 ffff88006740bfd8 ffff880039d1cf10 ffff88007fd16cc0
10:01:57:[38400.378543]  0000000000000000 7fffffffffffffff ffff88007ff5d770 ffffffff816a7610
10:09:48:[38400.379382] Call Trace:
10:09:48:[38400.379649]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:09:48:[38400.380240]  [<ffffffff816a94e9>] schedule+0x29/0x70
10:09:48:[38400.380731]  [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0
10:09:48:[38400.381355]  [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
10:09:48:[38400.381962]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:09:48:[38400.382518]  [<ffffffff816a8b6d>] io_schedule_timeout+0xad/0x130
10:09:48:[38400.383104]  [<ffffffff816a8c08>] io_schedule+0x18/0x20
10:09:48:[38400.383706]  [<ffffffff816a7621>] bit_wait_io+0x11/0x50
10:09:48:[38400.384256]  [<ffffffff816a7145>] __wait_on_bit+0x65/0x90
10:09:48:[38400.384785]  [<ffffffff816a7610>] ? bit_wait+0x50/0x50
10:09:48:[38400.385332]  [<ffffffff816a71f1>] out_of_line_wait_on_bit+0x81/0xb0
10:09:48:[38400.385951]  [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
10:09:48:[38400.386617]  [<ffffffffc04f5db3>] nfs_wait_on_request+0x33/0x40 [nfs]
10:09:48:[38400.387252]  [<ffffffffc04faeb3>] nfs_updatepage+0x153/0x8d0 [nfs]
10:09:48:[38400.387865]  [<ffffffffc04ea371>] nfs_write_end+0x141/0x350 [nfs]
10:09:48:[38400.388544]  [<ffffffff81182539>] generic_file_buffered_write+0x189/0x2a0
10:09:48:[38400.389209]  [<ffffffff811848f2>] __generic_file_aio_write+0x1e2/0x400
10:09:48:[38400.389897]  [<ffffffff8121cf41>] ? update_time+0x81/0xd0
10:09:48:[38400.390467]  [<ffffffff81184b69>] generic_file_aio_write+0x59/0xa0
10:09:48:[38400.391071]  [<ffffffffc04e97db>] nfs_file_write+0xbb/0x1e0 [nfs]
10:09:48:[38400.391756]  [<ffffffff8120027d>] do_sync_write+0x8d/0xd0
10:09:48:[38400.392323]  [<ffffffff81200d3d>] vfs_write+0xbd/0x1e0
10:09:48:[38400.392825]  [<ffffffff81201b4f>] SyS_write+0x7f/0xe0
10:09:48:[38400.393368]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
10:09:48:[38520.393616] INFO: task tee:16642 blocked for more than 120 seconds.
10:09:48:[38520.394291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
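
For what it's worth, the blocked task in the trace above is tee (the harness log writer) waiting in NFS writeback (nfs_file_write -> nfs_updatepage -> nfs_wait_on_request), so the client appears to be stuck writing its output over NFS rather than inside a Lustre code path. The messages themselves come from the generic kernel hung-task watchdog; as the console text notes, that watchdog can be inspected or silenced with standard kernel knobs (not specific to this run):

# show the current hung-task watchdog timeout, in seconds
sysctl kernel.hung_task_timeout_secs
# disable the warning entirely, as suggested by the console message
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
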
Comment by Emoly Liu [ 25/Sep/17 ]

+1 on master: https://testing.hpdd.intel.com/test_sets/141d3872-a018-11e7-b778-5254006e85c2
