Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.5.0, Lustre 2.5.3
-
None
-
client and server: lustre-b2_5 build #2 RHEL6 ldiskfs
-
3
-
11303
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/de5e20d0-399a-11e3-8e4c-52540035b04c.
The sub-test test_failover_ost failed with the following error:
import is not in FULL state
client 3 console shows:
13:45:10:Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=6700 DURATION=86400 PERIOD=900
13:45:10:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] &&
13:45:10: echo $(< /proc/sys/lnet/catastrophe) || echo 0);
13:45:11: if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
13:45:11: exit $rc
13:45:11:Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_tar.sh
13:45:11:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Wait ost7 recovery complete before doing next failover...
13:45:11:Lustre: DEBUG MARKER: Wait ost7 recovery complete before doing next failover...
13:45:52:LustreError: 16584:0:(osc_cache.c:897:osc_extent_wait()) extent ffff88005197cd08@{[0 -> 2/255], [2|0|+|cache|wihuY|ffff8800507e21b8], [12288|3|+|-|ffff88005046e3d8|256|(null)]} lustre-OST0002-osc-ffff880037e99000: wait ext to 0 timedout, recovery in progress?
13:46:23:INFO: task tar:28315 blocked for more than 120 seconds.
13:46:23:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:46:23:tar D 0000000000000000 0 28315 28313 0x00000080
13:46:23: ffff88004fa89a18 0000000000000086 ffff88004fa899c8 ffffffff810097cc
13:46:23: ffff88007d4b4b18 0000000000000000 0000000000a899d8 ffff880002214280
13:46:24: ffff880066aab058 ffff88004fa89fd8 000000000000fb88 ffff880066aab058
13:46:24:Call Trace:
13:46:24: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
13:46:24: [<ffffffff8150e130>] ? thread_return+0x4e/0x76e
13:46:24: [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
13:46:24: [<ffffffffa0696bf0>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
13:46:24: [<ffffffff8150ec23>] wait_for_common+0x123/0x180
13:46:24: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
13:46:24: [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
13:46:24: [<ffffffffa089d5ec>] osc_io_setattr_end+0xbc/0x190 [osc]
13:46:24: [<ffffffffa092cdf0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
13:46:24: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:46:24: [<ffffffffa055b780>] ? cl_io_start+0x0/0x140 [obdclass]
13:46:24: [<ffffffffa092cee1>] lov_io_end_wrapper+0xf1/0x100 [lov]
13:46:24: [<ffffffffa092c86e>] lov_io_call+0x8e/0x130 [lov]
13:46:24: [<ffffffffa092e5dc>] lov_io_end+0x4c/0xf0 [lov]
13:46:24: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:46:24: [<ffffffffa055ff02>] cl_io_loop+0xc2/0x1b0 [obdclass]
13:46:25: [<ffffffffa09fcfc8>] cl_setattr_ost+0x218/0x2f0 [lustre]
13:46:25: [<ffffffffa09c9145>] ll_setattr_raw+0xa45/0x1070 [lustre]
13:46:25: [<ffffffffa09c97cd>] ll_setattr+0x5d/0xf0 [lustre]
13:46:25: [<ffffffff8119ea48>] notify_change+0x168/0x340
13:46:25: [<ffffffff811b2aec>] utimes_common+0xdc/0x1b0
13:46:25: [<ffffffff81182bc1>] ? __fput+0x1a1/0x210
13:46:25: [<ffffffff811b2c9e>] do_utimes+0xde/0xf0
13:46:25: [<ffffffff811b2db2>] sys_utimensat+0x32/0x90
13:46:25: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
13:48:28:INFO: task tar:28315 blocked for more than 120 seconds.
13:48:29:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:48:29:tar D 0000000000000000 0 28315 28313 0x00000080
13:48:30: ffff88004fa89a18 0000000000000086 ffff88004fa899c8 ffffffff810097cc
13:48:31: ffff88007d4b4b18 0000000000000000 0000000000a899d8 ffff880002214280
13:48:31: ffff880066aab058 ffff88004fa89fd8 000000000000fb88 ffff880066aab058
13:48:32:Call Trace:
13:48:33: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
13:48:33: [<ffffffff8150e130>] ? thread_return+0x4e/0x76e
13:48:34: [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
13:48:35: [<ffffffffa0696bf0>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
13:48:36: [<ffffffff8150ec23>] wait_for_common+0x123/0x180
13:48:36: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
13:48:36: [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
13:48:37: [<ffffffffa089d5ec>] osc_io_setattr_end+0xbc/0x190 [osc]
13:48:37: [<ffffffffa092cdf0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
13:48:38: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:48:38: [<ffffffffa055b780>] ? cl_io_start+0x0/0x140 [obdclass]
13:48:39: [<ffffffffa092cee1>] lov_io_end_wrapper+0xf1/0x100 [lov]
13:48:40: [<ffffffffa092c86e>] lov_io_call+0x8e/0x130 [lov]
13:48:41: [<ffffffffa092e5dc>] lov_io_end+0x4c/0xf0 [lov]
13:48:41: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:48:42: [<ffffffffa055ff02>] cl_io_loop+0xc2/0x1b0 [obdclass]
13:48:42: [<ffffffffa09fcfc8>] cl_setattr_ost+0x218/0x2f0 [lustre]
13:48:42: [<ffffffffa09c9145>] ll_setattr_raw+0xa45/0x1070 [lustre]
13:48:43: [<ffffffffa09c97cd>] ll_setattr+0x5d/0xf0 [lustre]
13:48:43: [<ffffffff8119ea48>] notify_change+0x168/0x340
13:48:43: [<ffffffff811b2aec>] utimes_common+0xdc/0x1b0
13:48:43: [<ffffffff81182bc1>] ? __fput+0x1a1/0x210
13:48:44: [<ffffffff811b2c9e>] do_utimes+0xde/0xf0
13:48:44: [<ffffffff811b2db2>] sys_utimensat+0x32/0x90
13:48:44: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
13:50:30:INFO: task tar:28315 blocked for more than 120 seconds.
13:50:32:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:50:32:tar D 0000000000000000 0 28315 28313 0x00000080
13:50:33: ffff88004fa89a18 0000000000000086 ffff88004fa899c8 ffffffff810097cc
13:50:33: ffff88007d4b4b18 0000000000000000 0000000000a899d8 ffff880002214280
13:50:34: ffff880066aab058 ffff88004fa89fd8 000000000000fb88 ffff880066aab058
13:50:34:Call Trace:
13:50:34: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
13:50:35: [<ffffffff8150e130>] ? thread_return+0x4e/0x76e
13:50:35: [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
13:50:36: [<ffffffffa0696bf0>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
13:50:38: [<ffffffff8150ec23>] wait_for_common+0x123/0x180
13:50:39: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
13:50:39: [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
13:50:40: [<ffffffffa089d5ec>] osc_io_setattr_end+0xbc/0x190 [osc]
13:50:40: [<ffffffffa092cdf0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
13:50:41: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:50:41: [<ffffffffa055b780>] ? cl_io_start+0x0/0x140 [obdclass]
13:50:42: [<ffffffffa092cee1>] lov_io_end_wrapper+0xf1/0x100 [lov]
13:50:42: [<ffffffffa092c86e>] lov_io_call+0x8e/0x130 [lov]
13:50:42: [<ffffffffa092e5dc>] lov_io_end+0x4c/0xf0 [lov]
13:50:43: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:50:43: [<ffffffffa055ff02>] cl_io_loop+0xc2/0x1b0 [obdclass]
13:50:43: [<ffffffffa09fcfc8>] cl_setattr_ost+0x218/0x2f0 [lustre]
13:50:44: [<ffffffffa09c9145>] ll_setattr_raw+0xa45/0x1070 [lustre]
13:50:44: [<ffffffffa09c97cd>] ll_setattr+0x5d/0xf0 [lustre]
13:50:45: [<ffffffff8119ea48>] notify_change+0x168/0x340
13:50:45: [<ffffffff811b2aec>] utimes_common+0xdc/0x1b0
13:50:45: [<ffffffff81182bc1>] ? __fput+0x1a1/0x210
13:50:45: [<ffffffff811b2c9e>] do_utimes+0xde/0xf0
13:50:45: [<ffffffff811b2db2>] sys_utimensat+0x32/0x90
13:50:46: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
13:52:09:LustreError: 11-0: lustre-OST0002-osc-ffff880037e99000: Communicating with 10.10.4.183@tcp, operation ldlm_enqueue failed with -12.
13:52:09:LustreError: Skipped 391089 previous similar messages
13:52:30:INFO: task tar:28315 blocked for more than 120 seconds.
13:52:30:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:52:30:tar D 0000000000000000 0 28315 28313 0x00000080
13:52:30: ffff88004fa89a18 0000000000000086 ffff88004fa899c8 ffffffff810097cc
13:52:30: ffff88007d4b4b18 0000000000000000 0000000000a899d8 ffff880002214280
13:52:30: ffff880066aab058 ffff88004fa89fd8 000000000000fb88 ffff880066aab058
13:52:30:Call Trace:
13:52:30: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
13:52:30: [<ffffffff8150e130>] ? thread_return+0x4e/0x76e
13:52:30: [<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
13:52:30: [<ffffffffa0696bf0>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
13:52:30: [<ffffffff8150ec23>] wait_for_common+0x123/0x180
13:52:31: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
13:52:31: [<ffffffff8150ed3d>] wait_for_completion+0x1d/0x20
13:52:31: [<ffffffffa089d5ec>] osc_io_setattr_end+0xbc/0x190 [osc]
13:52:31: [<ffffffffa092cdf0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
13:52:31: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:52:31: [<ffffffffa055b780>] ? cl_io_start+0x0/0x140 [obdclass]
13:52:31: [<ffffffffa092cee1>] lov_io_end_wrapper+0xf1/0x100 [lov]
13:52:31: [<ffffffffa092c86e>] lov_io_call+0x8e/0x130 [lov]
13:52:31: [<ffffffffa092e5dc>] lov_io_end+0x4c/0xf0 [lov]
13:52:31: [<ffffffffa055aed0>] cl_io_end+0x60/0x150 [obdclass]
13:52:31: [<ffffffffa055ff02>] cl_io_loop+0xc2/0x1b0 [obdclass]
13:52:31: [<ffffffffa09fcfc8>] cl_setattr_ost+0x218/0x2f0 [lustre]
13:52:31: [<ffffffffa09c9145>] ll_setattr_raw+0xa45/0x1070 [lustre]
13:52:31: [<ffffffffa09c97cd>] ll_setattr+0x5d/0xf0 [lustre]
13:52:31: [<ffffffff8119ea48>] notify_change+0x168/0x340
13:52:31: [<ffffffff811b2aec>] utimes_common+0xdc/0x1b0
13:52:31: [<ffffffff81182bc1>] ? __fput+0x1a1/0x210
13:52:32: [<ffffffff811b2c9e>] do_utimes+0xde/0xf0
13:52:32: [<ffffffff811b2db2>] sys_utimensat+0x32/0x90
13:52:32: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
13:56:27:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Checking clients are in FULL state before doing next failover...
13:56:30:Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover...
13:56:30:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
13:56:30:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:30:Lustre: DEBUG MARKER: lctl get_param -n at_max
13:56:30:Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
13:56:31:Lustre: DEBUG MARKER: osc.lustre-OST0001-osc-*.ost_server_uuid in FULL state after 0 sec
14:02:10:LustreError: 11-0: lustre-OST0002-osc-ffff880037e99000: Communicating with 10.10.4.183@tcp, operation ldlm_enqueue failed with -12.
14:02:10:LustreError: Skipped 464241 previous similar messages
14:07:35:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for osc.lustre-OST0002-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
14:07:37:Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for osc.lustre-OST0002-osc-*.ost_server_uuid into FULL state after 662 sec, have REPLAY_WAIT
14:07:37:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests
14:07:37:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /tmp/test_logs/1382129779/rpc..debug_log.$(hostname -s).1382130447.log;
14:07:37: dmesg > /tmp/test_logs/1382129779/rpc..dmesg.$(hostname -s).1382130447.log
14:07:37:Lustre: DEBUG MARKER: rsync -az /tmp/test_logs/1382129779/rpc..*.1382130447.log client-30vm5.lab.whamcloud.com:/tmp/test_logs/1382129779
14:07:37:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for osc.lustre-OST0002-osc-*.ost_server_uuid into FULL state after 662 sec, have CONNECTING