Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.8.0
-
None
-
server: lustre-master build # 3093 RHEL7.1
client: SLES11 SP3
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/816ceeca-2623-11e5-92e6-5254006e85c2.
The sub-test test_pairwise_fail failed with the following error:
mount /mnt/lustre on onyx-34vm6 failed
OST dmesg
[ 2212.173927] Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK
[ 2212.998458] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Done checking client loads. Failing type1=clients item1=onyx-34vm5,onyx-34vm6 ...
[ 2213.336608] Lustre: DEBUG MARKER: Done checking client loads. Failing type1=clients item1=onyx-34vm5,onyx-34vm6 ...
[ 2253.919179] LNet: Service thread pid 4319 was inactive for 40.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 2253.926365] Pid: 4319, comm: ll_ost_io00_067
[ 2253.928076] Call Trace:
[ 2253.931101] [<ffffffff8160a409>] schedule+0x29/0x70
[ 2253.933951] [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
[ 2253.936095] [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
[ 2253.938049] [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
[ 2253.940092] [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
[ 2253.941956] [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
[ 2253.944070] [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
[ 2253.945900] [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[ 2253.947956] [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 2253.949879] [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
[ 2253.952132] [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[ 2253.954241] [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[ 2253.956362] [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
[ 2253.958180] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[ 2253.960260] [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
[ 2253.962104] [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 2253.965627] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2253.967444] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[ 2253.969266] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2253.972575] LustreError: dumping log to /tmp/lustre-log.1436386922.4319
[ 2254.275273] Pid: 4320, comm: ll_ost_io00_068
[ 2254.279470] Call Trace:
[ 2254.285875] [<ffffffff8160a409>] schedule+0x29/0x70
[ 2254.287722] [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
[ 2254.289620] [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
[ 2254.291530] [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
[ 2254.293318] [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
[ 2254.295041] [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
[ 2254.296794] [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
[ 2254.298482] [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[ 2254.300158] [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 2254.301808] [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
[ 2254.303486] [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[ 2254.305306] [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[ 2254.306949] [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
[ 2254.308571] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[ 2254.310115] [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
[ 2254.311663] [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
[ 2254.313179] [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 2254.314588] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.315973] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[ 2254.317357] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.319694] Pid: 4330, comm: ll_ost_io00_071
[ 2254.320938] Call Trace:
[ 2254.323020] [<ffffffff8160a409>] schedule+0x29/0x70
[ 2254.324327] [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
[ 2254.325717] [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
[ 2254.327192] [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
[ 2254.328564] [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
[ 2254.329942] [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
[ 2254.331377] [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
[ 2254.332768] [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[ 2254.334173] [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 2254.335613] [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
[ 2254.337071] [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[ 2254.338680] [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[ 2254.340149] [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
[ 2254.341516] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[ 2254.342882] [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
[ 2254.344241] [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
[ 2254.345619] [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 2254.346921] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.348223] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[ 2254.349523] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.351850] Pid: 4257, comm: ll_ost_io00_039
[ 2254.353101] Call Trace:
[ 2254.355139] [<ffffffff8160a409>] schedule+0x29/0x70
[ 2254.356389] [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0
[ 2254.357738] [<ffffffffa081e3aa>] ? ptlrpc_start_bulk_transfer+0x16a/0x710 [ptlrpc]
[ 2254.359179] [<ffffffff8107ee80>] ? process_timeout+0x0/0x10
[ 2254.360681] [<ffffffffa07e2cae>] target_bulk_io+0x4de/0xb00 [ptlrpc]
[ 2254.362039] [<ffffffff810a9650>] ? default_wake_function+0x0/0x20
[ 2254.363468] [<ffffffffa088f941>] tgt_brw_write+0x10b1/0x1650 [ptlrpc]
[ 2254.364833] [<ffffffff812dfbab>] ? string.isra.6+0x3b/0xf0
[ 2254.366180] [<ffffffffa07e01f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 2254.367586] [<ffffffffa088b29b>] tgt_request_handle+0x88b/0x1100 [ptlrpc]
[ 2254.369024] [<ffffffffa0832fbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[ 2254.370488] [<ffffffffa0830078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[ 2254.371916] [<ffffffffa0836900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
[ 2254.373266] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[ 2254.374639] [<ffffffffa0835d00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc]
[ 2254.375995] [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 2254.377272] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.378521] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[ 2254.379843] [<ffffffff810972d0>] ? kthread+0x0/0xe0
[ 2254.382140] LNet: Service thread pid 2840 was inactive for 40.17s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[ 2255.071164] LNet: Service thread pid 4267 was inactive for 40.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[ 2255.076671] LNet: Skipped 7 previous similar messages
[ 2255.078304] LustreError: dumping log to /tmp/lustre-log.1436386923.4267
[ 2264.610805] Lustre: lustre-OST0001: haven't heard from client 2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp) in 49 seconds. I think it's dead, and I am evicting it. exp ffff88004084b000, cur 1436386933 expire 1436386903 last 1436386884
[ 2264.617306] Lustre: Skipped 6 previous similar messages
[ 2266.890151] LustreError: 4269:0:(ldlm_lib.c:3077:target_bulk_io()) @@@ Eviction on bulk WRITE req@ffff880051723000 x1506160196432764/t0(0) o4->2d4be017-9a1c-7408-76c8-bc0239710d98@10.2.4.133@tcp:199/0 lens 608/448 e 2 to 0 dl 1436386944 ref 1 fl Interpret:/0/0 rc 0/0
[ 2266.905189] Lustre: lustre-OST0003: Bulk IO write error with 2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp), client will retry: rc -107
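The timing in the lustre-OST0001 eviction message from the OST dmesg above can be checked with a short script. This is only a sketch for triage: the regular expression, the `parse_eviction` helper, and the derived field names `since_last` and `past_expire` are hypothetical and not part of Lustre or its test tools. The script computes plain differences between the `cur`, `expire`, and `last` values as printed and does not assume anything further about what those fields mean.

```python
import re

# Eviction message copied verbatim from the OST dmesg in this ticket.
LINE = (
    "Lustre: lustre-OST0001: haven't heard from client "
    "2d4be017-9a1c-7408-76c8-bc0239710d98 (at 10.2.4.133@tcp) in 49 seconds. "
    "I think it's dead, and I am evicting it. exp ffff88004084b000, "
    "cur 1436386933 expire 1436386903 last 1436386884"
)

# Hypothetical pattern written for this one message format.
EVICT_RE = re.compile(
    r"haven't heard from client (?P<uuid>\S+) \(at (?P<nid>[^)]+)\) "
    r"in (?P<silent>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return the fields of an eviction message as a dict, or None if no match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("silent", "cur", "expire", "last"):
        info[key] = int(info[key])
    # Plain differences between the printed timestamps (seconds).
    info["since_last"] = info["cur"] - info["last"]
    info["past_expire"] = info["cur"] - info["expire"]
    return info


evt = parse_eviction(LINE)
print(evt["nid"], evt["silent"], evt["since_last"], evt["past_expire"])
```

For this message, `cur - last` matches the 49 seconds reported in the text, which confirms the log line is internally consistent.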