Apr 22 00:00:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109716c6000
Apr 22 00:01:37 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101640da000
Apr 22 00:01:37 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810656792000
Apr 22 00:01:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib
Apr 22 00:01:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 7 previous similar messages
Apr 22 00:02:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds
Apr 22 00:02:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 5 previous similar messages
Apr 22 00:02:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.7.189@o2ib (12)
Apr 22 00:02:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 5 previous similar messages
Apr 22 00:02:24 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81040820c000
Apr 22 00:02:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101632f0000
Apr 22 00:03:11 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089d3ea000
Apr 22 00:03:30 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b742c000
Apr 22 00:03:30 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d7c84000
Apr 22 00:03:57 lfs-oss-1-13 kernel: LustreError: 32167:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff81098bde5c00 x1399132011992465/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053037 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:04:28 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105fefb4000
Apr 22 00:04:29 lfs-oss-1-13 kernel: Lustre: 32127:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0086: ignoring bulk IO comm error with 79d2d533-5e1a-b892-c7a0-4d01ee57e619@NET_0x500000aae07bf_UUID id 12345-10.174.7.191@o2ib - client will retry
Apr 22 00:04:29 lfs-oss-1-13 kernel: Lustre: 32127:0:(ost_handler.c:887:ost_brw_read()) Skipped 12 previous similar messages
Apr 22 00:04:36 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81055a311000
Apr 22 00:04:46 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107dd424000
Apr 22 00:04:48 lfs-oss-1-13 kernel: Lustre: scratch1-OST0087: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:04:48 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message
Apr 22 00:05:09 lfs-oss-1-13 kernel: Lustre: 31781:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff810c17c16800; still busy with 1 active RPCs
Apr 22 00:05:09 lfs-oss-1-13 kernel: Lustre: 31781:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 12 previous similar messages
Apr 22 00:05:09 lfs-oss-1-13 kernel: LustreError: 31781:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c1528a400 x1398901147200104/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335053209 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 00:05:09 lfs-oss-1-13 kernel: LustreError: 31781:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 14 previous similar messages
Apr 22 00:05:10 lfs-oss-1-13 kernel: LustreError: 32037:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c1476b800 x1398901147196124/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053223 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:05:10 lfs-oss-1-13 kernel: LustreError: 32037:0:(ost_handler.c:829:ost_brw_read()) Skipped 10 previous similar messages
Apr 22 00:05:13 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b0b66e000
Apr 22 00:05:25 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST008b_UUID lock: ffff810b7d8da800/0xcca1a6f6bf5f4811 lrc: 3/0,0 mode: PW/PW res: 32587672/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0xa6a88422cac004c8 expref: 12 pid: 31972 timeout 5259610803
Apr 22 00:06:09 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048f8ae000
Apr 22 00:06:27 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073543e000
Apr 22 00:06:29 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b0b66e000
Apr 22 00:07:44 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ccb80000
Apr 22 00:08:08 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a2514000
Apr 22 00:08:59 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:08:59 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message
Apr 22 00:09:00 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb5398000
Apr 22 00:09:11 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b49ca000
Apr 22 00:09:40 lfs-oss-1-13 kernel: Lustre: 31892:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting
Apr 22 00:09:40 lfs-oss-1-13 kernel: Lustre: 31892:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 152 previous similar messages
Apr 22 00:09:48 lfs-oss-1-13 kernel: LustreError: 32133:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105a9b70000 x1399132011998535/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053388 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:09:57 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106eb4f8000
Apr 22 00:10:52 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba9012000
Apr 22 00:10:52 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81059e2c4000
Apr 22 00:10:52 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101bef22000
Apr 22 00:10:54 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016623a000
Apr 22 00:12:08 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106dd6f7000
Apr 22 00:12:10 lfs-oss-1-13 kernel: Lustre: scratch1-OST0089: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:12:27 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ea654000
Apr 22 00:13:49 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b742c000
Apr 22 00:13:49 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib
Apr 22 00:13:49 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages
Apr 22 00:14:46 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81067275e000
Apr 22 00:14:52 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c54f8000
Apr 22 00:15:35 lfs-oss-1-13 kernel: LustreError: 32108:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c2ea59450 x1399132012004161/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053735 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:15:35 lfs-oss-1-13 kernel: Lustre: 32108:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry
Apr 22 00:15:35 lfs-oss-1-13 kernel: Lustre: 32108:0:(ost_handler.c:887:ost_brw_read()) Skipped 20 previous similar messages
Apr 22 00:16:07 lfs-oss-1-13 kernel: Lustre: 31803:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105e3a98c00; still busy with 1 active RPCs
Apr 22 00:16:07 lfs-oss-1-13 kernel: Lustre: 31803:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 22 previous similar messages
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 31803:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c274f1000 x1398901147215285/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335053867 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 31803:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 22 previous similar messages
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012be21000
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105e48dbac0
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 32176:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810365472400 x1398901147211859/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053847 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:16:07 lfs-oss-1-13 kernel: LustreError: 32176:0:(ost_handler.c:829:ost_brw_read()) Skipped 18 previous similar messages
Apr 22 00:16:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds
Apr 22 00:16:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 6 previous similar messages
Apr 22 00:16:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (41)
Apr 22 00:16:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 6 previous similar messages
Apr 22 00:16:35 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e66d6000
Apr 22 00:17:23 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100390a4000
Apr 22 00:17:36 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:18:08 lfs-oss-1-13 kernel: LustreError: 32040:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810ae1358400 x1398901147221191/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335053888 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:18:08 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba9012000
Apr 22 00:18:38 lfs-oss-1-13 kernel: Lustre: 31838:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897310266988 sent from scratch1-OST008b to NID 10.174.6.174@o2ib 11s ago has timed out (11s prior to deadline).
Apr 22 00:18:38 lfs-oss-1-13 kernel: req@ffff810c1b942800 x1398897310266988/t0 o104->@NET_0x500000aae06ae_UUID:15/16 lens 296/384 e 0 to 1 dl 1335053918 ref 2 fl Rpc:N/0/0 rc 0/0
Apr 22 00:18:38 lfs-oss-1-13 kernel: Lustre: 31838:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Apr 22 00:18:38 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008b: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107
Apr 22 00:18:38 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message
Apr 22 00:18:39 lfs-oss-1-13 kernel: LustreError: 32092:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c3951dc00 x1398901147224865/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335054113 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:18:39 lfs-oss-1-13 kernel: LustreError: 32092:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message
Apr 22 00:19:11 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ef694000
Apr 22 00:19:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d7c84000
Apr 22 00:19:42 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting
Apr 22 00:19:42 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 130 previous similar messages
Apr 22 00:20:39 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008d16e000
Apr 22 00:20:58 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c7a82000
Apr 22 00:21:50 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810abd9ba000
Apr 22 00:22:13 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81032fccc000
Apr 22 00:22:58 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810448176000
Apr 22 00:23:26 lfs-oss-1-13 kernel: Lustre: 31811:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897310399927 sent from scratch1-OST008d to NID 10.174.6.174@o2ib 7s ago has timed out (7s prior to deadline).
Apr 22 00:23:26 lfs-oss-1-13 kernel: req@ffff8105bf403400 x1398897310399927/t0 o104->@NET_0x500000aae06ae_UUID:15/16 lens 296/384 e 0 to 1 dl 1335054206 ref 2 fl Rpc:N/0/0 rc 0/0
Apr 22 00:23:26 lfs-oss-1-13 kernel: Lustre: 31811:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Apr 22 00:23:26 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107
Apr 22 00:23:26 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message
Apr 22 00:23:26 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c273edc00 x1398897310407588/t0 o105->@NET_0x500000aae06ae_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 00:23:26 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.174@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008d_UUID lock: ffff810141ac3a00/0xcca1a6f6bfd7abcf lrc: 3/0,0 mode: PW/PW res: 32623205/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1007615) flags: 0x0 remote: 0xa6a88422cad93565 expref: 16 pid: 31811 timeout 0
Apr 22 00:23:27 lfs-oss-1-13 kernel: LustreError: 31998:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105b98f2c00 x1398901147291601/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335054492 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:23:27 lfs-oss-1-13 kernel: LustreError: 31998:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message
Apr 22 00:24:20 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b93e14ac0
Apr 22 00:24:20 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ff4ab000
Apr 22 00:24:20 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib
Apr 22 00:24:20 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages
Apr 22 00:24:47 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100742ba000
Apr 22 00:26:03 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ac01fc000
Apr 22 00:26:20 lfs-oss-1-13 kernel: Lustre: 31969:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897310426808 sent from scratch1-OST0086 to NID 10.174.6.174@o2ib 11s ago has timed out (11s prior to deadline).
Apr 22 00:26:20 lfs-oss-1-13 kernel: req@ffff810c22108400 x1398897310426808/t0 o106->@NET_0x500000aae06ae_UUID:15/16 lens 296/424 e 0 to 1 dl 1335054380 ref 2 fl Rpc:/0/0 rc 0/0
Apr 22 00:26:39 lfs-oss-1-13 kernel: Lustre: 31959:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008a: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff810a4a95ec00; still busy with 1 active RPCs
Apr 22 00:26:39 lfs-oss-1-13 kernel: Lustre: 31959:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 14 previous similar messages
Apr 22 00:26:39 lfs-oss-1-13 kernel: LustreError: 31959:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105e8c7d800 x1399132012067709/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335054499 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 00:26:39 lfs-oss-1-13 kernel: LustreError: 31959:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 15 previous similar messages
Apr 22 00:26:39 lfs-oss-1-13 kernel: LustreError: 32051:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810605fa2800 x1399132012039346/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335054503 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:26:39 lfs-oss-1-13 kernel: LustreError: 32051:0:(ost_handler.c:829:ost_brw_read()) Skipped 12 previous similar messages
Apr 22 00:26:39 lfs-oss-1-13 kernel: Lustre: 32051:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry
Apr 22 00:26:39 lfs-oss-1-13 kernel: Lustre: 32051:0:(ost_handler.c:887:ost_brw_read()) Skipped 17 previous similar messages
Apr 22 00:26:45 lfs-oss-1-13 kernel: Lustre: scratch1-OST0085: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:27:29 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c21bcc980
Apr 22 00:27:29 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810476c90000
Apr 22 00:27:48 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b4938000
Apr 22 00:28:38 lfs-oss-1-13 kernel: LustreError: 32225:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105eb972c00 x1399132012069938/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335054518 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:28:49 lfs-oss-1-13 kernel: LustreError: 32121:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8106c5d24800 x1398901147321569/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 1 to 0 dl 1335054633 ref 1 fl Interpret:/0/0 rc 0/0
Apr 22 00:28:49 lfs-oss-1-13 kernel: LustreError: 32121:0:(ost_handler.c:1064:ost_brw_write()) Skipped 3 previous similar messages
Apr 22 00:28:49 lfs-oss-1-13 kernel: Lustre: 32121:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry
Apr 22 00:29:35 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096bb59000
Apr 22 00:29:50 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds
Apr 22 00:29:50 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages
Apr 22 00:29:50 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (36)
Apr 22 00:29:50 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages
Apr 22 00:29:50 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81003a058000
Apr 22 00:29:50 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0086: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting
Apr 22 00:29:50 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 142 previous similar messages
Apr 22 00:31:06 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b6ca4000
Apr 22 00:33:04 lfs-oss-1-13 kernel: Lustre: 32109:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897310604361 sent from scratch1-OST0084 to NID 10.174.6.174@o2ib 8s ago has timed out (8s prior to deadline).
Apr 22 00:33:04 lfs-oss-1-13 kernel: req@ffff810a38b69c00 x1398897310604361/t0 o104->@NET_0x500000aae06ae_UUID:15/16 lens 296/384 e 0 to 1 dl 1335054784 ref 2 fl Rpc:N/0/0 rc 0/0
Apr 22 00:33:04 lfs-oss-1-13 kernel: Lustre: 32109:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
Apr 22 00:33:04 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0084: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107
Apr 22 00:33:07 lfs-oss-1-13 kernel: LustreError: 32233:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105a8d08400 x1398901147691907/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335054918 ref 1 fl Interpret:/0/0 rc 0/0
Apr 22 00:33:23 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810392f5a000
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2a96c000
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102ad532000
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107450f6000
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109afc37e80
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810799b5c000
Apr 22 00:33:25 lfs-oss-1-13 kernel: LustreError: 32170:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8106b07b8400 x1399132012617441/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335054950 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:33:25 lfs-oss-1-13 kernel: Lustre: 32170:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry
Apr 22 00:34:45 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81070008a000
Apr 22 00:34:45 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101534de000
Apr 22 00:34:45 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81017a352000
Apr 22 00:34:45 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib
Apr 22 00:34:45 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages
Apr 22 00:35:36 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0086: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107
Apr 22 00:35:36 lfs-oss-1-13 kernel: LustreError: Skipped 8 previous similar messages
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810461c0e000
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a27275c0
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81084baa7e00
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81084baa7e00
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 32108:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(77622) req@ffff810c2f8eac50 x1398901147701384/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055082 ref 1 fl Interpret:/0/0 rc 0/0
Apr 22 00:35:56 lfs-oss-1-13 kernel: Lustre: 32108:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry
Apr 22 00:35:56 lfs-oss-1-13 kernel: LustreError: 32016:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c35fc4c50 x1398901147701347/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055081 ref 1 fl Interpret:/0/0 rc 0/0
Apr 22 00:36:09 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ac2b3c000
Apr 22 00:36:43 lfs-oss-1-13 kernel: Lustre: 31924:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from d17457d6-ca72-8bd4-d22a-0f557ff97761@10.174.8.213@o2ib to 0xffff8105da0c7c00; still busy with 1 active RPCs
Apr 22 00:36:43 lfs-oss-1-13 kernel: Lustre: 31924:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 26 previous similar messages
Apr 22 00:36:43 lfs-oss-1-13 kernel: LustreError: 31924:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105edf4cc00 x1398900875490378/t0 o8->d17457d6-ca72-8bd4-d22a-0f557ff97761@NET_0x500000aae08d5_UUID:0/0 lens 368/264 e 0 to 0 dl 1335055103 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 00:36:43 lfs-oss-1-13 kernel: LustreError: 31924:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 32 previous similar messages
Apr 22 00:36:43 lfs-oss-1-13 kernel: LustreError: 32232:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105aac51000 x1398900875489326/t0 o3->d17457d6-ca72-8bd4-d22a-0f557ff97761@NET_0x500000aae08d5_UUID:0/0 lens 448/400 e 0 to 0 dl 1335055079 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:36:43 lfs-oss-1-13 kernel: LustreError: 32232:0:(ost_handler.c:829:ost_brw_read()) Skipped 16 previous similar messages
Apr 22 00:36:43 lfs-oss-1-13 kernel: Lustre: 32232:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with d17457d6-ca72-8bd4-d22a-0f557ff97761@NET_0x500000aae08d5_UUID id 12345-10.174.8.213@o2ib - client will retry
Apr 22 00:36:43 lfs-oss-1-13 kernel: Lustre: 32232:0:(ost_handler.c:887:ost_brw_read()) Skipped 19 previous similar messages
Apr 22 00:37:25 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ae1b4000
Apr 22 00:37:48 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810382c3c2c0
Apr 22 00:37:48 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b638ba000
Apr 22 00:37:49 lfs-oss-1-13 kernel: LustreError: 32063:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c2214cc00 x1398901147703742/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055107 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:37:49 lfs-oss-1-13 kernel: Lustre: 32063:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry
Apr 22 00:37:49 lfs-oss-1-13 kernel: Lustre: 32063:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message
Apr 22 00:38:45 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029ef9e000
Apr 22 00:38:58 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810690676000
Apr 22 00:39:30 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81091f2da7c0
Apr 22 00:39:30 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c124239c0
Apr 22 00:39:30 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bdbfa0000
Apr 22 00:39:30 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e0e4c000
Apr 22 00:39:55 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 127s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff81038ec37600/0xcca1a6f6c01a38d3 lrc: 3/0,0 mode: PR/PR res: 32645736/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xa6a88422cb35d487 expref: 8 pid: 31823 timeout 5261680443
Apr 22 00:39:55 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff81067287c400 x1398897310752203/t0 o105->@NET_0x500000aae06ae_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 00:39:55 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.174@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff8101f008e400/0xcca1a6f6c01a426c lrc: 3/0,0 mode: PW/PW res: 32645736/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0xa6a88422cb35edbd expref: 7 pid: 31970 timeout 0
Apr 22 00:40:01 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101bc274000
Apr 22 00:40:01 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81042f738000
Apr 22 00:40:01 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81091e91e000
Apr 22 00:40:04 lfs-oss-1-13 kernel: Lustre: 31958:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008b: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting
Apr 22 00:40:04 lfs-oss-1-13 kernel: Lustre: 31958:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 226 previous similar messages
Apr 22 00:40:19 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:1057:ost_brw_write()) @@@ timeout on bulk GET after 100+0s req@ffff810c1a41e000 x1398901147707961/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055219 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 00:40:19 lfs-oss-1-13 kernel: Lustre: 32217:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry
Apr 22 00:40:28 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 00:40:28 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 00:41:12 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 00:41:12 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 00:41:12 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (18) Apr 22 00:41:12 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 00:41:12 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0be36000 Apr 22 00:41:55 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029ef9e000 Apr 22 00:42:13 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101fa28e000 Apr 22 00:42:13 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a9132000 Apr 22 00:42:13 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108a9132000 Apr 22 00:42:13 lfs-oss-1-13 kernel: LustreError: 32037:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff8105d9547c00 x1398901147717519/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055550 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 00:42:13 lfs-oss-1-13 kernel: Lustre: 32037:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 00:42:15 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff810c232b0680 Apr 22 00:42:15 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054d76c000 Apr 22 00:42:52 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105a95c8400 x1399132012654552/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055477 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:42:52 lfs-oss-1-13 kernel: Lustre: 32241:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:43:00 lfs-oss-1-13 kernel: LustreError: 32030:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff81088c922c00 x1398901147717479/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335055380 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:43:00 lfs-oss-1-13 kernel: LustreError: 32030:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 00:43:12 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810419227000 Apr 22 00:43:23 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e0e4c000 Apr 22 00:44:01 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bdad25000 Apr 22 00:44:02 lfs-oss-1-13 kernel: Lustre: 31802:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897310900874 sent from scratch1-OST0088 to NID 10.174.6.174@o2ib 11s ago has timed out (11s prior to deadline). 
Apr 22 00:44:02 lfs-oss-1-13 kernel: req@ffff8105c1e3dc00 x1398897310900874/t0 o104->@NET_0x500000aae06ae_UUID:15/16 lens 296/384 e 0 to 1 dl 1335055442 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 00:44:02 lfs-oss-1-13 kernel: Lustre: 31802:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Apr 22 00:44:02 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107 Apr 22 00:44:03 lfs-oss-1-13 kernel: LustreError: 32029:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c18210000 x1398901147721569/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335055531 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:44:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810058f42000 Apr 22 00:44:44 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107fc592000 Apr 22 00:44:44 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81076602e000 Apr 22 00:44:44 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105f1c06c00 Apr 22 00:44:44 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105f1c06c00 Apr 22 00:44:44 lfs-oss-1-13 kernel: LustreError: 32178:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(46429) req@ffff810c1b7d5000 x1398901147723017/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055684 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 00:44:44 lfs-oss-1-13 kernel: Lustre: 32178:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with 
960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 00:45:42 lfs-oss-1-13 kernel: LustreError: 32208:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810920c56800 x1399132012657791/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055686 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:45:42 lfs-oss-1-13 kernel: Lustre: 32208:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:45:56 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101ac929000 Apr 22 00:45:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 00:45:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c183e1480 Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81099843a000 Apr 22 00:46:53 lfs-oss-1-13 kernel: Lustre: 31817:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff810c16408400; still busy with 1 active RPCs Apr 22 00:46:53 lfs-oss-1-13 kernel: Lustre: 31817:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 30 previous similar messages Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 31786:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105bf403400 x1399132012660192/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335055713 
ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 31786:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 30 previous similar messages Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 32203:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff81073b364800 x1399132012659043/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335055755 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 32071:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff81073b364c00 x1399132012659044/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055755 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:46:53 lfs-oss-1-13 kernel: LustreError: 32203:0:(ost_handler.c:829:ost_brw_read()) Skipped 24 previous similar messages Apr 22 00:46:53 lfs-oss-1-13 kernel: Lustre: 32071:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:46:53 lfs-oss-1-13 kernel: Lustre: 32203:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:46:53 lfs-oss-1-13 kernel: Lustre: 32203:0:(ost_handler.c:887:ost_brw_read()) Skipped 27 previous similar messages Apr 22 00:47:17 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020c14c000 Apr 22 00:47:17 lfs-oss-1-13 kernel: LustreError: 31865:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 174 < 175 req@ffff8105a85af000 x1398901147729235/t0 o400->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 192/0 e 0 to 0 dl 1335055671 ref 1 fl Interpret:H/0/0 rc 0/0 Apr 22 00:47:25 
lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d8844000 Apr 22 00:47:56 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b8afe000 Apr 22 00:47:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102789d0000 Apr 22 00:47:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dacf6000 Apr 22 00:48:38 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0be36000 Apr 22 00:48:51 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dc92a280 Apr 22 00:48:51 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085a92c000 Apr 22 00:48:52 lfs-oss-1-13 kernel: LustreError: 32030:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8109e0fa7000 x1399132012663936/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055874 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:48:53 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034ce84000 Apr 22 00:49:35 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107299fe000 Apr 22 00:49:35 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103b6b58000 Apr 22 00:50:02 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c18b2bb80 Apr 22 00:50:02 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c098a000 
Apr 22 00:50:06 lfs-oss-1-13 kernel: Lustre: 31754:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 289ecc88-8e10-0fb9-89fd-aa0e0e9f1c12 reconnecting Apr 22 00:50:06 lfs-oss-1-13 kernel: Lustre: 31754:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 224 previous similar messages Apr 22 00:50:14 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.0.70@o2ib ns: filter-scratch1-OST008c_UUID lock: ffff81085dc0ba00/0xcca1a6f6c04b65ab lrc: 3/0,0 mode: PW/PW res: 32659126/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->53247) flags: 0x20 remote: 0x8f0aa29f7afbbf5f expref: 5 pid: 31770 timeout 5262299749 Apr 22 00:50:21 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a44be000 Apr 22 00:50:40 lfs-oss-1-13 kernel: LustreError: 32058:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c21c00c00 x1399132012667513/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335055844 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:50:40 lfs-oss-1-13 kernel: Lustre: 32058:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:50:40 lfs-oss-1-13 kernel: Lustre: 32058:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message Apr 22 00:51:05 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ec38c000 Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: tx_queue, 1 seconds Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 00:52:05 lfs-oss-1-13 kernel: 
LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.70@o2ib (51) Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c048ab380 Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0be36000 Apr 22 00:52:05 lfs-oss-1-13 kernel: Lustre: 31856:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST0089: exp ffff810c14dc1c00 already connecting Apr 22 00:52:05 lfs-oss-1-13 kernel: LustreError: 31956:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 21 < 22 req@ffff81080b6db400 x1399132029383644/t0 o400->289ecc88-8e10-0fb9-89fd-aa0e0e9f1c12@NET_0x500000aae0046_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 00:52:28 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016d623000 Apr 22 00:52:51 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b9638a000 Apr 22 00:52:59 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105b5fd6000 Apr 22 00:53:09 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0086: A client on nid 10.174.9.227@o2ib was evicted due to a lock blocking callback to 10.174.9.227@o2ib timed out: rc -107 Apr 22 00:53:09 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 00:53:26 lfs-oss-1-13 kernel: LustreError: 31785:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8108c4833c00 ns: filter-scratch1-OST008a_UUID lock: ffff81098ed0dc00/0xcca1a6f6c05d3456 lrc: 3/0,0 mode: --/PW res: 32666108/0 rrc: 2 type: EXT [0->18446744073709551615] (req 
0->18446744073709551615) flags: 0x0 remote: 0xc728da6897086ab5 expref: 21 pid: 31785 timeout 0 Apr 22 00:53:27 lfs-oss-1-13 kernel: LustreError: 11787:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.10.17@o2ib arrived at 1335056007 with bad export cookie 14745250231541418439 Apr 22 00:53:27 lfs-oss-1-13 kernel: LustreError: 11787:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: received MDS connection from 10.174.31.241@o2ib Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: Skipped 10 previous similar messages Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: scratch1-OST008c: received MDS connection from 10.174.31.241@o2ib Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: 31949:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST008a: deleting orphan objects from 32666136 to 32667137, orphan objids won't be reused any more. 
Apr 22 00:53:32 lfs-oss-1-13 kernel: Lustre: 31949:0:(filter.c:3126:filter_destroy_precreated()) Skipped 8 previous similar messages Apr 22 00:53:36 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810544970000 Apr 22 00:53:36 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0be36000 Apr 22 00:53:49 lfs-oss-1-13 kernel: LustreError: 32008:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105a966bc00 x1398900878334279/t0 o4->7459b490-3ec7-3d69-529b-2367748948ae@NET_0x500000aae09e7_UUID:0/0 lens 448/416 e 0 to 0 dl 1335056247 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 00:53:56 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81086ca9a000 Apr 22 00:54:32 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075cd70000 Apr 22 00:54:38 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d7398000 Apr 22 00:55:16 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff8106baf90c00/0xcca1a6f6c059b433 lrc: 3/0,0 mode: PW/PW res: 32665249/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0xa6a88422cb455f7b expref: 13 pid: 31945 timeout 5262601171 Apr 22 00:55:17 lfs-oss-1-13 kernel: LustreError: 32213:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105d4a17c00 x1398901147756991/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335056245 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 00:55:17 lfs-oss-1-13 kernel: LustreError: 32213:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar 
message Apr 22 00:55:35 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81031eb1d1c0 Apr 22 00:55:35 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd1ee8000 Apr 22 00:56:00 lfs-oss-1-13 kernel: LustreError: 32079:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810909250000 x1399132012679221/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335056278 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:56:00 lfs-oss-1-13 kernel: LustreError: 32079:0:(ost_handler.c:1064:ost_brw_write()) Skipped 3 previous similar messages Apr 22 00:56:00 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 00:56:00 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 00:56:04 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810285722000 Apr 22 00:56:04 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 00:56:04 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 18 previous similar messages Apr 22 00:56:14 lfs-oss-1-13 kernel: LustreError: 31758:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.178@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff810acec8d600/0xcca1a6f6c059ba4c lrc: 3/0,0 mode: PR/PR res: 32663644/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xf739398ee711d2a8 expref: 5 pid: 31750 timeout 5262659188 Apr 22 00:56:34 lfs-oss-1-13 kernel: 
LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff8101cf2d1000/0xcca1a6f6c061ee45 lrc: 3/0,0 mode: PR/PR res: 32666658/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xa6a88422cb4c2afb expref: 11 pid: 31906 timeout 5262679124 Apr 22 00:56:46 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 00:56:46 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 00:57:05 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a2680000 Apr 22 00:57:05 lfs-oss-1-13 kernel: Lustre: 31775:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c1b481000; still busy with 1 active RPCs Apr 22 00:57:05 lfs-oss-1-13 kernel: Lustre: 31933:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST0084: exp ffff810c1b481000 already connecting Apr 22 00:57:05 lfs-oss-1-13 kernel: Lustre: 31775:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 49 previous similar messages Apr 22 00:57:05 lfs-oss-1-13 kernel: Lustre: 31933:0:(ldlm_lib.c:803:target_handle_connect()) Skipped 1 previous similar message Apr 22 00:57:05 lfs-oss-1-13 kernel: LustreError: 31733:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-114) req@ffff810bd175c000 x1399132033616531/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335056325 ref 1 fl Interpret:/0/0 rc -114/0 Apr 22 00:57:05 lfs-oss-1-13 kernel: LustreError: 31733:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 63 previous similar messages Apr 22 00:57:06 lfs-oss-1-13 kernel: LustreError: 
32038:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810884dcf000 x1399132033615331/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335056385 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 00:57:06 lfs-oss-1-13 kernel: LustreError: 32038:0:(ost_handler.c:829:ost_brw_read()) Skipped 34 previous similar messages Apr 22 00:57:06 lfs-oss-1-13 kernel: Lustre: 32038:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 00:57:06 lfs-oss-1-13 kernel: Lustre: 32038:0:(ost_handler.c:887:ost_brw_read()) Skipped 35 previous similar messages Apr 22 00:57:16 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810680392000 Apr 22 00:57:37 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109ce6e4000 Apr 22 00:57:37 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aea73c000 Apr 22 00:57:44 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.0.68@o2ib ns: filter-scratch1-OST0085_UUID lock: ffff81099b108200/0xcca1a6f6c06fc827 lrc: 3/0,0 mode: PR/PR res: 32670964/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x8afc1eaf68ee8376 expref: 5 pid: 31845 timeout 5262749163 Apr 22 00:57:54 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d7ff6000 Apr 22 00:58:21 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101f2395000 Apr 22 00:58:32 lfs-oss-1-13 kernel: LustreError: 
21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81056a6c0000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bd664000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3aec4000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab4244000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a59fb8000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ae1b4000 Apr 22 00:58:50 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a6630000 Apr 22 00:58:58 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020483a000 Apr 22 00:59:29 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109900c2ec0 Apr 22 00:59:29 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082e820000 Apr 22 00:59:37 lfs-oss-1-13 kernel: Lustre: 31755:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897311218522 sent from scratch1-OST008e to NID 10.174.6.178@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 00:59:37 lfs-oss-1-13 kernel: req@ffff810c27d2c400 x1398897311218522/t0 o104->@NET_0x500000aae06b2_UUID:15/16 lens 296/384 e 0 to 1 dl 1335056377 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 00:59:37 lfs-oss-1-13 kernel: Lustre: 31755:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Apr 22 00:59:37 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.6.178@o2ib was evicted due to a lock blocking callback to 10.174.6.178@o2ib timed out: rc -107 Apr 22 00:59:37 lfs-oss-1-13 kernel: LustreError: Skipped 3 previous similar messages Apr 22 00:59:37 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810883513c00 x1398897311218629/t0 o105->@NET_0x500000aae06b2_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 00:59:37 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.178@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff810727df4600/0xcca1a6f6c079a9f1 lrc: 3/0,0 mode: PW/PW res: 32676157/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xf739398ee711d6d0 expref: 23 pid: 31755 timeout 0 Apr 22 00:59:38 lfs-oss-1-13 kernel: LustreError: 32123:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c158cec00 x1399132012683715/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335056433 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:00:02 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ce9d5000 Apr 22 01:00:12 lfs-oss-1-13 kernel: Lustre: 31971:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 01:00:12 lfs-oss-1-13 kernel: Lustre: 31971:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 344 previous similar messages 
Apr 22 01:00:20 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81051efdc000 Apr 22 01:00:29 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81077945a000 Apr 22 01:00:30 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027c39c000 Apr 22 01:01:05 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105602d1000 Apr 22 01:01:09 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST0089_UUID lock: ffff81036cf1f200/0xcca1a6f6c0797e8c lrc: 3/0,0 mode: PW/PW res: 32672925/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0xa6a88422cb59b22d expref: 13 pid: 31735 timeout 5262954908 Apr 22 01:01:16 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aea73c000 Apr 22 01:02:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 01:02:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 01:02:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (24) Apr 22 01:02:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 01:02:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016a794000 Apr 22 01:02:46 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a27fc3000 Apr 22 01:03:13 
lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d5d5a000 Apr 22 01:03:22 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073f122000 Apr 22 01:03:28 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023af26000 Apr 22 01:03:28 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107db484000 Apr 22 01:03:49 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c01e8000 Apr 22 01:04:17 lfs-oss-1-13 kernel: LustreError: 32009:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c1476b800 x1398901147864395/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335056657 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:04:45 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81035355d080 Apr 22 01:04:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810162288000 Apr 22 01:04:46 lfs-oss-1-13 kernel: LustreError: 32055:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105b00edc00 x1399132012688190/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335056828 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:04:46 lfs-oss-1-13 kernel: LustreError: 32055:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 01:04:46 lfs-oss-1-13 kernel: Lustre: 32055:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 01:04:46 
lfs-oss-1-13 kernel: Lustre: 32055:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message Apr 22 01:05:07 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810940a28c00 x1398897311224392/t0 o105->@NET_0x500000aae06b2_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 01:05:07 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.178@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff810a87e94400/0xcca1a6f6c07ba134 lrc: 3/0,0 mode: PW/PW res: 32676157/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xf739398ee711d946 expref: 8 pid: 31815 timeout 0 Apr 22 01:05:08 lfs-oss-1-13 kernel: LustreError: 32187:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105e7730800 x1399132012689007/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335056946 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:05:08 lfs-oss-1-13 kernel: LustreError: 32187:0:(ost_handler.c:825:ost_brw_read()) Skipped 3 previous similar messages Apr 22 01:05:30 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104cc649000 Apr 22 01:05:41 lfs-oss-1-13 kernel: LustreError: 776:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.6.178@o2ib arrived at 1335056741 with bad export cookie 14745250233709119772 Apr 22 01:05:47 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f2dac000 Apr 22 01:05:47 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109ce6e4000 Apr 22 01:06:22 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012d4d0000 Apr 22 01:06:22 
lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810291eca000 Apr 22 01:06:22 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3aec4000 Apr 22 01:06:22 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073f122000 Apr 22 01:06:22 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81066f5e8000 Apr 22 01:06:35 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810694fdb000 Apr 22 01:07:16 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b1152000 Apr 22 01:07:16 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 01:07:16 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 19 previous similar messages Apr 22 01:07:16 lfs-oss-1-13 kernel: Lustre: 31890:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff810c173d7c00; still busy with 1 active RPCs Apr 22 01:07:16 lfs-oss-1-13 kernel: Lustre: 31711:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff810c16408400; still busy with 1 active RPCs Apr 22 01:07:16 lfs-oss-1-13 kernel: Lustre: 31711:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 28 previous similar messages Apr 22 01:07:16 lfs-oss-1-13 kernel: LustreError: 31890:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105bd83fc00 x1399132012696828/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335056936 
ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:07:16 lfs-oss-1-13 kernel: LustreError: 31711:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a5461000 x1399132012696826/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335056936 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:07:16 lfs-oss-1-13 kernel: LustreError: 31711:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 38 previous similar messages Apr 22 01:07:21 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023af26000 Apr 22 01:07:22 lfs-oss-1-13 kernel: LustreError: 32202:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff81092ec87800 x1398900873193321/t0 o3->e48ddc16-89ea-1336-5783-2b911f80071a@NET_0x500000aae08cf_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057098 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:07:22 lfs-oss-1-13 kernel: LustreError: 32202:0:(ost_handler.c:829:ost_brw_read()) Skipped 35 previous similar messages Apr 22 01:07:22 lfs-oss-1-13 kernel: Lustre: 32202:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with e48ddc16-89ea-1336-5783-2b911f80071a@NET_0x500000aae08cf_UUID id 12345-10.174.8.207@o2ib - client will retry Apr 22 01:07:22 lfs-oss-1-13 kernel: Lustre: 32202:0:(ost_handler.c:887:ost_brw_read()) Skipped 42 previous similar messages Apr 22 01:08:14 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106ee800000 Apr 22 01:08:20 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 01:08:20 lfs-oss-1-13 kernel: Lustre: Skipped 7 previous similar messages Apr 22 01:08:25 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0084: A client on nid 10.174.6.174@o2ib was evicted due to a lock blocking callback to 10.174.6.174@o2ib timed out: rc -107 Apr 22 01:08:25 lfs-oss-1-13 kernel: LustreError: Skipped 3 previous similar messages Apr 22 01:08:26 lfs-oss-1-13 kernel: LustreError: 32115:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105adc68c00 x1398901147933210/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057206 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:08:32 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d0938000 Apr 22 01:08:41 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd9188000 Apr 22 01:08:41 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8d6b0000 Apr 22 01:08:41 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d8ebe000 Apr 22 01:08:41 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024ce60000 Apr 22 01:08:57 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.6.178@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8101a7473e00/0xcca1a6f6c07bc8fd lrc: 3/0,0 mode: PW/PW res: 32675259/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0xf739398ee7121981 expref: 8 pid: 31737 timeout 5263422159 Apr 22 01:09:09 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810079d8a000 Apr 22 01:09:09 lfs-oss-1-13 kernel: LustreError: 
21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810986fee000 Apr 22 01:09:09 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810837fda000 Apr 22 01:10:06 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810695696000 Apr 22 01:10:21 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a22d06000 Apr 22 01:10:21 lfs-oss-1-13 kernel: Lustre: 31746:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 01:10:21 lfs-oss-1-13 kernel: Lustre: 31746:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 218 previous similar messages Apr 22 01:10:25 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81058b46c000 Apr 22 01:10:25 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a99088000 Apr 22 01:10:51 lfs-oss-1-13 kernel: LustreError: 32014:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105b0739400 x1399132012752981/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057051 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:10:51 lfs-oss-1-13 kernel: LustreError: 32014:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 01:11:04 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810285204000 Apr 22 01:11:04 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054741e000 Apr 22 01:11:28 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057b56c000 Apr 22 01:12:00 
lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c28d7b800 Apr 22 01:12:00 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4a362000 Apr 22 01:12:00 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a4a362000 Apr 22 01:12:00 lfs-oss-1-13 kernel: LustreError: 32017:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c358c3050 x1399132033645761/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335057235 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:12:12 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8102a5b00000/0xcca1a6f6c08dd328 lrc: 3/0,0 mode: PW/PW res: 32675236/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xa6a88422cb939e1e expref: 7 pid: 31735 timeout 5263617392 Apr 22 01:12:19 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 01:12:19 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 8 previous similar messages Apr 22 01:12:19 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (34) Apr 22 01:12:19 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 8 previous similar messages Apr 22 01:12:19 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e5c14000 Apr 22 01:12:44 lfs-oss-1-13 kernel: Lustre: 31851:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897311383241 sent from 
scratch1-OST008d to NID 10.174.0.70@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 01:12:44 lfs-oss-1-13 kernel: req@ffff810c33ab3800 x1398897311383241/t0 o106->@NET_0x500000aae0046_UUID:15/16 lens 296/424 e 0 to 1 dl 1335057164 ref 1 fl Rpc:/0/0 rc 0/0 Apr 22 01:12:44 lfs-oss-1-13 kernel: Lustre: 31851:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Apr 22 01:12:49 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81086dd7c000 Apr 22 01:13:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32066 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 01:13:11 lfs-oss-1-13 kernel: Pid: 32066, comm: ll_ost_io_74 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:11 lfs-oss-1-13 kernel: Call Trace: Apr 22 01:13:11 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 01:13:11 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 01:13:11 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:11 lfs-oss-1-13 kernel: Lustre: 
Service thread pid 32165 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 01:13:11 lfs-oss-1-13 kernel: Pid: 32165, comm: ll_ost_io_172 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:11 lfs-oss-1-13 kernel: Call Trace: Apr 22 01:13:11 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 01:13:11 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 01:13:11 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:11 lfs-oss-1-13 kernel: Pid: 32127, comm: ll_ost_io_134 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:11 lfs-oss-1-13 kernel: Call Trace: Apr 22 01:13:11 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 01:13:11 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 01:13:11 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] 
ost_handle+0x2e73/0x55b0 [ost] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 01:13:11 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 01:13:11 lfs-oss-1-13 kernel: Apr 22 01:13:23 lfs-oss-1-13 kernel: Lustre: Service thread pid 32165 completed after 212.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 01:13:23 lfs-oss-1-13 kernel: Lustre: Service thread pid 32127 completed after 212.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 01:13:34 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090f1c2000 Apr 22 01:13:34 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d7ff6000 Apr 22 01:13:43 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81056bf6a000 Apr 22 01:13:55 lfs-oss-1-13 kernel: LustreError: 23517:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.7.53@o2ib arrived at 1335057235 with bad export cookie 14745250231541187285 Apr 22 01:14:31 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073f122000 Apr 22 01:14:31 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c22768800 x1398900885102282/t0 o4->6655a9b8-0c55-162b-cad6-550284c60b93@NET_0x500000aae073b_UUID:0/0 lens 448/416 e 
0 to 0 dl 1335057452 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:14:37 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd6e3a000 Apr 22 01:14:37 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810431274000 Apr 22 01:15:18 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.178@o2ib ns: filter-scratch1-OST008a_UUID lock: ffff810150523800/0xcca1a6f6c09a6e6e lrc: 3/0,0 mode: PR/PR res: 32685223/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xf739398ee7559374 expref: 5 pid: 31722 timeout 5263803581 Apr 22 01:15:21 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d627c000 Apr 22 01:15:25 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105eee52c00 x1398901148009033/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057325 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:15:25 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 01:15:25 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810908c64000 Apr 22 01:15:25 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a59fb8000 Apr 22 01:15:25 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073f122000 Apr 22 01:16:07 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81079ee34000 Apr 22 01:16:21 lfs-oss-1-13 kernel: LustreError: 
31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8107ef9dac00 x1398897311520603/t0 o105->@NET_0x500000aae06ae_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 01:16:21 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.174@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0085_UUID lock: ffff810296574e00/0xcca1a6f6c0b4cec7 lrc: 3/0,0 mode: PW/PW res: 32691318/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0xa6a88422cbb11efb expref: 6 pid: 31925 timeout 0 Apr 22 01:16:39 lfs-oss-1-13 kernel: LustreError: 779:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.68@o2ib arrived at 1335057399 with bad export cookie 14745250231713763269 Apr 22 01:17:04 lfs-oss-1-13 kernel: LustreError: 23515:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.6.174@o2ib arrived at 1335057424 with bad export cookie 14745250233708506061 Apr 22 01:17:04 lfs-oss-1-13 kernel: LustreError: 23515:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 01:17:30 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e36f5000 Apr 22 01:17:30 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e49cd000 Apr 22 01:17:30 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101693bf000 Apr 22 01:17:30 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 01:17:30 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 01:17:41 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106cd378000 Apr 22 01:17:43 lfs-oss-1-13 
kernel: Lustre: 31755:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff8105aacd0400; still busy with 1 active RPCs Apr 22 01:17:43 lfs-oss-1-13 kernel: Lustre: 31755:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 30 previous similar messages Apr 22 01:17:43 lfs-oss-1-13 kernel: LustreError: 31755:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105d984e000 x1399132013009555/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335057563 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:17:43 lfs-oss-1-13 kernel: LustreError: 31755:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 37 previous similar messages Apr 22 01:17:43 lfs-oss-1-13 kernel: LustreError: 32206:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810bbee25400 x1399132013004102/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057588 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:17:43 lfs-oss-1-13 kernel: LustreError: 32206:0:(ost_handler.c:829:ost_brw_read()) Skipped 27 previous similar messages Apr 22 01:17:43 lfs-oss-1-13 kernel: Lustre: 32206:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 01:17:43 lfs-oss-1-13 kernel: Lustre: 32206:0:(ost_handler.c:887:ost_brw_read()) Skipped 35 previous similar messages Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b3ff26000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81086dd7c000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff810225e22000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e4f0c000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107e4f0c000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057b56c000 Apr 22 01:17:59 lfs-oss-1-13 kernel: LustreError: 32246:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff8105a9383c00 x1398901148058937/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335057605 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:17:59 lfs-oss-1-13 kernel: Lustre: 32246:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008d: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 01:17:59 lfs-oss-1-13 kernel: Lustre: 32246:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 01:18:18 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a99088000 Apr 22 01:18:51 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944854000 Apr 22 01:19:46 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105796c6000 Apr 22 01:19:54 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 01:19:54 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 01:19:54 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d789e000 Apr 22 01:20:14 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107636f9000 Apr 22 01:20:25 lfs-oss-1-13 kernel: Lustre: 31937:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 01:20:25 lfs-oss-1-13 kernel: Lustre: 31937:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 259 previous similar messages Apr 22 01:20:57 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6ae6e000 Apr 22 01:21:27 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008e0cc000 Apr 22 01:21:52 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0084: A client on nid 10.174.0.68@o2ib was evicted due to a lock blocking callback to 10.174.0.68@o2ib timed out: rc -107 Apr 22 01:21:52 lfs-oss-1-13 kernel: LustreError: Skipped 8 previous similar messages Apr 22 01:21:52 lfs-oss-1-13 kernel: LustreError: 32015:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c3951dc00 x1399132033703226/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057892 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:21:52 lfs-oss-1-13 kernel: LustreError: 32015:0:(ost_handler.c:825:ost_brw_read()) Skipped 2 previous similar messages Apr 22 01:21:53 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a10780000 Apr 22 01:22:00 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810057522000 Apr 22 01:22:08 lfs-oss-1-13 kernel: LustreError: 
21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81030be7a000 Apr 22 01:22:09 lfs-oss-1-13 kernel: LustreError: 32239:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810614686c00 x1398901148112446/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057805 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810783170000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104f980e580 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108164de000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108164de000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e4f0c000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 32228:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810a2bbc5000 x1398901148112469/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335057886 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81004f40c000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81004f40c000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093eba0000 Apr 22 01:22:37 lfs-oss-1-13 kernel: LustreError: 32137:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810020332000 Apr 
22 01:22:56 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810526ecc000 Apr 22 01:23:46 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106c6ab0000 Apr 22 01:24:01 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100a2d36000 Apr 22 01:24:01 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c0ef1000 Apr 22 01:24:27 lfs-oss-1-13 kernel: Lustre: 31909:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897311696929 sent from scratch1-OST008d to NID 10.174.6.174@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 01:24:27 lfs-oss-1-13 kernel: req@ffff8105d592f400 x1398897311696929/t0 o106->@NET_0x500000aae06ae_UUID:15/16 lens 296/424 e 0 to 1 dl 1335057867 ref 1 fl Rpc:/0/0 rc 0/0 Apr 22 01:24:27 lfs-oss-1-13 kernel: Lustre: 31909:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 34 previous similar messages Apr 22 01:24:50 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020f7cc000 Apr 22 01:25:21 lfs-oss-1-13 kernel: LustreError: 32137:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810bc796c800 x1398901148115768/t0 o4->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/416 e 0 to 0 dl 1335057938 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:25:21 lfs-oss-1-13 kernel: LustreError: 32137:0:(ost_handler.c:1064:ost_brw_write()) Skipped 2 previous similar messages Apr 22 01:25:38 lfs-oss-1-13 kernel: LustreError: 32028:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105a93f5c00 x1399132013070107/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335057938 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 
01:25:38 lfs-oss-1-13 kernel: LustreError: 32028:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 01:25:55 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033f5ca000 Apr 22 01:26:12 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090f1c2000 Apr 22 01:26:12 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109dce7e000 Apr 22 01:26:12 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057b56c000 Apr 22 01:26:12 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81066f5e8000 Apr 22 01:26:18 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107128b6000 Apr 22 01:26:18 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101cf286000 Apr 22 01:26:18 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8101cf286000 Apr 22 01:26:18 lfs-oss-1-13 kernel: LustreError: 31995:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c2053dc00 x1399132013101292/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335058212 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:26:18 lfs-oss-1-13 kernel: LustreError: 31995:0:(ost_handler.c:1073:ost_brw_write()) Skipped 1 previous similar message Apr 22 01:26:24 lfs-oss-1-13 kernel: LustreError: 32158:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810a96b8dc00 x1399132033718135/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 1 to 0 dl 1335058114 ref 1 fl 
Interpret:/0/0 rc 0/0 Apr 22 01:27:00 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.174@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff81014b8b2e00/0xcca1a6f6c0b38ef0 lrc: 3/0,0 mode: PW/PW res: 32388491/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x80010020 remote: 0xa6a88422cbaf4619 expref: 6 pid: 31783 timeout 5264505642 Apr 22 01:27:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 6 seconds Apr 22 01:27:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 12 previous similar messages Apr 22 01:27:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (21) Apr 22 01:27:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 12 previous similar messages Apr 22 01:27:28 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105f6c8f000 Apr 22 01:27:30 lfs-oss-1-13 kernel: LustreError: 11791:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.7.59@o2ib arrived at 1335058050 with bad export cookie 14745250233437534360 Apr 22 01:27:37 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab7864000 Apr 22 01:27:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 01:27:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 21 previous similar messages Apr 22 01:27:43 lfs-oss-1-13 kernel: Lustre: 31976:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff8105a966e600; still busy with 1 
active RPCs Apr 22 01:27:43 lfs-oss-1-13 kernel: Lustre: 31976:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 26 previous similar messages Apr 22 01:27:43 lfs-oss-1-13 kernel: LustreError: 31976:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a927fc50 x1399132033737068/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335058163 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:27:43 lfs-oss-1-13 kernel: LustreError: 31976:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 27 previous similar messages Apr 22 01:27:43 lfs-oss-1-13 kernel: LustreError: 32209:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105e7206800 x1399132033725656/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335058081 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:27:43 lfs-oss-1-13 kernel: LustreError: 32209:0:(ost_handler.c:829:ost_brw_read()) Skipped 26 previous similar messages Apr 22 01:27:43 lfs-oss-1-13 kernel: Lustre: 32209:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 01:27:43 lfs-oss-1-13 kernel: Lustre: 32209:0:(ost_handler.c:887:ost_brw_read()) Skipped 31 previous similar messages Apr 22 01:28:31 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d7ff6000 Apr 22 01:28:36 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810985e1a000 Apr 22 01:28:53 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810172e4e000 Apr 22 01:29:12 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c1fb6a800 x1398901148151800/t0 
o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335058407 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:29:31 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a05ba6000 Apr 22 01:29:31 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092e14e000 Apr 22 01:29:47 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810200cee000 Apr 22 01:29:59 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106ad39c000 Apr 22 01:29:59 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105019ea000 Apr 22 01:29:59 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81087bda8000 Apr 22 01:29:59 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b1e91a000 Apr 22 01:30:33 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008e: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 01:30:33 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 219 previous similar messages Apr 22 01:30:55 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7c064000 Apr 22 01:31:09 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b37789000 Apr 22 01:31:45 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d1d00000 Apr 22 01:31:50 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff81024ec60000 Apr 22 01:31:50 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944854000 Apr 22 01:31:50 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103ea5b0000 Apr 22 01:31:50 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c2883f800 Apr 22 01:31:50 lfs-oss-1-13 kernel: LustreError: 31811:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 61 < 62 req@ffff8105a3b3c800 x1399132033742857/t0 o400->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 01:31:59 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 01:31:59 lfs-oss-1-13 kernel: Lustre: Skipped 7 previous similar messages Apr 22 01:32:06 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810665f4e000 Apr 22 01:32:32 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81087bda8000 Apr 22 01:32:32 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81038dbf0000 Apr 22 01:33:12 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106c6ab0000 Apr 22 01:33:12 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810790690000 Apr 22 01:33:21 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d1d00000 Apr 22 01:33:21 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 
4, status -5, desc ffff8108daf70000 Apr 22 01:33:21 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020e62e000 Apr 22 01:33:21 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81080b09e000 Apr 22 01:33:22 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102db35d000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81046b354000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810071c20000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810071c20000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810abd298000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 32084:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c18353000 x1399132013135152/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/416 e 0 to 0 dl 1335058681 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810abd298000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105019ea000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105019ea000 Apr 22 01:34:31 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 
12345-10.174.6.178@o2ib - client will retry Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008a862000 Apr 22 01:34:31 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81008a862000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bc018000 Apr 22 01:34:31 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103bc018000 Apr 22 01:34:58 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b4a812000 Apr 22 01:35:18 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c29865c00 x1399132013135139/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335058518 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:35:18 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 01:36:24 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a52dfc000 Apr 22 01:36:24 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a44e8000 Apr 22 01:36:24 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104a44e8000 Apr 22 01:36:24 lfs-oss-1-13 kernel: LustreError: 32190:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c26e15800 x1399132013138710/t0 o4->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 
lens 448/416 e 0 to 0 dl 1335058894 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:36:24 lfs-oss-1-13 kernel: LustreError: 32190:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 01:36:51 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b81452000 Apr 22 01:36:51 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103ddaab000 Apr 22 01:36:51 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103ddaab000 Apr 22 01:36:51 lfs-oss-1-13 kernel: LustreError: 32045:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8105c0f98c00 x1399132033783781/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335058816 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:36:53 lfs-oss-1-13 kernel: Lustre: Service thread pid 32192 was inactive for 412.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Apr 22 01:36:53 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message
Apr 22 01:36:53 lfs-oss-1-13 kernel: Pid: 32192, comm: ll_ost_io_199
Apr 22 01:36:53 lfs-oss-1-13 kernel:
Apr 22 01:36:53 lfs-oss-1-13 kernel: Call Trace:
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 01:36:53 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 01:36:53 lfs-oss-1-13 kernel:
Apr 22 01:36:55 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a58fb4000
Apr 22 01:36:55 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810206a00000
Apr 22 01:36:55 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810790690000
Apr 22 01:36:55 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status
-5, desc ffff81008a860000 Apr 22 01:36:55 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b97eea000 Apr 22 01:37:49 lfs-oss-1-13 kernel: Lustre: 31720:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 01:37:49 lfs-oss-1-13 kernel: Lustre: 31720:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 19 previous similar messages Apr 22 01:37:49 lfs-oss-1-13 kernel: LustreError: 31720:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105d1885400 x1399132033786060/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335058769 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:37:49 lfs-oss-1-13 kernel: LustreError: 31720:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 20 previous similar messages Apr 22 01:37:49 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c158cec00 x1399132033783780/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335058816 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:37:49 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:829:ost_brw_read()) Skipped 28 previous similar messages Apr 22 01:37:49 lfs-oss-1-13 kernel: Lustre: 32023:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 01:37:49 lfs-oss-1-13 kernel: Lustre: 32023:0:(ost_handler.c:887:ost_brw_read()) Skipped 36 previous similar messages Apr 22 01:37:57 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105c25d8000 Apr 22 01:37:57 lfs-oss-1-13 kernel: LustreError: 
21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106bb51c000 Apr 22 01:38:05 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a9c12000 Apr 22 01:39:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 4 seconds Apr 22 01:39:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 01:39:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (29) Apr 22 01:39:28 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 01:39:28 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810852872000 Apr 22 01:39:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 01:39:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 01:39:48 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089be2a000 Apr 22 01:39:48 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ea3d3000 Apr 22 01:39:48 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104ea3d3000 Apr 22 01:39:48 lfs-oss-1-13 kernel: LustreError: 32153:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c28c8c000 x1399132033801622/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335058991 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:40:23 lfs-oss-1-13 kernel: Lustre: 
31819:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897311876924 sent from scratch1-OST008e to NID 10.174.15.68@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 01:40:23 lfs-oss-1-13 kernel: req@ffff810c17eca000 x1398897311876924/t0 o104->@NET_0x500000aae0f44_UUID:15/16 lens 296/384 e 0 to 1 dl 1335058823 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 01:40:23 lfs-oss-1-13 kernel: Lustre: 31819:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Apr 22 01:40:23 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.15.68@o2ib was evicted due to a lock blocking callback to 10.174.15.68@o2ib timed out: rc -107 Apr 22 01:40:23 lfs-oss-1-13 kernel: LustreError: Skipped 4 previous similar messages Apr 22 01:40:30 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050a594000 Apr 22 01:40:30 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81052b256000 Apr 22 01:40:30 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100af402000 Apr 22 01:40:30 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008a860000 Apr 22 01:40:40 lfs-oss-1-13 kernel: Lustre: 31959:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008e: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 01:40:40 lfs-oss-1-13 kernel: Lustre: 31959:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 176 previous similar messages Apr 22 01:40:52 lfs-oss-1-13 kernel: LustreError: 32494:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.15.62@o2ib arrived at 1335058852 with bad export cookie 14745250231713740736 Apr 22 01:40:56 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102803a8000 Apr 22 
01:41:30 lfs-oss-1-13 kernel: LustreError: 32209:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105c6d87c00 x1399131977151990/t0 o4->165f9fd8-1d96-9b62-14b6-1a6076938fd6@NET_0x500000aae0f42_UUID:0/0 lens 448/416 e 0 to 0 dl 1335059565 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:42:12 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810783337000 Apr 22 01:42:12 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81066d4ce000 Apr 22 01:42:14 lfs-oss-1-13 kernel: Lustre: scratch1-OST0089: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 01:42:14 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 01:42:20 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5de3f000 Apr 22 01:42:20 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073bb0a000 Apr 22 01:42:20 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81073bb0a000 Apr 22 01:42:20 lfs-oss-1-13 kernel: LustreError: 32219:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8106c1718800 x1399132033804810/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335059143 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:42:43 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.15.68@o2ib ns: filter-scratch1-OST0086_UUID lock: ffff8109f0542e00/0xcca1a6f6c0fb1235 lrc: 3/0,0 mode: PR/PR res: 32714890/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 
0xa7bbf3387e6b4f81 expref: 7 pid: 31857 timeout 5265448003 Apr 22 01:42:43 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810a14c5ec00 x1398897311920025/t0 o105->@NET_0x500000aae0f44_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 01:42:43 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.15.68@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0086_UUID lock: ffff810b27107c00/0xcca1a6f6c1007a2f lrc: 3/0,0 mode: PW/PW res: 32714890/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xa7bbf3387e6b8814 expref: 6 pid: 31751 timeout 0 Apr 22 01:42:57 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.15.68@o2ib ns: filter-scratch1-OST0085_UUID lock: ffff8101d0c2ae00/0xcca1a6f6c0f93483 lrc: 3/0,0 mode: PW/PW res: 32712312/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x80010020 remote: 0xa7bbf3387e690223 expref: 6 pid: 31926 timeout 5265462897 Apr 22 01:43:02 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810412e28000 Apr 22 01:43:02 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108abc5c000 Apr 22 01:43:02 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e1faa000 Apr 22 01:43:02 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100af402000 Apr 22 01:43:21 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81080ca30000 Apr 22 01:44:05 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event 
type 4, status -5, desc ffff81003975a000 Apr 22 01:44:05 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b97eea000 Apr 22 01:44:05 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810140074000 Apr 22 01:44:05 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014831e000 Apr 22 01:44:37 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088da14000 Apr 22 01:44:51 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810486762000 Apr 22 01:44:51 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810728d8a000 Apr 22 01:44:51 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810728d8a000 Apr 22 01:44:51 lfs-oss-1-13 kernel: LustreError: 32219:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8105e16d4800 x1399132033808242/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335059295 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:44:51 lfs-oss-1-13 kernel: Lustre: 32219:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 01:44:51 lfs-oss-1-13 kernel: Lustre: 32219:0:(ost_handler.c:1224:ost_brw_write()) Skipped 9 previous similar messages Apr 22 01:45:08 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a61d78000 Apr 22 01:45:08 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff81065c68e000 Apr 22 01:45:08 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102dd738000 Apr 22 01:45:08 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b0d74000 Apr 22 01:45:14 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944854000 Apr 22 01:45:23 lfs-oss-1-13 kernel: LustreError: 32221:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c25c39800 x1398901148214658/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335059209 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:45:23 lfs-oss-1-13 kernel: LustreError: 32221:0:(ost_handler.c:825:ost_brw_read()) Skipped 4 previous similar messages Apr 22 01:45:59 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088e634000 Apr 22 01:45:59 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81022341c000 Apr 22 01:46:05 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81002fa82000 Apr 22 01:47:02 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109767f4000 Apr 22 01:47:08 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810153a34000 Apr 22 01:47:27 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810beff76000 Apr 22 01:47:59 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109de58a000 Apr 22 01:47:59 lfs-oss-1-13 kernel: Lustre: 31906:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse 
reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff8105d595c200; still busy with 1 active RPCs Apr 22 01:47:59 lfs-oss-1-13 kernel: Lustre: 31906:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 41 previous similar messages Apr 22 01:47:59 lfs-oss-1-13 kernel: LustreError: 31906:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c1f582800 x1399132013154325/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335059379 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 01:47:59 lfs-oss-1-13 kernel: LustreError: 31906:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 49 previous similar messages Apr 22 01:48:00 lfs-oss-1-13 kernel: LustreError: 32242:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105d0745c00 x1399132013153525/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335059440 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:48:00 lfs-oss-1-13 kernel: LustreError: 32242:0:(ost_handler.c:829:ost_brw_read()) Skipped 39 previous similar messages Apr 22 01:48:00 lfs-oss-1-13 kernel: Lustre: 32242:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 01:48:00 lfs-oss-1-13 kernel: Lustre: 32242:0:(ost_handler.c:887:ost_brw_read()) Skipped 45 previous similar messages Apr 22 01:48:24 lfs-oss-1-13 kernel: Lustre: 32065:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-503), not sending early reply Apr 22 01:48:24 lfs-oss-1-13 kernel: req@ffff810c3052c850 x1398900876414997/t0 o3->83623168-021f-f795-eea4-b2b1bf84d52c@NET_0x500000aae08d3_UUID:0/0 lens 448/400 e 2 to 0 dl 1335059309 ref 2 fl Interpret:/0/0 rc 0/0 Apr 22 01:48:24 lfs-oss-1-13 kernel: Lustre: 32065:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 3 previous 
similar messages Apr 22 01:48:26 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a52b99000 Apr 22 01:48:26 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107288ea000 Apr 22 01:48:26 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107288ea000 Apr 22 01:48:26 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c344bd850 x1399132033812488/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335059508 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:48:29 lfs-oss-1-13 kernel: LustreError: 32192:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 1108+0s req@ffff810c3052c850 x1398900876414997/t0 o3->83623168-021f-f795-eea4-b2b1bf84d52c@NET_0x500000aae08d3_UUID:0/0 lens 448/400 e 2 to 0 dl 1335059309 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:48:29 lfs-oss-1-13 kernel: LustreError: 32192:0:(ost_handler.c:822:ost_brw_read()) Skipped 4 previous similar messages Apr 22 01:48:29 lfs-oss-1-13 kernel: Lustre: Service thread pid 32192 completed after 1108.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 01:48:29 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 01:48:33 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810470202000 Apr 22 01:48:56 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad58da000 Apr 22 01:49:34 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 4 seconds Apr 22 01:49:34 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 01:49:34 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (46) Apr 22 01:49:34 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 01:49:34 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b43e0000 Apr 22 01:49:34 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 01:49:34 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 01:49:42 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103f4fa7000 Apr 22 01:49:42 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81055b7a9000 Apr 22 01:49:42 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81055b7a9000 Apr 22 01:50:11 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102ff732000 Apr 22 01:50:37 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status 
-5, desc ffff8109780c2000 Apr 22 01:50:41 lfs-oss-1-13 kernel: Lustre: 31917:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008e: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 01:50:41 lfs-oss-1-13 kernel: Lustre: 31917:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 279 previous similar messages Apr 22 01:51:23 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008a860000 Apr 22 01:51:33 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be7d94000 Apr 22 01:52:04 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf9d94000 Apr 22 01:52:38 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026cc18000 Apr 22 01:52:38 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105457ac000 Apr 22 01:52:38 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105457ac000 Apr 22 01:52:44 lfs-oss-1-13 kernel: Lustre: 31960:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312024791 sent from scratch1-OST008a to NID 10.174.6.178@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 01:52:44 lfs-oss-1-13 kernel: req@ffff810c2bfaf000 x1398897312024791/t0 o106->@NET_0x500000aae06b2_UUID:15/16 lens 296/424 e 0 to 1 dl 1335059564 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 01:52:44 lfs-oss-1-13 kernel: Lustre: 31960:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 24 previous similar messages Apr 22 01:52:49 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a7bdcc000 Apr 22 01:53:21 lfs-oss-1-13 kernel: Lustre: scratch1-OST0087: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 190 seconds. I think it's dead, and I am evicting it. Apr 22 01:53:21 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 01:53:33 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810890260000 Apr 22 01:53:40 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81025a274000 Apr 22 01:53:40 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81055b168000 Apr 22 01:54:03 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c352e9c50 x1399131989747266/t0 o4->c78597d7-c7ba-4cfb-e767-70b730aecc45@NET_0x500000aae0042_UUID:0/0 lens 448/416 e 0 to 0 dl 1335060398 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 01:54:19 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e821e000 Apr 22 01:54:19 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102201b0000 Apr 22 01:55:08 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e5cd9000 Apr 22 01:56:17 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff8103bab62000 Apr 22 01:56:24 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049a445000 Apr 22 01:56:38 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105827cf000 Apr 22 01:56:38 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100710f9000 Apr 22 01:56:38 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8100710f9000 Apr 22 01:56:38 lfs-oss-1-13 kernel: LustreError: 32226:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8107beab5800 x1399132033820941/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335060006 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:56:38 lfs-oss-1-13 kernel: LustreError: 32226:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 01:56:38 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 01:56:38 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 01:57:34 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81045df5c000 Apr 22 01:57:41 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.0.68@o2ib was evicted due to a lock blocking callback to 10.174.0.68@o2ib timed out: rc -107 Apr 22 01:57:41 lfs-oss-1-13 kernel: LustreError: Skipped 26 previous similar messages Apr 22 01:57:41 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c1f88e000 
x1398897312182990/t0 o105->@NET_0x500000aae0044_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 01:57:41 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.68@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff81077096b600/0xcca1a6f6c1376eb5 lrc: 3/0,0 mode: PW/PW res: 32733703/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x8afc1eaf692c001c expref: 17 pid: 31826 timeout 0 Apr 22 01:57:42 lfs-oss-1-13 kernel: LustreError: 32202:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8107ef9da800 x1399132033823342/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335060570 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:57:42 lfs-oss-1-13 kernel: LustreError: 32202:0:(ost_handler.c:825:ost_brw_read()) Skipped 2 previous similar messages Apr 22 01:57:51 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810323808000 Apr 22 01:58:30 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103318d5000 Apr 22 01:58:32 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096a168000 Apr 22 01:58:32 lfs-oss-1-13 kernel: Lustre: 32202:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008e: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 01:58:32 lfs-oss-1-13 kernel: Lustre: 32202:0:(ost_handler.c:887:ost_brw_read()) Skipped 30 previous similar messages Apr 22 01:58:32 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aae206000 Apr 22 01:58:32 lfs-oss-1-13 kernel: LustreError: 
31818:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107) req@ffff810c35cec850 x1399132033824094/t0 o400->@:0/0 lens 192/0 e 0 to 0 dl 1335059939 ref 1 fl Interpret:H/0/0 rc -107/0 Apr 22 01:58:32 lfs-oss-1-13 kernel: LustreError: 31818:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 01:58:36 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107b7328000 Apr 22 01:59:08 lfs-oss-1-13 kernel: Lustre: 31780:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff8105aacd0400; still busy with 1 active RPCs Apr 22 01:59:08 lfs-oss-1-13 kernel: Lustre: 31780:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 32 previous similar messages Apr 22 01:59:08 lfs-oss-1-13 kernel: LustreError: 32141:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c18fe6800 x1399132013165767/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335060035 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:59:08 lfs-oss-1-13 kernel: LustreError: 32141:0:(ost_handler.c:829:ost_brw_read()) Skipped 28 previous similar messages Apr 22 01:59:23 lfs-oss-1-13 kernel: LustreError: 21827:0:(quota_context.c:699:dqacq_completion()) acquire qunit got error! (rc:-107) Apr 22 01:59:23 lfs-oss-1-13 kernel: LustreError: 32220:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105a8dd2400 x1398901148229093/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335059963 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 01:59:23 lfs-oss-1-13 kernel: LustreError: 32220:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 01:59:34 lfs-oss-1-13 kernel: LustreError: 21827:0:(quota_context.c:699:dqacq_completion()) acquire qunit got error! 
(rc:-107) Apr 22 01:59:37 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: received MDS connection from 10.174.31.241@o2ib Apr 22 01:59:37 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 01:59:37 lfs-oss-1-13 kernel: Lustre: 31762:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST0085: deleting orphan objects from 32731565 to 32732641, orphan objids won't be reused any more. Apr 22 01:59:37 lfs-oss-1-13 kernel: Lustre: 31762:0:(filter.c:3126:filter_destroy_precreated()) Skipped 7 previous similar messages Apr 22 01:59:40 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 01:59:40 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 01:59:40 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (27) Apr 22 01:59:40 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 01:59:40 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d7c20000 Apr 22 02:00:11 lfs-oss-1-13 kernel: LustreError: 28462:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.64@o2ib arrived at 1335060011 with bad export cookie 14745250231713752923 Apr 22 02:00:11 lfs-oss-1-13 kernel: LustreError: 28462:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 02:00:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c27ad3400 x1398897312184911/t0 o105->@NET_0x500000aae0dc0_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 02:00:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.13.192@o2ib) returned 0 from completion AST ns: 
filter-scratch1-OST0087_UUID lock: ffff810abeca1a00/0xcca1a6f6c1392f4f lrc: 3/0,0 mode: PW/PW res: 32734767/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x0 remote: 0xdf3cecf6e5bcdb22 expref: 5 pid: 31751 timeout 0 Apr 22 02:00:24 lfs-oss-1-13 kernel: LustreError: 767:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.13.192@o2ib arrived at 1335060024 with bad export cookie 14745250231541490658 Apr 22 02:00:24 lfs-oss-1-13 kernel: LustreError: 767:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 31 previous similar messages Apr 22 02:00:30 lfs-oss-1-13 kernel: Lustre: scratch1-OST008b: received MDS connection from 10.174.31.241@o2ib Apr 22 02:00:30 lfs-oss-1-13 kernel: Lustre: Skipped 10 previous similar messages Apr 22 02:00:30 lfs-oss-1-13 kernel: Lustre: 31740:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST008b: deleting orphan objects from 32733680 to 32734689, orphan objids won't be reused any more. Apr 22 02:00:30 lfs-oss-1-13 kernel: Lustre: 31740:0:(filter.c:3126:filter_destroy_precreated()) Skipped 10 previous similar messages Apr 22 02:00:39 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: received MDS connection from 10.174.31.241@o2ib Apr 22 02:00:39 lfs-oss-1-13 kernel: Lustre: 31915:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST008e: deleting orphan objects from 32735344 to 32736353, orphan objids won't be reused any more. 
Apr 22 02:00:41 lfs-oss-1-13 kernel: Lustre: 31926:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: 65647de8-83cc-c6f3-ad04-800e024dc94d reconnecting Apr 22 02:00:41 lfs-oss-1-13 kernel: Lustre: 31926:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 5263 previous similar messages Apr 22 02:00:43 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81047850a000 Apr 22 02:00:51 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810290f69000 Apr 22 02:00:51 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049f1c7000 Apr 22 02:00:51 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81049f1c7000 Apr 22 02:00:51 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 02:00:51 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 20 previous similar messages Apr 22 02:01:20 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3c380000 Apr 22 02:01:52 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810446866000 Apr 22 02:01:53 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810acc446000 Apr 22 02:02:12 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105c3600400 Apr 22 02:03:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108eb471000 Apr 22 02:03:47 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027ff1a000 Apr 22 
02:03:47 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a4b4f000 Apr 22 02:03:47 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104a4b4f000 Apr 22 02:03:58 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81043cf02000 Apr 22 02:04:18 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81002860e000 Apr 22 02:04:29 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810535c46000 Apr 22 02:04:51 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105954ae000 Apr 22 02:04:51 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a45b4000 Apr 22 02:04:51 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108a45b4000 Apr 22 02:05:32 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810befdd0000 Apr 22 02:05:33 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014ec4e000 Apr 22 02:06:06 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a003fe000 Apr 22 02:06:06 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb968e000 Apr 22 02:06:06 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bb968e000 Apr 22 02:06:06 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) 
req@ffff8105c2fde400 x1399132033832491/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335060576 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:06:06 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:1073:ost_brw_write()) Skipped 3 previous similar messages Apr 22 02:06:40 lfs-oss-1-13 kernel: Lustre: scratch1-OST0087: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 02:06:40 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 02:06:49 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b88342000 Apr 22 02:07:01 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107012fc000 Apr 22 02:08:01 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2d7ea000 Apr 22 02:08:01 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9b350000 Apr 22 02:08:01 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a9b350000 Apr 22 02:08:01 lfs-oss-1-13 kernel: Lustre: 32035:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 02:08:01 lfs-oss-1-13 kernel: Lustre: 32035:0:(ost_handler.c:1224:ost_brw_write()) Skipped 28 previous similar messages Apr 22 02:08:10 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810246dd4000 Apr 22 02:08:18 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100a4511000 Apr 22 02:08:55 
lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105220ec000 Apr 22 02:08:55 lfs-oss-1-13 kernel: LustreError: 31850:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a293cc00 x1398901148240031/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335060635 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 02:08:55 lfs-oss-1-13 kernel: LustreError: 31850:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 85 previous similar messages Apr 22 02:08:56 lfs-oss-1-13 kernel: Lustre: 32053:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 02:08:56 lfs-oss-1-13 kernel: Lustre: 32053:0:(ost_handler.c:887:ost_brw_read()) Skipped 38 previous similar messages Apr 22 02:09:28 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81053c4d3000 Apr 22 02:09:28 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81021c3d6000 Apr 22 02:09:28 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81021c3d6000 Apr 22 02:09:33 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b00b8000 Apr 22 02:09:33 lfs-oss-1-13 kernel: Lustre: 31925:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0088: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff810b095fc200; still busy with 1 active RPCs Apr 22 02:09:33 lfs-oss-1-13 kernel: Lustre: 31925:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 66 previous similar messages Apr 22 02:09:53 lfs-oss-1-13 kernel: LustreError: 32188:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect 
on bulk PUT req@ffff8105c4301000 x1399132013176243/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335060716 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:09:53 lfs-oss-1-13 kernel: LustreError: 32188:0:(ost_handler.c:829:ost_brw_read()) Skipped 36 previous similar messages Apr 22 02:10:23 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104de9d2000 Apr 22 02:10:43 lfs-oss-1-13 kernel: Lustre: 31900:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 02:10:43 lfs-oss-1-13 kernel: Lustre: 31900:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 262 previous similar messages Apr 22 02:11:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 02:11:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 11 previous similar messages Apr 22 02:11:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (34) Apr 22 02:11:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 11 previous similar messages Apr 22 02:11:02 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033b0da000 Apr 22 02:11:35 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023c4f5000 Apr 22 02:11:35 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810940f7c000 Apr 22 02:11:35 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810940f7c000 Apr 22 02:11:51 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff810726566000 Apr 22 02:11:51 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 02:11:51 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 13 previous similar messages Apr 22 02:12:17 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102db0a2000 Apr 22 02:12:30 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c20243800 x1398901148241817/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335060750 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:13:28 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a6090000 Apr 22 02:13:33 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054f104000 Apr 22 02:13:39 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810029cb6000 Apr 22 02:13:41 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103fcb8a000 Apr 22 02:13:41 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81045ed8c000 Apr 22 02:13:41 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81045ed8c000 Apr 22 02:14:55 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c86c3000 Apr 22 02:14:57 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810696b08000 Apr 22 02:14:57 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff8104bd593000 Apr 22 02:14:57 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104bd593000 Apr 22 02:15:01 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106c430e000 Apr 22 02:15:22 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810943772000 Apr 22 02:16:28 lfs-oss-1-13 kernel: Lustre: 31883:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312510808 sent from scratch1-OST0084 to NID 10.174.13.195@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 02:16:28 lfs-oss-1-13 kernel: req@ffff810c18210000 x1398897312510808/t0 o106->@NET_0x500000aae0dc3_UUID:15/16 lens 296/424 e 0 to 1 dl 1335060988 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 02:16:28 lfs-oss-1-13 kernel: Lustre: 31883:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 55 previous similar messages Apr 22 02:16:38 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: received MDS connection from 10.174.31.241@o2ib Apr 22 02:16:38 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 02:16:38 lfs-oss-1-13 kernel: Lustre: 31870:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST0088: deleting orphan objects from 32753295 to 32754273, orphan objids won't be reused any more. 
Apr 22 02:16:38 lfs-oss-1-13 kernel: Lustre: 31870:0:(filter.c:3126:filter_destroy_precreated()) Skipped 1 previous similar message Apr 22 02:16:54 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108aa10c000 Apr 22 02:17:01 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810193c1e000 Apr 22 02:17:02 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054eef2000 Apr 22 02:17:03 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ab48e000 Apr 22 02:17:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107fa7c6000 Apr 22 02:17:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107fa7c6000 Apr 22 02:17:03 lfs-oss-1-13 kernel: LustreError: 32227:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c24497c00 x1399132033845632/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335061232 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:17:03 lfs-oss-1-13 kernel: LustreError: 32227:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 02:18:00 lfs-oss-1-13 kernel: Lustre: scratch1-OST008c: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 02:18:00 lfs-oss-1-13 kernel: Lustre: Skipped 6 previous similar messages Apr 22 02:18:10 lfs-oss-1-13 kernel: Lustre: 28462:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312511398 sent from scratch1-OST008d to NID 10.174.0.222@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 02:18:10 lfs-oss-1-13 kernel: req@ffff81099e2fa400 x1398897312511398/t0 o105->@NET_0x500000aae00de_UUID:15/16 lens 344/384 e 0 to 1 dl 1335061090 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:18:10 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.0.222@o2ib was evicted due to a lock completion callback to 10.174.0.222@o2ib timed out: rc -107 Apr 22 02:18:10 lfs-oss-1-13 kernel: LustreError: Skipped 36 previous similar messages Apr 22 02:18:19 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105390f2000 Apr 22 02:18:19 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ab48e000 Apr 22 02:18:19 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104ab48e000 Apr 22 02:18:19 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 02:18:19 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 02:18:29 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810527cfb000 Apr 22 02:19:06 lfs-oss-1-13 kernel: LustreError: 31970:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c2ead3850 x1399132013186922/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335061246 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 02:19:06 lfs-oss-1-13 kernel: LustreError: 31970:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 23 previous similar messages Apr 22 02:19:06 lfs-oss-1-13 kernel: Lustre: 32165:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with 
bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 02:19:06 lfs-oss-1-13 kernel: Lustre: 32165:0:(ost_handler.c:887:ost_brw_read()) Skipped 20 previous similar messages Apr 22 02:19:14 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a351dc000 Apr 22 02:19:27 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810955048000 Apr 22 02:19:41 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a9e82000 Apr 22 02:19:41 lfs-oss-1-13 kernel: Lustre: 31939:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from 63dca68b-9830-9acf-a379-3e5c8925c430@10.174.10.245@o2ib to 0xffff810c03933400; still busy with 1 active RPCs Apr 22 02:19:41 lfs-oss-1-13 kernel: Lustre: 31939:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 24 previous similar messages Apr 22 02:20:26 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810798598000 Apr 22 02:20:26 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810734996000 Apr 22 02:20:26 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810734996000 Apr 22 02:20:35 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b739b000 Apr 22 02:20:45 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 63dca68b-9830-9acf-a379-3e5c8925c430 reconnecting Apr 22 02:20:45 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 268 previous similar messages Apr 22 02:20:53 lfs-oss-1-13 kernel: LustreError: 
32024:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c32072800 x1399132033850165/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335061275 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:20:53 lfs-oss-1-13 kernel: LustreError: 32024:0:(ost_handler.c:829:ost_brw_read()) Skipped 22 previous similar messages Apr 22 02:21:19 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a40216000 Apr 22 02:22:14 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034f224000 Apr 22 02:22:14 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d9cc6000 Apr 22 02:22:14 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104d9cc6000 Apr 22 02:22:14 lfs-oss-1-13 kernel: Lustre: 9207:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 02:22:14 lfs-oss-1-13 kernel: Lustre: 9207:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 13 previous similar messages Apr 22 02:22:16 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd8c60000 Apr 22 02:22:31 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810801c62000 Apr 22 02:23:02 lfs-oss-1-13 kernel: LustreError: 32160:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c28dfec00 x1399132013189386/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335061382 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:23:02 lfs-oss-1-13 kernel: LustreError: 32160:0:(ost_handler.c:822:ost_brw_read()) Skipped 2 previous similar messages Apr 22 02:23:13 lfs-oss-1-13 kernel: LustreError: 
21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a66c2a000 Apr 22 02:23:14 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 5 seconds Apr 22 02:23:14 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 5 previous similar messages Apr 22 02:23:14 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (4) Apr 22 02:23:14 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 5 previous similar messages Apr 22 02:23:14 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810299f33000 Apr 22 02:23:34 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810914f92000 Apr 22 02:23:34 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e3dce000 Apr 22 02:23:34 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109fd348000 Apr 22 02:23:34 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8109fd348000 Apr 22 02:24:10 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a66c2a000 Apr 22 02:24:12 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100af4c0000 Apr 22 02:24:41 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034b4ea000 Apr 22 02:25:51 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aeb1e9000 Apr 22 02:26:05 lfs-oss-1-13 kernel: LustreError: 
21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e6cc6000 Apr 22 02:26:31 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106fe6d5000 Apr 22 02:26:31 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106cc815000 Apr 22 02:26:31 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8106cc815000 Apr 22 02:27:07 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105b9368000 Apr 22 02:27:13 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109eeea2000 Apr 22 02:28:11 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dd582000 Apr 22 02:28:28 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108c8b32000 Apr 22 02:28:41 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034b4ea000 Apr 22 02:29:24 lfs-oss-1-13 kernel: Lustre: 32175:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 02:29:24 lfs-oss-1-13 kernel: Lustre: 32175:0:(ost_handler.c:887:ost_brw_read()) Skipped 23 previous similar messages Apr 22 02:29:28 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c9aae000 Apr 22 02:29:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103f2780000 Apr 22 02:29:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event 
type 2, status -5, desc ffff8103f2780000 Apr 22 02:29:28 lfs-oss-1-13 kernel: LustreError: 31991:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8105c1e3d800 x1399132033860767/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335061817 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:29:28 lfs-oss-1-13 kernel: LustreError: 31991:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 02:29:28 lfs-oss-1-13 kernel: Lustre: 31991:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 02:29:28 lfs-oss-1-13 kernel: Lustre: 31991:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 02:30:04 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810159c39000 Apr 22 02:30:46 lfs-oss-1-13 kernel: Lustre: 31919:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 02:30:46 lfs-oss-1-13 kernel: Lustre: 31919:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 221 previous similar messages Apr 22 02:30:47 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810193c1e000 Apr 22 02:31:00 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 02:31:00 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 02:31:24 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81018974e000 Apr 22 02:31:24 lfs-oss-1-13 kernel: Lustre: 31792:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from d36ec9cb-d462-b00c-16b8-a5af49fac51e@10.174.10.247@o2ib to 0xffff810c20015600; still busy with 1 active RPCs Apr 22 02:31:24 lfs-oss-1-13 kernel: Lustre: 31792:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages Apr 22 02:31:24 lfs-oss-1-13 kernel: LustreError: 31792:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c1b7d5000 x1398900877731196/t0 o8->d36ec9cb-d462-b00c-16b8-a5af49fac51e@NET_0x500000aae0af7_UUID:0/0 lens 368/264 e 0 to 0 dl 1335061984 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 02:31:24 lfs-oss-1-13 kernel: LustreError: 31792:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 24 previous similar messages Apr 22 02:31:25 lfs-oss-1-13 kernel: LustreError: 32033:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810803385000 x1398900877730389/t0 o3->d36ec9cb-d462-b00c-16b8-a5af49fac51e@NET_0x500000aae0af7_UUID:0/0 lens 448/400 e 0 to 0 dl 1335062032 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 02:31:25 lfs-oss-1-13 kernel: LustreError: 32033:0:(ost_handler.c:829:ost_brw_read()) Skipped 15 previous similar messages Apr 22 02:31:26 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81059d512000 Apr 22 02:31:47 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810159c39000 Apr 22 02:31:47 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105629d0000 Apr 22 02:31:47 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) 
event type 2, status -5, desc ffff8105629d0000 Apr 22 02:32:48 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810479934000 Apr 22 02:32:50 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810295086000 Apr 22 02:32:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.10.245@o2ib Apr 22 02:32:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages Apr 22 02:33:06 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e00d8000 Apr 22 02:34:05 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bab94d000 Apr 22 02:34:05 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b4488000 Apr 22 02:34:05 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102b4488000 Apr 22 02:34:10 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 4 seconds Apr 22 02:34:10 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 2 previous similar messages Apr 22 02:34:10 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (22) Apr 22 02:34:10 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 2 previous similar messages Apr 22 02:34:10 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049a0e8000 Apr 22 02:34:48 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810773b1a000 Apr 22 
02:35:09 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108b8fb6000 Apr 22 02:35:09 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bab94d000 Apr 22 02:35:09 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bab94d000 Apr 22 02:35:22 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106fc708000 Apr 22 02:36:01 lfs-oss-1-13 kernel: Lustre: 31839:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312864456 sent from scratch1-OST008c to NID 10.174.5.177@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 02:36:01 lfs-oss-1-13 kernel: req@ffff8105e0cf3c00 x1398897312864456/t0 o104->@NET_0x500000aae05b1_UUID:15/16 lens 296/384 e 0 to 1 dl 1335062161 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:36:01 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.5.177@o2ib was evicted due to a lock blocking callback to 10.174.5.177@o2ib timed out: rc -107 Apr 22 02:36:01 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff81088c922800 x1398897312864478/t0 o105->@NET_0x500000aae05b1_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 02:36:01 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.5.177@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0087_UUID lock: ffff810731d9dc00/0xcca1a6f6c1bf38ae lrc: 3/0,0 mode: PW/PW res: 32775544/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x26036d3a5e029e70 expref: 22 pid: 31740 timeout 0 Apr 22 02:36:02 lfs-oss-1-13 kernel: LustreError: 23519:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.5.176@o2ib arrived at 1335062162 with bad 
export cookie 14745250231713805325 Apr 22 02:36:02 lfs-oss-1-13 kernel: LustreError: 23519:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 02:36:02 lfs-oss-1-13 kernel: LustreError: 11767:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.5.156@o2ib arrived at 1335062162 with bad export cookie 14745250231713790478 Apr 22 02:36:02 lfs-oss-1-13 kernel: LustreError: 11767:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 4 previous similar messages Apr 22 02:36:04 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b708e000 Apr 22 02:36:20 lfs-oss-1-13 kernel: Lustre: 31852:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312870528 sent from scratch1-OST0088 to NID 10.174.5.152@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 02:36:20 lfs-oss-1-13 kernel: req@ffff8105e737e400 x1398897312870528/t0 o104->@NET_0x500000aae0598_UUID:15/16 lens 296/384 e 0 to 1 dl 1335062180 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:36:20 lfs-oss-1-13 kernel: Lustre: 31852:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.5.152@o2ib was evicted due to a lock blocking callback to 10.174.5.152@o2ib timed out: rc -107 Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: Skipped 6 previous similar messages Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c1c93c000 x1398897312870536/t0 o105->@NET_0x500000aae0598_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.5.152@o2ib) returned 0 from 
completion AST ns: filter-scratch1-OST0088_UUID lock: ffff81036725dc00/0xcca1a6f6c1c066d4 lrc: 3/0,0 mode: PW/PW res: 32774985/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x60915edb0ac1b07a expref: 5 pid: 31852 timeout 0 Apr 22 02:36:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 02:36:23 lfs-oss-1-13 kernel: LustreError: 32496:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.5.152@o2ib arrived at 1335062183 with bad export cookie 14745250231541201789 Apr 22 02:36:24 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0048d000 Apr 22 02:36:24 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810162338000 Apr 22 02:36:24 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810162338000 Apr 22 02:36:41 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100330f0000 Apr 22 02:37:32 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106bdd38000 Apr 22 02:37:40 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0048d000 Apr 22 02:37:40 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81078e123000 Apr 22 02:37:40 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81078e123000 Apr 22 02:37:52 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d33c6000 Apr 22 02:38:58 lfs-oss-1-13 kernel: Lustre: 31807:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ 
Request x1398897312987385 sent from scratch1-OST0084 to NID 10.174.5.153@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 02:38:58 lfs-oss-1-13 kernel: req@ffff810c05680000 x1398897312987385/t0 o104->@NET_0x500000aae0599_UUID:15/16 lens 296/384 e 0 to 1 dl 1335062338 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:38:58 lfs-oss-1-13 kernel: Lustre: 31807:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Apr 22 02:38:58 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0084: A client on nid 10.174.5.153@o2ib was evicted due to a lock blocking callback to 10.174.5.153@o2ib timed out: rc -107 Apr 22 02:38:58 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 02:38:58 lfs-oss-1-13 kernel: LustreError: 31807:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810b732bec00 ns: filter-scratch1-OST0084_UUID lock: ffff810b0c96ee00/0xcca1a6f6c1d779fb lrc: 3/0,0 mode: --/PW res: 32781813/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x43cbd8858cbf6491 expref: 18 pid: 31807 timeout 0 Apr 22 02:39:00 lfs-oss-1-13 kernel: LustreError: 795:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.5.153@o2ib arrived at 1335062340 with bad export cookie 14745250231713784339 Apr 22 02:39:00 lfs-oss-1-13 kernel: LustreError: 795:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 02:39:01 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81040bca5000 Apr 22 02:39:08 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810050711000 Apr 22 02:39:08 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f9e1f000 Apr 22 02:39:08 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, 
desc ffff8102f9e1f000 Apr 22 02:39:40 lfs-oss-1-13 kernel: Lustre: 32233:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 02:39:40 lfs-oss-1-13 kernel: Lustre: 32233:0:(ost_handler.c:887:ost_brw_read()) Skipped 22 previous similar messages Apr 22 02:40:12 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810579829000 Apr 22 02:40:12 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109709f1000 Apr 22 02:40:12 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8109709f1000 Apr 22 02:40:12 lfs-oss-1-13 kernel: LustreError: 32194:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c2e775c50 x1399132033879918/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335062615 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:40:12 lfs-oss-1-13 kernel: LustreError: 32194:0:(ost_handler.c:1073:ost_brw_write()) Skipped 6 previous similar messages Apr 22 02:40:12 lfs-oss-1-13 kernel: Lustre: 32194:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 02:40:12 lfs-oss-1-13 kernel: Lustre: 32194:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 02:40:16 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101af974000 Apr 22 02:40:39 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dae92000 Apr 22 02:41:07 lfs-oss-1-13 kernel: LustreError: 
21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81058fe6f000 Apr 22 02:41:07 lfs-oss-1-13 kernel: Lustre: 31761:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 02:41:07 lfs-oss-1-13 kernel: Lustre: 31761:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 226 previous similar messages Apr 22 02:41:15 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020d2d2000 Apr 22 02:41:15 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088c02c000 Apr 22 02:41:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a4d68000 Apr 22 02:41:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102a4d68000 Apr 22 02:41:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5e48b000 Apr 22 02:41:55 lfs-oss-1-13 kernel: Lustre: 31770:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008c: refuse reconnection from eae58d84-b0cb-c116-d769-527e4c57bd0b@10.174.1.221@o2ib to 0xffff8105df970000; still busy with 1 active RPCs Apr 22 02:41:55 lfs-oss-1-13 kernel: Lustre: 31770:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 27 previous similar messages Apr 22 02:41:55 lfs-oss-1-13 kernel: LustreError: 31770:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105edf4cc00 x1398900875875217/t0 o8->eae58d84-b0cb-c116-d769-527e4c57bd0b@NET_0x500000aae01dd_UUID:0/0 lens 368/264 e 0 to 0 dl 1335062615 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 02:41:55 lfs-oss-1-13 kernel: LustreError: 31770:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 35 previous similar messages Apr 22 02:41:56 lfs-oss-1-13 kernel: LustreError: 
32166:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105bcdcc000 x1398900875873220/t0 o3->eae58d84-b0cb-c116-d769-527e4c57bd0b@NET_0x500000aae01dd_UUID:0/0 lens 448/400 e 1 to 0 dl 1335062531 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 02:41:56 lfs-oss-1-13 kernel: LustreError: 32166:0:(ost_handler.c:829:ost_brw_read()) Skipped 24 previous similar messages Apr 22 02:42:09 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a992c000 Apr 22 02:42:35 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103473a1000 Apr 22 02:42:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81097e11d000 Apr 22 02:42:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81097e11d000 Apr 22 02:42:56 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016ed96000 Apr 22 02:42:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 02:42:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 02:43:25 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810789032000 Apr 22 02:44:28 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ce116000 Apr 22 02:44:42 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 02:44:42 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 8 previous similar messages Apr 22 02:44:42 lfs-oss-1-13 kernel: LustreError: 
21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (26) Apr 22 02:44:42 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 8 previous similar messages Apr 22 02:44:42 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810854262000 Apr 22 02:45:27 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054ad45000 Apr 22 02:45:27 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810063665000 Apr 22 02:45:27 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810063665000 Apr 22 02:45:31 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bc6890000 Apr 22 02:45:57 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa7537000 Apr 22 02:46:30 lfs-oss-1-13 kernel: Lustre: 31963:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897312995568 sent from scratch1-OST0089 to NID 10.174.12.157@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 02:46:30 lfs-oss-1-13 kernel: req@ffff8105ceebb000 x1398897312995568/t0 o104->@NET_0x500000aae0c9d_UUID:15/16 lens 296/384 e 0 to 1 dl 1335062790 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:46:30 lfs-oss-1-13 kernel: Lustre: 31963:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 22 02:46:30 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.12.157@o2ib was evicted due to a lock blocking callback to 10.174.12.157@o2ib timed out: rc -107 Apr 22 02:46:30 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 02:46:30 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c1fb6a800 x1398897312995577/t0 o105->@NET_0x500000aae0c9d_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 02:46:30 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.12.157@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0089_UUID lock: ffff8101a31c5c00/0xcca1a6f6c1d9259c lrc: 3/0,0 mode: PW/PW res: 32779962/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xa9dc46370fd22589 expref: 118 pid: 31963 timeout 0 Apr 22 02:46:30 lfs-oss-1-13 kernel: LustreError: 32005:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff8105d7147000 x1398900882465424/t0 o4->07ae4f90-0ea4-ef15-4a35-4cc0953f3075@NET_0x500000aae0c9d_UUID:0/0 lens 448/416 e 0 to 0 dl 1335062899 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 02:46:33 lfs-oss-1-13 kernel: LustreError: 28462:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.15.230@o2ib arrived at 1335062793 with bad export cookie 14745250231541337148 Apr 22 02:46:42 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81019dc82000 Apr 22 02:46:54 lfs-oss-1-13 kernel: LustreError: 
21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81068a5a4000 Apr 22 02:47:00 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b96ce000 Apr 22 02:47:24 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a13cbc000 Apr 22 02:48:11 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81038adeb000 Apr 22 02:48:11 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81041db8f000 Apr 22 02:48:11 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81041db8f000 Apr 22 02:48:15 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048a28e000 Apr 22 02:49:18 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a70bc000 Apr 22 02:49:23 lfs-oss-1-13 kernel: Lustre: Service thread pid 32148 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Apr 22 02:49:23 lfs-oss-1-13 kernel: Pid: 32148, comm: ll_ost_io_155
Apr 22 02:49:23 lfs-oss-1-13 kernel:
Apr 22 02:49:23 lfs-oss-1-13 kernel: Call Trace:
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 02:49:23 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 02:49:23 lfs-oss-1-13 kernel:
Apr 22 02:50:34 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aeaf67000
Apr 22 02:50:35 lfs-oss-1-13 kernel: Lustre: 32171:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008c: ignoring bulk IO comm error with 80b99154-05d8-fe08-ddaa-c556960f811e@NET_0x500000aae01d9_UUID id 12345-10.174.1.217@o2ib - client will retry
Apr 22 02:50:35 lfs-oss-1-13 kernel: Lustre: 32171:0:(ost_handler.c:887:ost_brw_read()) Skipped 22 previous similar messages
Apr 22 02:50:55 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bbcffe000
Apr 22 02:50:55 lfs-oss-1-13 kernel:
LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024d985000 Apr 22 02:50:55 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81024d985000 Apr 22 02:50:55 lfs-oss-1-13 kernel: LustreError: 32235:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c34d08850 x1399132033892444/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335063258 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:50:55 lfs-oss-1-13 kernel: LustreError: 32235:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 02:50:55 lfs-oss-1-13 kernel: Lustre: 32235:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 02:50:55 lfs-oss-1-13 kernel: Lustre: 32235:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 02:51:00 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810506d36000 Apr 22 02:51:14 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 80b99154-05d8-fe08-ddaa-c556960f811e reconnecting Apr 22 02:51:14 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 231 previous similar messages Apr 22 02:51:36 lfs-oss-1-13 kernel: LustreError: 32207:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105cbf3d000 x1399132033892443/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335063096 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:51:36 lfs-oss-1-13 kernel: LustreError: 32207:0:(ost_handler.c:822:ost_brw_read()) Skipped 6 previous similar messages Apr 22 02:52:40 lfs-oss-1-13 kernel: LustreError: 
21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109cb258000 Apr 22 02:52:54 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 02:53:02 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101caae5000 Apr 22 02:53:02 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ada278000 Apr 22 02:53:02 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810ada278000 Apr 22 02:53:02 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 02:53:02 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 30 previous similar messages Apr 22 02:53:02 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff8105a6fff200; still busy with 1 active RPCs Apr 22 02:53:02 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 19 previous similar messages Apr 22 02:53:02 lfs-oss-1-13 kernel: LustreError: 31791:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105bf9be400 x1399132033896303/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335063282 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 02:53:02 lfs-oss-1-13 kernel: LustreError: 31791:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages Apr 22 02:53:40 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d0551000 Apr 22 02:53:40 lfs-oss-1-13 kernel: LustreError: 
32004:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c27239800 x1398900878636213/t0 o3->63dca68b-9830-9acf-a379-3e5c8925c430@NET_0x500000aae0af5_UUID:0/0 lens 448/400 e 0 to 0 dl 1335063672 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 02:53:40 lfs-oss-1-13 kernel: LustreError: 32004:0:(ost_handler.c:829:ost_brw_read()) Skipped 17 previous similar messages Apr 22 02:55:25 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81070951e000 Apr 22 02:55:33 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101aec2e000 Apr 22 02:55:46 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b0ac6000 Apr 22 02:55:46 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a693bc000 Apr 22 02:55:46 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a693bc000 Apr 22 02:56:14 lfs-oss-1-13 kernel: LustreError: 32014:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105a8e0b800 x1398901148339140/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335063374 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:56:27 lfs-oss-1-13 kernel: Lustre: scratch1-OST0089: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 02:57:07 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81094d21a000 Apr 22 02:57:07 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810386f06000 Apr 22 02:57:07 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810386f06000 Apr 22 02:57:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 02:57:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages Apr 22 02:57:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (46) Apr 22 02:57:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages Apr 22 02:57:31 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81070951e000 Apr 22 02:58:18 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104894d4000 Apr 22 02:58:18 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a53591000 Apr 22 02:58:18 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a53591000 Apr 22 02:59:02 lfs-oss-1-13 kernel: Lustre: 32162:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-179), not sending early reply Apr 22 02:59:02 lfs-oss-1-13 kernel: req@ffff810c34e9c050 x1399132013214453/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 2 to 0 dl 1335063547 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 02:59:07 lfs-oss-1-13 kernel: LustreError: 
32148:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 784+0s req@ffff810c34e9c050 x1399132013214453/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 2 to 0 dl 1335063547 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 02:59:07 lfs-oss-1-13 kernel: Lustre: Service thread pid 32148 completed after 784.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 02:59:20 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b52a3000 Apr 22 02:59:20 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81035a5fc000 Apr 22 02:59:20 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81035a5fc000 Apr 22 02:59:38 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810236bde000 Apr 22 02:59:48 lfs-oss-1-13 kernel: Lustre: 31835:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897313133421 sent from scratch1-OST008a to NID 10.174.17.10@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 02:59:48 lfs-oss-1-13 kernel: req@ffff81095a34c000 x1398897313133421/t0 o104->@NET_0x500000aae110a_UUID:15/16 lens 296/384 e 0 to 1 dl 1335063588 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 02:59:48 lfs-oss-1-13 kernel: Lustre: 31835:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Apr 22 02:59:48 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008a: A client on nid 10.174.17.10@o2ib was evicted due to a lock blocking callback to 10.174.17.10@o2ib timed out: rc -107 Apr 22 02:59:48 lfs-oss-1-13 kernel: LustreError: Skipped 6 previous similar messages Apr 22 02:59:53 lfs-oss-1-13 kernel: LustreError: 23516:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.17.10@o2ib arrived at 1335063593 with bad export cookie 14745250231713821061 Apr 22 02:59:53 lfs-oss-1-13 kernel: LustreError: 23516:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 5 previous similar messages Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: received MDS connection from 10.174.31.241@o2ib Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: 31820:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST0084: deleting orphan objects from 32790664 to 32791969, orphan objids won't be reused any more. Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: 31820:0:(filter.c:3126:filter_destroy_precreated()) Skipped 2 previous similar messages Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: 31734:0:(filter.c:3126:filter_destroy_precreated()) scratch1-OST0087: deleting orphan objects from 32791817 to 32793121, orphan objids won't be reused any more. 
Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: 31734:0:(filter.c:3126:filter_destroy_precreated()) Skipped 3 previous similar messages Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: received MDS connection from 10.174.31.241@o2ib Apr 22 02:59:53 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 03:00:53 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a42a72000 Apr 22 03:00:53 lfs-oss-1-13 kernel: Lustre: 32160:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 03:00:53 lfs-oss-1-13 kernel: Lustre: 32160:0:(ost_handler.c:887:ost_brw_read()) Skipped 14 previous similar messages Apr 22 03:01:01 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8d608000 Apr 22 03:01:01 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088b733000 Apr 22 03:01:01 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81088b733000 Apr 22 03:01:01 lfs-oss-1-13 kernel: LustreError: 32183:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8105df6f3c00 x1399132033903470/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335063868 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:01:01 lfs-oss-1-13 kernel: LustreError: 32183:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 03:01:01 lfs-oss-1-13 kernel: Lustre: 32183:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:01:01 lfs-oss-1-13 
kernel: Lustre: 32183:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 03:01:19 lfs-oss-1-13 kernel: Lustre: 31773:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 03:01:19 lfs-oss-1-13 kernel: Lustre: 31773:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 131 previous similar messages Apr 22 03:02:05 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810254189000 Apr 22 03:02:05 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810597ac9000 Apr 22 03:02:05 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810597ac9000 Apr 22 03:02:09 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a42a72000 Apr 22 03:02:13 lfs-oss-1-13 kernel: Lustre: scratch1-OST008a: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 03:02:22 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81084430a000 Apr 22 03:02:22 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810097251000 Apr 22 03:03:10 lfs-oss-1-13 kernel: Lustre: 31793:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 03:03:10 lfs-oss-1-13 kernel: Lustre: 31793:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 18 previous similar messages Apr 22 03:03:10 lfs-oss-1-13 kernel: LustreError: 31793:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810b156cf400 x1398901148347724/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335063890 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:03:10 lfs-oss-1-13 kernel: LustreError: 31793:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 19 previous similar messages Apr 22 03:03:24 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109a2cb4000 Apr 22 03:03:25 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810920f3e000 Apr 22 03:03:25 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 03:03:25 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 10 previous similar messages Apr 22 03:03:46 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103aa6b5000 Apr 22 03:03:46 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ad95c000 Apr 22 03:03:46 lfs-oss-1-13 kernel: LustreError: 
21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107ad95c000 Apr 22 03:04:27 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104882ca000 Apr 22 03:04:28 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026bd01000 Apr 22 03:04:36 lfs-oss-1-13 kernel: LustreError: 32093:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c394a3000 x1399132013231344/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335064325 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:04:36 lfs-oss-1-13 kernel: LustreError: 32093:0:(ost_handler.c:829:ost_brw_read()) Skipped 17 previous similar messages Apr 22 03:05:13 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae2ac1000 Apr 22 03:05:40 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108b4eed000 Apr 22 03:05:44 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026bd01000 Apr 22 03:06:17 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810074438000 Apr 22 03:06:17 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104f192a000 Apr 22 03:06:17 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104f192a000 Apr 22 03:06:47 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089ff30000 Apr 22 03:07:12 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81081c5bc000 Apr 22 03:08:03 lfs-oss-1-13 kernel: 
LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 03:08:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 6 previous similar messages Apr 22 03:08:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (30) Apr 22 03:08:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 6 previous similar messages Apr 22 03:08:03 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81004cbd9000 Apr 22 03:08:49 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810154dbc000 Apr 22 03:08:49 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109da16e000 Apr 22 03:08:49 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8109da16e000 Apr 22 03:08:49 lfs-oss-1-13 kernel: Lustre: 31849:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST0084: exp ffff8105a2586a00 already connecting Apr 22 03:09:19 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810717419000 Apr 22 03:10:04 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 03:10:04 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 03:10:21 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bde40000 Apr 22 03:10:35 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b12c45000 Apr 22 03:11:20 lfs-oss-1-13 kernel: Lustre: 31900:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 03:11:20 lfs-oss-1-13 kernel: Lustre: 31900:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 244 previous similar messages Apr 22 03:11:20 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81022742c000 Apr 22 03:11:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a7d19000 Apr 22 03:11:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102a7d19000 Apr 22 03:11:20 lfs-oss-1-13 kernel: LustreError: 32117:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c2499f800 x1399132033913514/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335064487 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:11:20 lfs-oss-1-13 kernel: LustreError: 32117:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 03:11:20 lfs-oss-1-13 kernel: Lustre: 32117:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:11:20 lfs-oss-1-13 kernel: Lustre: 32117:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 03:11:50 lfs-oss-1-13 kernel: LustreError: 
21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057026a000 Apr 22 03:11:51 lfs-oss-1-13 kernel: Lustre: 32137:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID id 12345-10.174.6.178@o2ib - client will retry Apr 22 03:11:51 lfs-oss-1-13 kernel: Lustre: 32137:0:(ost_handler.c:887:ost_brw_read()) Skipped 20 previous similar messages Apr 22 03:13:06 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a1792b000 Apr 22 03:13:28 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810862fea000 Apr 22 03:13:28 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b3e38000 Apr 22 03:13:28 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8100b3e38000 Apr 22 03:13:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 03:13:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 16 previous similar messages Apr 22 03:13:28 lfs-oss-1-13 kernel: Lustre: 31867:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 03:13:28 lfs-oss-1-13 kernel: Lustre: 31867:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 22 previous similar messages Apr 22 03:13:28 lfs-oss-1-13 kernel: LustreError: 31867:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8106b07b8000 x1399132033916323/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335064508 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:13:28 
lfs-oss-1-13 kernel: LustreError: 31867:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 23 previous similar messages Apr 22 03:13:30 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108665f6000 Apr 22 03:14:22 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810862fea000 Apr 22 03:14:33 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810856d18000 Apr 22 03:14:42 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c06af9000 Apr 22 03:14:42 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e7133000 Apr 22 03:14:42 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103e7133000 Apr 22 03:15:08 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c14c70800 x1399132033916726/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335064691 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:15:08 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:829:ost_brw_read()) Skipped 20 previous similar messages Apr 22 03:15:37 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810990622000 Apr 22 03:15:38 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81095963d000 Apr 22 03:16:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a1e4fe000 Apr 22 03:16:23 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bc8913000 Apr 22 03:16:23 
lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bc8913000 Apr 22 03:16:23 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105dff30c00 x1399132033917937/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335064785 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:16:23 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:1064:ost_brw_write()) Skipped 24 previous similar messages Apr 22 03:16:46 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104af99f000 Apr 22 03:17:05 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6630e000 Apr 22 03:17:26 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810414c57000 Apr 22 03:17:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108170c8000 Apr 22 03:17:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108170c8000 Apr 22 03:18:08 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b88ec000 Apr 22 03:18:09 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 5 seconds Apr 22 03:18:09 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 03:18:09 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (36) Apr 22 03:18:09 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 03:18:09 lfs-oss-1-13 kernel: 
LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109ca9ae000 Apr 22 03:18:55 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b31716000 Apr 22 03:18:55 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a999e000 Apr 22 03:18:55 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103a999e000 Apr 22 03:19:18 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073349b000 Apr 22 03:20:02 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810465f4e000 Apr 22 03:20:37 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810951218000 Apr 22 03:21:06 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81022e2d7000 Apr 22 03:21:17 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5f970000 Apr 22 03:21:24 lfs-oss-1-13 kernel: Lustre: 31862:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 03:21:24 lfs-oss-1-13 kernel: Lustre: 31862:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 221 previous similar messages Apr 22 03:22:04 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e8fb1000 Apr 22 03:22:04 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a5693000 Apr 22 03:22:04 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103a5693000 Apr 22 03:22:04 
lfs-oss-1-13 kernel: LustreError: 32030:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8109bf4c0800 x1399132033923594/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335065135 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:22:04 lfs-oss-1-13 kernel: LustreError: 32030:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 03:22:04 lfs-oss-1-13 kernel: Lustre: 32030:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:22:04 lfs-oss-1-13 kernel: Lustre: 32030:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 03:22:04 lfs-oss-1-13 kernel: Lustre: 32101:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:22:04 lfs-oss-1-13 kernel: Lustre: 32101:0:(ost_handler.c:887:ost_brw_read()) Skipped 22 previous similar messages Apr 22 03:22:15 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4d8f2000 Apr 22 03:22:33 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105470f0000 Apr 22 03:23:05 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 03:23:05 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 03:23:07 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810305143000 Apr 22 03:23:07 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810422c99000 Apr 22 03:23:07 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810422c99000 Apr 22 03:23:25 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81018bbfa000 Apr 22 03:23:45 lfs-oss-1-13 kernel: Lustre: 31823:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 03:23:45 lfs-oss-1-13 kernel: Lustre: 31823:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 33 previous similar messages Apr 22 03:23:45 lfs-oss-1-13 kernel: LustreError: 31823:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105aaea0c00 x1399132033926403/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335065125 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:23:45 lfs-oss-1-13 kernel: LustreError: 31823:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 03:24:27 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810af8fc8000 Apr 22 03:24:27 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 03:24:27 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 17 previous similar messages Apr 22 03:24:40 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff810294f4d000 Apr 22 03:25:01 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b040d5000 Apr 22 03:25:01 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102cbea9000 Apr 22 03:25:01 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102cbea9000 Apr 22 03:25:11 lfs-oss-1-13 kernel: LustreError: 32155:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105b4a07800 x1398901148367400/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335065272 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:25:11 lfs-oss-1-13 kernel: LustreError: 32155:0:(ost_handler.c:829:ost_brw_read()) Skipped 21 previous similar messages Apr 22 03:25:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810935828000 Apr 22 03:26:04 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109063b1000 Apr 22 03:26:04 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae3fd2000 Apr 22 03:26:04 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810ae3fd2000 Apr 22 03:26:33 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088f536000 Apr 22 03:27:20 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810433436000 Apr 22 03:27:20 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f95d6000 Apr 22 03:27:20 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status 
-5, desc ffff8108f95d6000 Apr 22 03:28:05 lfs-oss-1-13 kernel: Lustre: Service thread pid 32097 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 03:28:05 lfs-oss-1-13 kernel: Pid: 32097, comm: ll_ost_io_104 Apr 22 03:28:05 lfs-oss-1-13 kernel: Apr 22 03:28:05 lfs-oss-1-13 kernel: Call Trace: Apr 22 03:28:05 lfs-oss-1-13 kernel: [] lock_timer_base+0x1b/0x3c Apr 22 03:28:05 lfs-oss-1-13 kernel: [] __mod_timer+0x100/0x10f Apr 22 03:28:05 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 03:28:05 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 03:28:05 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 03:28:05 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 03:28:05 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 03:28:05 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 03:28:05 lfs-oss-1-13 kernel: Apr 22 03:28:39 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103171da000 Apr 22 03:29:26 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c34252000 Apr 22 03:29:26 lfs-oss-1-13 kernel: LustreError: 
21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9c40d000 Apr 22 03:29:26 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a9c40d000 Apr 22 03:30:46 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810464108000 Apr 22 03:31:32 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81053af1e000 Apr 22 03:31:32 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a56af000 Apr 22 03:31:32 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102a56af000 Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.45@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8102fb354a00/0xcca1a6f6c259d317 lrc: 3/0,0 mode: PR/PR res: 32822553/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xdfdedee40f647031 expref: 10 pid: 31719 timeout 5271980936 Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 2 previous similar messages Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810892116000 x1398897313729863/t0 o105->@NET_0x500000aae0e2d_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.45@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0084_UUID lock: 
ffff810246bbda00/0xcca1a6f6c25ddc1c lrc: 3/0,0 mode: PW/PW res: 32822553/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->12287) flags: 0x0 remote: 0xdfdedee40f64705b expref: 6 pid: 31819 timeout 0 Apr 22 03:31:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 03:31:50 lfs-oss-1-13 kernel: Lustre: 31909:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 9c5042b6-6755-94c5-27c4-67821c300532 reconnecting Apr 22 03:31:50 lfs-oss-1-13 kernel: Lustre: 31909:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 167 previous similar messages Apr 22 03:32:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 03:32:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages Apr 22 03:32:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (45) Apr 22 03:32:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages Apr 22 03:32:02 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad6a78000 Apr 22 03:32:11 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5042b6-6755-94c5-27c4-67821c300532@NET_0x500000aae0e2d_UUID id 12345-10.174.14.45@o2ib - client will retry Apr 22 03:32:11 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:887:ost_brw_read()) Skipped 18 previous similar messages Apr 22 03:32:16 lfs-oss-1-13 kernel: LustreError: 32224:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8109e0fa7000 x1399132033932823/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335065536 ref 1 fl Interpret:/2/0 rc 
0/0 Apr 22 03:33:13 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a47365000 Apr 22 03:33:13 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810900929000 Apr 22 03:33:13 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810900929000 Apr 22 03:33:13 lfs-oss-1-13 kernel: LustreError: 32193:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c34e91850 x1399132033934427/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335065802 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:33:13 lfs-oss-1-13 kernel: LustreError: 32193:0:(ost_handler.c:1073:ost_brw_write()) Skipped 6 previous similar messages Apr 22 03:33:13 lfs-oss-1-13 kernel: Lustre: 32193:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:33:13 lfs-oss-1-13 kernel: Lustre: 32193:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 03:33:18 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bde40000 Apr 22 03:34:03 lfs-oss-1-13 kernel: Lustre: 31853:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 03:34:03 lfs-oss-1-13 kernel: Lustre: 31853:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 23 previous similar messages Apr 22 03:34:03 lfs-oss-1-13 kernel: LustreError: 31853:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a5455800 x1399132033936022/t0 
o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335065743 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:34:03 lfs-oss-1-13 kernel: LustreError: 31853:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 25 previous similar messages Apr 22 03:34:35 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 03:34:35 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 03:34:45 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81036df84000 Apr 22 03:34:45 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 03:34:45 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 13 previous similar messages Apr 22 03:35:21 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810299998000 Apr 22 03:35:21 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105105b4000 Apr 22 03:35:21 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105105b4000 Apr 22 03:35:21 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105b00f8000 x1399132033936425/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335065931 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:35:21 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:829:ost_brw_read()) Skipped 15 previous similar messages Apr 22 03:36:46 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810085a40000 Apr 22 03:36:48 lfs-oss-1-13 
kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba5436000 Apr 22 03:36:48 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b810a4000 Apr 22 03:36:48 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b810a4000 Apr 22 03:38:03 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c1bf3000 Apr 22 03:38:03 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101502c1000 Apr 22 03:38:03 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8101502c1000 Apr 22 03:38:07 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d2cf0000 Apr 22 03:38:49 lfs-oss-1-13 kernel: LustreError: 32248:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810ab444f000 x1399132033939243/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335065929 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:39:16 lfs-oss-1-13 kernel: Lustre: 31803:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897313777798 sent from scratch1-OST0087 to NID 10.174.12.149@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 03:39:16 lfs-oss-1-13 kernel: req@ffff8105b1d70400 x1398897313777798/t0 o104->@NET_0x500000aae0c95_UUID:15/16 lens 296/384 e 0 to 1 dl 1335065956 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 03:39:16 lfs-oss-1-13 kernel: Lustre: 31803:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 22 03:39:16 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0087: A client on nid 10.174.12.149@o2ib was evicted due to a lock blocking callback to 10.174.12.149@o2ib timed out: rc -107 Apr 22 03:39:16 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 03:39:55 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810668005000 Apr 22 03:39:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bcfc5000 Apr 22 03:39:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103bcfc5000 Apr 22 03:40:09 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810872a70000 Apr 22 03:40:09 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109658b6000 Apr 22 03:40:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b9e370000 Apr 22 03:41:43 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c07bc4000 Apr 22 03:41:43 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e6c86000 Apr 22 03:41:43 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:844:ost_brw_read()) @@@ bulk PUT failed: rc -107 req@ffff810849afe400 x1398900886037051/t0 o3->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/400 e 0 to 0 dl 
1335066780 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 03:41:43 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:844:ost_brw_read()) Skipped 3 previous similar messages Apr 22 03:41:55 lfs-oss-1-13 kernel: Lustre: 31904:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 03:41:55 lfs-oss-1-13 kernel: Lustre: 31904:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 143 previous similar messages Apr 22 03:42:06 lfs-oss-1-13 kernel: Lustre: 32098:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-441), not sending early reply Apr 22 03:42:06 lfs-oss-1-13 kernel: req@ffff810ad965e800 x1399132013251914/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 1 to 0 dl 1335066131 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 03:42:11 lfs-oss-1-13 kernel: LustreError: 32097:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 1046+0s req@ffff810ad965e800 x1399132013251914/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 1 to 0 dl 1335066131 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:42:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32097 completed after 1046.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.68@o2ib (26) Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e2489000 Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d7cbc000 Apr 22 03:42:30 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103d7cbc000 Apr 22 03:42:30 lfs-oss-1-13 kernel: Lustre: 32114:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:42:30 lfs-oss-1-13 kernel: Lustre: 32114:0:(ost_handler.c:887:ost_brw_read()) Skipped 17 previous similar messages Apr 22 03:42:45 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d2810000 Apr 22 03:42:59 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027ec08000 Apr 22 03:42:59 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810847a46000 Apr 22 03:42:59 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a5a8e000 Apr 22 03:42:59 lfs-oss-1-13 kernel: LustreError: 
21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104890ac000 Apr 22 03:44:10 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b046f000 Apr 22 03:44:10 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102ab20b000 Apr 22 03:44:10 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102ab20b000 Apr 22 03:44:10 lfs-oss-1-13 kernel: LustreError: 32130:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810616d94800 x1399132033944853/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335066291 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:44:10 lfs-oss-1-13 kernel: LustreError: 32130:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 03:44:10 lfs-oss-1-13 kernel: Lustre: 32130:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:44:10 lfs-oss-1-13 kernel: Lustre: 32130:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 03:45:04 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023f4b6000 Apr 22 03:45:04 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 03:45:04 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 11 previous similar messages Apr 22 03:45:04 lfs-oss-1-13 kernel: Lustre: 31981:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0086: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bf313200; still busy with 1 active RPCs Apr 22 
03:45:04 lfs-oss-1-13 kernel: Lustre: 31981:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 19 previous similar messages Apr 22 03:45:04 lfs-oss-1-13 kernel: LustreError: 31981:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105bd2a7c00 x1398901148388200/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335066404 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:45:04 lfs-oss-1-13 kernel: LustreError: 31981:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 20 previous similar messages Apr 22 03:45:12 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd6680000 Apr 22 03:45:12 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b50926000 Apr 22 03:45:12 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b8ef2a000 Apr 22 03:45:12 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107510e6000 Apr 22 03:45:32 lfs-oss-1-13 kernel: LustreError: 32126:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c179ea000 x1399132033944854/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335066453 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:45:32 lfs-oss-1-13 kernel: LustreError: 32126:0:(ost_handler.c:829:ost_brw_read()) Skipped 21 previous similar messages Apr 22 03:45:56 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bc3d0f000 Apr 22 03:46:59 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8a293000 Apr 22 03:47:06 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81067a113000 
Apr 22 03:47:06 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d50cd000 Apr 22 03:47:06 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107d50cd000 Apr 22 03:47:12 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108afa52000 Apr 22 03:47:12 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810065f78000 Apr 22 03:47:12 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa66e0000 Apr 22 03:47:55 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810025b12000 Apr 22 03:47:57 lfs-oss-1-13 kernel: LustreError: 32109:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810a4aae9000 x1398900886041903/t0 o3->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/400 e 0 to 0 dl 1335066477 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:48:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089d936000 Apr 22 03:48:52 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012a3b3000 Apr 22 03:49:50 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d5f22000 Apr 22 03:49:50 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b273ea000 Apr 22 03:49:50 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b273ea000 Apr 22 03:49:55 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status 
-5, desc ffff81056b989000 Apr 22 03:49:59 lfs-oss-1-13 kernel: Lustre: 32242:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-72), not sending early reply Apr 22 03:49:59 lfs-oss-1-13 kernel: req@ffff810c1e40d400 x1398900875178125/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335066604 ref 2 fl Interpret:/0/0 rc 0/0 Apr 22 03:50:10 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff81088c922800 x1398900872518273/t0 o4->8e1eaa91-77c8-5dde-16c0-c8940fd656f0@NET_0x500000aae0623_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067365 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 03:50:35 lfs-oss-1-13 kernel: LustreError: 32025:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff8105ab4e4c00 x1398900872540133/t0 o4->d0ecf08c-717c-9267-79c4-6f1da861df0b@NET_0x500000aae0625_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067390 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 03:50:59 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e818c000 Apr 22 03:51:40 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.6.37@o2ib ns: filter-scratch1-OST008b_UUID lock: ffff810069125e00/0xcca1a6f6c2944a7c lrc: 3/0,0 mode: PR/PR res: 32841353/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x58fdbc2816b5beb5 expref: 16 pid: 31766 timeout 5273185689 Apr 22 03:51:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff81082e4a9400 x1398897314025446/t0 o105->@NET_0x500000aae0625_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 03:51:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.37@o2ib) returned 0 from 
completion AST ns: filter-scratch1-OST008b_UUID lock: ffff810712b7ce00/0xcca1a6f6c29709d2 lrc: 3/0,0 mode: PW/PW res: 32841353/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->315391) flags: 0x0 remote: 0x58fdbc2816b5c76e expref: 10 pid: 31755 timeout 0 Apr 22 03:52:07 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d9e1c000 Apr 22 03:52:07 lfs-oss-1-13 kernel: Lustre: 31874:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 03:52:07 lfs-oss-1-13 kernel: Lustre: 31874:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 315 previous similar messages Apr 22 03:52:14 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024fbe6000 Apr 22 03:52:23 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81013848f000 Apr 22 03:52:23 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a1a97000 Apr 22 03:52:23 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8101a1a97000 Apr 22 03:52:32 lfs-oss-1-13 kernel: Lustre: scratch1-OST008b: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 03:52:32 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 03:53:10 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81052ac97000 Apr 22 03:53:11 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 03:53:11 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:887:ost_brw_read()) Skipped 41 previous similar messages Apr 22 03:53:50 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dd2c5000 Apr 22 03:53:50 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4c404000 Apr 22 03:53:50 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a4c404000 Apr 22 03:54:07 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810996cbe000 Apr 22 03:54:40 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81061e528000 Apr 22 03:54:40 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a469e4000 Apr 22 03:55:04 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a76589000 Apr 22 03:55:04 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 03:55:04 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 03:55:19 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103676d2000 
Apr 22 03:55:19 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a3af8000 Apr 22 03:55:19 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108a3af8000 Apr 22 03:55:19 lfs-oss-1-13 kernel: LustreError: 32143:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff8105a3295800 x1399132033956161/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067126 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:55:19 lfs-oss-1-13 kernel: LustreError: 32143:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 03:55:19 lfs-oss-1-13 kernel: Lustre: 32143:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 03:55:19 lfs-oss-1-13 kernel: Lustre: 32143:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 03:55:23 lfs-oss-1-13 kernel: Lustre: 31984:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from 9c5b2e9f-7628-0434-e306-fe0f7f302694@10.174.14.49@o2ib to 0xffff810c24797a00; still busy with 3 active RPCs Apr 22 03:55:23 lfs-oss-1-13 kernel: Lustre: 31984:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 44 previous similar messages Apr 22 03:55:23 lfs-oss-1-13 kernel: LustreError: 31984:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105e9747800 x1398900875195577/t0 o8->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 368/264 e 0 to 0 dl 1335067023 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 03:55:23 lfs-oss-1-13 kernel: LustreError: 31984:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 45 previous similar messages Apr 22 03:56:05 lfs-oss-1-13 kernel: LustreError: 
32206:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810837d8ec00 x1398901148459151/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335067055 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:56:05 lfs-oss-1-13 kernel: LustreError: 32206:0:(ost_handler.c:829:ost_brw_read()) Skipped 30 previous similar messages Apr 22 03:56:59 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b50926000 Apr 22 03:56:59 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0c1d4000 Apr 22 03:57:11 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103df10e000 Apr 22 03:57:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 03:57:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 03:57:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (45) Apr 22 03:57:30 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 03:57:30 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b6cb5000 Apr 22 03:57:47 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c25469800 x1398900875196381/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335067067 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:57:47 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:822:ost_brw_read()) Skipped 6 previous similar messages Apr 22 03:57:51 lfs-oss-1-13 kernel: LustreError: 
21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810212d25000 Apr 22 03:57:51 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108403ce000 Apr 22 03:57:51 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108403ce000 Apr 22 03:57:51 lfs-oss-1-13 kernel: Lustre: 31756:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST0087: exp ffff81059f708800 already connecting Apr 22 03:58:19 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104844b0000 Apr 22 03:58:53 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2613f000 Apr 22 03:58:53 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100804c4000 Apr 22 03:58:53 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8100804c4000 Apr 22 03:58:58 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034e1e8000 Apr 22 03:59:23 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810674398000 Apr 22 03:59:24 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810233da4000 Apr 22 03:59:24 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109baa8a000 Apr 22 03:59:24 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065f3fa000 Apr 22 03:59:24 lfs-oss-1-13 kernel: LustreError: 31738:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 16 < 18 req@ffff8107f07eb400 x1398900886054330/t0 
o8->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 368/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 03:59:32 lfs-oss-1-13 kernel: LustreError: 32090:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c358ca050 x1399132033960580/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067409 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81040b430000 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073149a000 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b9005c000 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e9920000 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027ec0a000 Apr 22 03:59:56 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b8ef2a000 Apr 22 03:59:58 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e004b000 Apr 22 03:59:58 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810155913000 Apr 22 03:59:58 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810155913000 Apr 22 04:00:26 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065f3fa000 Apr 22 04:00:40 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81005c6cc000 Apr 22 
04:00:40 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008a1d6000 Apr 22 04:00:52 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810785c9d000 Apr 22 04:01:12 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c3ec1000 Apr 22 04:01:12 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106373b2000 Apr 22 04:01:12 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8106373b2000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b3014000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108c0844000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f3184000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065f3fa000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089bfa8000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5d346000 Apr 22 04:01:37 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b05d76000 Apr 22 04:01:41 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007657c000 Apr 22 04:01:43 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089e060000 Apr 
22 04:01:43 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b8deae000 Apr 22 04:01:43 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107b792a000 Apr 22 04:02:15 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108bfb6d000 Apr 22 04:02:15 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a4691000 Apr 22 04:02:15 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8107a4691000 Apr 22 04:02:15 lfs-oss-1-13 kernel: Lustre: 31958:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 04:02:15 lfs-oss-1-13 kernel: Lustre: 31958:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 323 previous similar messages Apr 22 04:02:20 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81097ecda000 Apr 22 04:02:45 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e6d34000 Apr 22 04:03:05 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810188604000 Apr 22 04:03:05 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b97e8000 Apr 22 04:03:13 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008b: ignoring bulk IO comm error with f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID id 12345-10.174.0.202@o2ib - client will retry Apr 22 04:03:13 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:887:ost_brw_read()) Skipped 49 previous similar messages Apr 22 04:03:18 lfs-oss-1-13 kernel: 
LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81039a493000 Apr 22 04:03:18 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810839372000 Apr 22 04:03:18 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810839372000 Apr 22 04:03:48 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f59ec000 Apr 22 04:03:56 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108039d0000 Apr 22 04:03:56 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106e8be4000 Apr 22 04:03:56 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104811ce000 Apr 22 04:04:00 lfs-oss-1-13 kernel: Lustre: Service thread pid 32183 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 04:04:00 lfs-oss-1-13 kernel: Pid: 32183, comm: ll_ost_io_190 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Call Trace: Apr 22 04:04:00 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 04:04:00 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Pid: 32156, comm: ll_ost_io_163 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Call Trace: Apr 22 04:04:00 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 04:04:00 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 
lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Pid: 32153, comm: ll_ost_io_160 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Call Trace: Apr 22 04:04:00 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 04:04:00 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] 
child_rip+0xa/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Pid: 32001, comm: ll_ost_io_11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Call Trace: Apr 22 04:04:00 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 04:04:00 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Lustre: Service thread pid 32143 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 04:04:00 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 04:04:00 lfs-oss-1-13 kernel: Pid: 32143, comm: ll_ost_io_150 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:00 lfs-oss-1-13 kernel: Call Trace: Apr 22 04:04:00 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 04:04:00 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 04:04:00 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 04:04:00 lfs-oss-1-13 kernel: Apr 22 04:04:04 lfs-oss-1-13 kernel: LustreError: 32003:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c2b18ac00 x1398900872646896/t0 o4->8e1eaa91-77c8-5dde-16c0-c8940fd656f0@NET_0x500000aae0623_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067582 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 04:04:27 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810524a6b000 Apr 22 04:04:34 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810387d9a000 Apr 22 04:04:34 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029e5fe000 Apr 22 04:04:34 
lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81029e5fe000 Apr 22 04:04:49 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.6.35@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8105017e2800/0xcca1a6f6c2b44175 lrc: 3/0,0 mode: PR/PR res: 32849865/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xa4beaf95e9af193f expref: 8 pid: 31781 timeout 5273974261 Apr 22 04:04:49 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c18bc2000 x1398897314172396/t0 o105->@NET_0x500000aae0623_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 04:04:49 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.6.35@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0086_UUID lock: ffff81080141cc00/0xcca1a6f6c2b63a2b lrc: 3/0,0 mode: PW/PW res: 32852269/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->315391) flags: 0x0 remote: 0xa4beaf95e9af23b1 expref: 12 pid: 31907 timeout 0 Apr 22 04:05:05 lfs-oss-1-13 kernel: Lustre: scratch1-OST008b: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 04:05:05 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 04:05:16 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9222c000 Apr 22 04:05:16 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 04:05:16 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 31 previous similar messages Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa51ac000 Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b62993000 Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b62993000 Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 32063:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810348882000 x1399132033966225/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335067748 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 32063:0:(ost_handler.c:1073:ost_brw_write()) Skipped 6 previous similar messages Apr 22 04:05:37 lfs-oss-1-13 kernel: Lustre: 32063:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 04:05:37 lfs-oss-1-13 kernel: Lustre: 32063:0:(ost_handler.c:1224:ost_brw_write()) Skipped 9 previous similar messages Apr 22 04:05:37 lfs-oss-1-13 kernel: Lustre: 31888:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 04:05:37 
lfs-oss-1-13 kernel: Lustre: 31888:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 47 previous similar messages Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 31888:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c12634800 x1399132033967174/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 1335067637 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:05:37 lfs-oss-1-13 kernel: LustreError: 31888:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 50 previous similar messages Apr 22 04:05:49 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810126cc6000 Apr 22 04:05:49 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81021a4fe000 Apr 22 04:05:49 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bddc12000 Apr 22 04:06:40 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108bde91000 Apr 22 04:06:40 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810509286000 Apr 22 04:06:40 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810509286000 Apr 22 04:06:57 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d9a40000 Apr 22 04:06:57 lfs-oss-1-13 kernel: LustreError: 32049:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105e3d9cc00 x1398901148546647/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335067822 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:06:57 lfs-oss-1-13 kernel: LustreError: 32049:0:(ost_handler.c:829:ost_brw_read()) Skipped 50 previous similar messages Apr 22 
04:07:11 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a2631000 Apr 22 04:07:30 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f3184000 Apr 22 04:07:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 04:07:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 04:07:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (17) Apr 22 04:07:35 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 04:07:35 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109fb3be000 Apr 22 04:08:01 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810188604000 Apr 22 04:08:21 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0b40e000 Apr 22 04:08:21 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a01b4000 Apr 22 04:08:21 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102a01b4000 Apr 22 04:08:57 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81066b897000 Apr 22 04:09:09 lfs-oss-1-13 kernel: LustreError: 32034:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c2655c400 x1399132033969184/t0 o3->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/400 e 0 to 0 dl 1335067749 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 
04:09:09 lfs-oss-1-13 kernel: LustreError: 32034:0:(ost_handler.c:822:ost_brw_read()) Skipped 6 previous similar messages Apr 22 04:09:16 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b82a0000 Apr 22 04:09:41 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810551650000 Apr 22 04:09:41 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007ae2c000 Apr 22 04:10:04 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029e5fe000 Apr 22 04:10:04 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e7c73000 Apr 22 04:10:04 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8101e7c73000 Apr 22 04:10:45 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a47720000 Apr 22 04:11:10 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81036f55e000 Apr 22 04:11:44 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109cb058000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d67f8000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff81068201e000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba1ab0000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104f85f2000 Apr 22 04:11:49 
lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b244fc000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b9005c000 Apr 22 04:11:49 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006cbf2000 Apr 22 04:12:09 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100bd527000 Apr 22 04:12:09 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8b49c000 Apr 22 04:12:09 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a8b49c000 Apr 22 04:12:14 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104143b7000 Apr 22 04:12:34 lfs-oss-1-13 kernel: Lustre: 31783:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 04:12:34 lfs-oss-1-13 kernel: Lustre: 31783:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 301 previous similar messages Apr 22 04:12:38 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b0febe000 Apr 22 04:13:10 lfs-oss-1-13 kernel: Lustre: 32065:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 04:13:10 lfs-oss-1-13 kernel: req@ffff810c34e90050 x1398900875200878/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335067995 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 04:13:10 lfs-oss-1-13 kernel: Lustre: 32065:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 22 04:13:12 lfs-oss-1-13 kernel: LustreError: 
21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810312508000 Apr 22 04:13:15 lfs-oss-1-13 kernel: Lustre: 32156:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 04:13:15 lfs-oss-1-13 kernel: Lustre: 32156:0:(ost_handler.c:887:ost_brw_read()) Skipped 43 previous similar messages Apr 22 04:13:15 lfs-oss-1-13 kernel: Lustre: Service thread pid 32156 completed after 755.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 04:13:15 lfs-oss-1-13 kernel: Lustre: Service thread pid 32143 completed after 755.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 04:13:29 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8c352000 Apr 22 04:14:02 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108592ce000 Apr 22 04:14:02 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a69be000 Apr 22 04:14:02 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e9c9a000 Apr 22 04:14:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aef0f0000 Apr 22 04:14:41 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a477d5000 Apr 22 04:14:41 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810adb650000 Apr 22 04:14:41 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, 
status -5, desc ffff810adb650000 Apr 22 04:14:44 lfs-oss-1-13 kernel: Lustre: Service thread pid 32183 completed after 844.03s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 04:14:44 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 04:14:53 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be9348000 Apr 22 04:14:58 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81001d2a4000 Apr 22 04:15:09 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089d936000 Apr 22 04:16:05 lfs-oss-1-13 kernel: Lustre: 31972:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c22411400; still busy with 2 active RPCs Apr 22 04:16:05 lfs-oss-1-13 kernel: Lustre: 31972:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 34 previous similar messages Apr 22 04:16:05 lfs-oss-1-13 kernel: LustreError: 31972:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102eeab6400 x1398900876032389/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335068265 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:16:05 lfs-oss-1-13 kernel: LustreError: 31972:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 35 previous similar messages Apr 22 04:16:09 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ac5f25000 Apr 22 04:16:09 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81052cb7e000 Apr 22 04:16:09 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc 
ffff81052cb7e000 Apr 22 04:16:09 lfs-oss-1-13 kernel: LustreError: 32245:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c358cac50 x1399132033976426/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335068377 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:16:09 lfs-oss-1-13 kernel: LustreError: 32245:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 04:16:09 lfs-oss-1-13 kernel: Lustre: 32245:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 04:16:09 lfs-oss-1-13 kernel: Lustre: 32245:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 04:16:09 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.68@o2ib Apr 22 04:16:09 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 20 previous similar messages Apr 22 04:16:26 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81035081d000 Apr 22 04:16:33 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f59ec000 Apr 22 04:16:33 lfs-oss-1-13 kernel: Lustre: 32001:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900875214489 took longer than estimated (100+2s); client may timeout. 
req@ffff8105e9cab800 x1398900875214489/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335068191 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 04:16:33 lfs-oss-1-13 kernel: Lustre: 32001:0:(service.c:1434:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Apr 22 04:16:33 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810551650000 Apr 22 04:16:33 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3d822000 Apr 22 04:16:33 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810223aba000 Apr 22 04:16:33 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b9278000 Apr 22 04:16:59 lfs-oss-1-13 kernel: Lustre: scratch1-OST0085: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 04:16:59 lfs-oss-1-13 kernel: Lustre: Skipped 8 previous similar messages Apr 22 04:17:10 lfs-oss-1-13 kernel: LustreError: 32223:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff81073d733400 x1399132013301561/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335068235 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:17:10 lfs-oss-1-13 kernel: LustreError: 32223:0:(ost_handler.c:829:ost_brw_read()) Skipped 41 previous similar messages Apr 22 04:17:54 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027ec08000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 5 seconds Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.49@o2ib (23) Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810870922000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be9348000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff8101243f4000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b40a66000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023532c000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 
21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e4b28000 Apr 22 04:17:55 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f59ec000 Apr 22 04:18:02 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810884d05000 Apr 22 04:18:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006b2ee000 Apr 22 04:18:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81006b2ee000 Apr 22 04:18:28 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a8117000 Apr 22 04:18:38 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810315580000 Apr 22 04:19:44 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c05770000 Apr 22 04:20:21 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109bdc95000 Apr 22 04:20:21 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810163bd1000 Apr 22 04:20:21 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810163bd1000 Apr 22 04:20:38 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c53a2000 Apr 22 04:21:17 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e99ae000 Apr 22 04:21:22 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069d1f0000 Apr 22 04:21:50 lfs-oss-1-13 kernel: 
LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb9f55000 Apr 22 04:22:02 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106661b3000 Apr 22 04:22:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f9e7c000 Apr 22 04:22:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8106f9e7c000 Apr 22 04:22:07 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c53a2000 Apr 22 04:22:36 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 04:22:36 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 296 previous similar messages Apr 22 04:22:58 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108dae02000 Apr 22 04:23:22 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102eef98000 Apr 22 04:23:33 lfs-oss-1-13 kernel: Lustre: 32098:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008e: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 04:23:33 lfs-oss-1-13 kernel: Lustre: 32098:0:(ost_handler.c:887:ost_brw_read()) Skipped 39 previous similar messages Apr 22 04:23:47 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072f016000 Apr 22 04:23:48 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81099177b000 Apr 22 04:24:21 lfs-oss-1-13 kernel: LustreError: 
21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81084e519000 Apr 22 04:24:21 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dd650000 Apr 22 04:24:21 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104dd650000 Apr 22 04:24:51 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102da459000 Apr 22 04:25:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101eff26000 Apr 22 04:25:54 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104af753000 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810270fec000 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2ef6c000 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a10494000 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102bfa86000 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093322a000 Apr 22 04:26:14 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.202@o2ib Apr 22 04:26:14 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 25 previous similar messages Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c53a2000 Apr 22 04:26:14 lfs-oss-1-13 kernel: Lustre: 
31879:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from f173065c-735f-3a51-420b-55cb767a3a60@10.174.0.202@o2ib to 0xffff810c28fda800; still busy with 1 active RPCs Apr 22 04:26:14 lfs-oss-1-13 kernel: Lustre: 31879:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 40 previous similar messages Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 31879:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105b7320000 x1398900886099327/t0 o8->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 368/264 e 0 to 0 dl 1335068874 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:26:14 lfs-oss-1-13 kernel: LustreError: 31879:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 40 previous similar messages Apr 22 04:26:31 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2be6a000 Apr 22 04:26:52 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810658b20000 Apr 22 04:26:52 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050200a000 Apr 22 04:26:52 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81050200a000 Apr 22 04:26:52 lfs-oss-1-13 kernel: LustreError: 32041:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c1a453800 x1399132033987255/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335069014 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:26:52 lfs-oss-1-13 kernel: LustreError: 32041:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 04:26:52 lfs-oss-1-13 kernel: Lustre: 32041:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 
12345-10.174.0.68@o2ib - client will retry Apr 22 04:26:52 lfs-oss-1-13 kernel: Lustre: 32041:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 04:27:09 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810969567000 Apr 22 04:27:23 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81099a1cd000 Apr 22 04:27:23 lfs-oss-1-13 kernel: LustreError: 32096:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810803385000 x1399132013312031/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335069469 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:27:23 lfs-oss-1-13 kernel: LustreError: 32096:0:(ost_handler.c:829:ost_brw_read()) Skipped 33 previous similar messages Apr 22 04:27:34 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 04:27:34 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 04:27:34 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a10494000 Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.202@o2ib (24) Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b4a96000 Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81019665c000 Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81035e054000 Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033a760000 Apr 22 04:28:26 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014bf7c000 Apr 22 04:28:31 lfs-oss-1-13 kernel: Lustre: 32117:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-91), not sending early reply Apr 22 04:28:31 lfs-oss-1-13 kernel: req@ffff810c1ec14c00 x1398900875216580/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335068916 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 04:28:31 lfs-oss-1-13 kernel: Lustre: 32117:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 22 
04:28:36 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 696+0s req@ffff810b28a10000 x1398900875216579/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335068916 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:28:36 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) Skipped 12 previous similar messages Apr 22 04:28:37 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101ccb48000 Apr 22 04:28:38 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c2c12000 Apr 22 04:29:16 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a34d3e000 Apr 22 04:29:33 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810742190000 Apr 22 04:29:33 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2767e000 Apr 22 04:29:33 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a2767e000 Apr 22 04:29:54 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048a771000 Apr 22 04:30:06 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105f8924000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be9348000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065bed6000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff8105150f8000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d08c6000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2be6a000 Apr 22 04:30:19 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b2e0b0000 Apr 22 04:31:09 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be7438000 Apr 22 04:31:10 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f3888000 Apr 22 04:31:10 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aaea82000 Apr 22 04:31:10 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106bab2e000 Apr 22 04:31:22 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810690d7a000 Apr 22 04:31:28 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f5d12000 Apr 22 04:31:48 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2be6a000 Apr 22 04:31:48 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a545e2000 Apr 22 04:31:48 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104bea8e000 Apr 22 04:31:48 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a1030000 Apr 22 04:32:21 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff8109fff65000 Apr 22 04:32:21 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810932d45000 Apr 22 04:32:21 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810932d45000 Apr 22 04:32:25 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104336f4000 Apr 22 04:32:38 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 04:32:38 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 317 previous similar messages Apr 22 04:32:38 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037f898000 Apr 22 04:32:44 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103dc724000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6f90c000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105525e8000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065bed6000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810527124000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4942e000 Apr 22 04:33:11 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108faf18000 Apr 22 04:33:23 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 
4, status -5, desc ffff81089efdc000 Apr 22 04:33:41 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 04:33:41 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:887:ost_brw_read()) Skipped 64 previous similar messages Apr 22 04:33:50 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109b5dda000 Apr 22 04:34:06 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101af480000 Apr 22 04:34:38 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b81b69000 Apr 22 04:34:56 lfs-oss-1-13 kernel: LustreError: 137-5: UUID 'scratch1-OST008f_UUID' is not available for connect (no target) Apr 22 04:34:56 lfs-oss-1-13 kernel: LustreError: Skipped 43 previous similar messages Apr 22 04:35:09 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810697b9e000 Apr 22 04:35:10 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81079aeea000 Apr 22 04:35:10 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6fcfc000 Apr 22 04:35:10 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027d858000 Apr 22 04:35:10 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007d8a0000 Apr 22 04:35:10 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e156e000 Apr 22 04:35:30 lfs-oss-1-13 kernel: LustreError: 
21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c1173000 Apr 22 04:35:30 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810433e65000 Apr 22 04:35:30 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810433e65000 Apr 22 04:36:33 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015fc14000 Apr 22 04:36:33 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105525e8000 Apr 22 04:36:33 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104bea8e000 Apr 22 04:36:33 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810527124000 Apr 22 04:36:33 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.49@o2ib Apr 22 04:36:33 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 25 previous similar messages Apr 22 04:36:37 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810188604000 Apr 22 04:36:37 lfs-oss-1-13 kernel: Lustre: 31799:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff810b369ea400; still busy with 1 active RPCs Apr 22 04:36:37 lfs-oss-1-13 kernel: Lustre: 31799:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 34 previous similar messages Apr 22 04:36:37 lfs-oss-1-13 kernel: LustreError: 31799:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105f15fdc00 x1398901148576328/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 
1335069497 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:36:37 lfs-oss-1-13 kernel: LustreError: 31761:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a0faa400 x1398901148576332/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335069497 ref 2 fl Interpret:/0/0 rc -16/0 Apr 22 04:36:37 lfs-oss-1-13 kernel: LustreError: 31761:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 78 previous similar messages Apr 22 04:36:37 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81094356a000 Apr 22 04:36:56 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810982a00000 Apr 22 04:37:24 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff81085ecedc00 x1399132013321811/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070039 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:37:24 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:829:ost_brw_read()) Skipped 59 previous similar messages Apr 22 04:37:40 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100ab93c000 Apr 22 04:37:42 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101217b2000 Apr 22 04:38:27 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020d07a000 Apr 22 04:38:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 04:38:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 11 previous similar messages Apr 22 04:38:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) 
Timed out RDMA with 10.174.6.178@o2ib (46) Apr 22 04:38:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 11 previous similar messages Apr 22 04:38:44 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba75cd000 Apr 22 04:38:56 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2be6a000 Apr 22 04:39:17 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810791ae5000 Apr 22 04:39:17 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb4b8e000 Apr 22 04:39:17 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bb4b8e000 Apr 22 04:39:17 lfs-oss-1-13 kernel: LustreError: 32011:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810bbc6df000 x1399132034000537/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335069760 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:39:17 lfs-oss-1-13 kernel: LustreError: 32011:0:(ost_handler.c:1073:ost_brw_write()) Skipped 3 previous similar messages Apr 22 04:39:17 lfs-oss-1-13 kernel: Lustre: 32011:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 04:39:17 lfs-oss-1-13 kernel: Lustre: 32011:0:(ost_handler.c:1224:ost_brw_write()) Skipped 3 previous similar messages Apr 22 04:40:25 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102cb867000 Apr 22 04:40:50 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810591692000 
Apr 22 04:41:47 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81047965d000 Apr 22 04:41:55 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810530636000 Apr 22 04:41:55 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027d858000 Apr 22 04:42:37 lfs-oss-1-13 kernel: Lustre: 11781:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314861398 sent from scratch1-OST0086 to NID 10.174.14.43@o2ib 10s ago has timed out (10s prior to deadline). Apr 22 04:42:37 lfs-oss-1-13 kernel: req@ffff810c177ec000 x1398897314861398/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 1335069757 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 04:42:37 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0086: A client on nid 10.174.14.43@o2ib was evicted due to a lock completion callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 04:42:43 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2ef6c000 Apr 22 04:42:43 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0088: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 04:42:43 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 330 previous similar messages Apr 22 04:43:04 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 04:43:04 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069784 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 04:43:11 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 04:43:11 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069791 ref 3 fl Rpc:/2/0 rc 0/0 Apr 22 04:43:11 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 04:43:17 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5ae8b000 Apr 22 04:43:17 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e2b7c000 Apr 22 04:43:17 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102e2b7c000 Apr 22 04:43:18 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 04:43:18 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069798 ref 4 fl Rpc:/2/0 rc 0/0 Apr 22 04:43:18 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 04:43:25 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 04:43:25 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069805 ref 5 fl Rpc:/2/0 rc 0/0 Apr 22 04:43:25 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 04:43:32 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 04:43:32 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069812 ref 6 fl Rpc:/2/0 rc 0/0 Apr 22 04:43:32 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 04:43:46 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314871927 sent from scratch1-OST0087 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 04:43:46 lfs-oss-1-13 kernel: req@ffff8105bcdc1c00 x1398897314871927/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335069826 ref 8 fl Rpc:/2/0 rc 0/0 Apr 22 04:43:46 lfs-oss-1-13 kernel: Lustre: 31902:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Apr 22 04:43:46 lfs-oss-1-13 kernel: Lustre: 32131:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 04:43:46 lfs-oss-1-13 kernel: Lustre: 32131:0:(ost_handler.c:887:ost_brw_read()) Skipped 34 previous similar messages Apr 22 04:43:47 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ae1c9000 Apr 22 04:43:56 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092c44c000 Apr 22 04:43:56 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d4cbe000 Apr 22 04:43:56 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810188604000 Apr 22 04:43:56 lfs-oss-1-13 kernel: LustreError: 32065:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105b8323c00 x1398900876644942/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335070142 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 04:44:38 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff81089f3e6c00/0xcca1a6f6c3407125 lrc: 3/0,0 mode: PW/PW res: 32892619/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b7758bb2 expref: 42 pid: 31877 timeout 5276363210 Apr 22 
04:44:38 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 1 previous similar message Apr 22 04:44:38 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105d2ccac00 x1398900876645757/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070600 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:44:38 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 04:44:38 lfs-oss-1-13 kernel: LustreError: 32235:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c1b937000 x1398900876645758/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335070003 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:44:38 lfs-oss-1-13 kernel: LustreError: 32235:0:(ost_handler.c:1060:ost_brw_write()) Skipped 1 previous similar message Apr 22 04:44:59 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105cb7fb080 Apr 22 04:44:59 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104fe980000 Apr 22 04:44:59 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105cb7fb080 Apr 22 04:45:27 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad9fcc000 Apr 22 04:45:41 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b493d1000 Apr 22 04:46:27 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a5171000 Apr 22 04:46:27 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810324ef7000 Apr 22 04:46:27 
lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810324ef7000 Apr 22 04:46:52 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810256f4e000 Apr 22 04:46:52 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b1ffd2000 Apr 22 04:46:55 lfs-oss-1-13 kernel: Lustre: 31993:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-91), not sending early reply Apr 22 04:46:55 lfs-oss-1-13 kernel: req@ffff810b1b811800 x1398900875234395/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070020 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 04:46:55 lfs-oss-1-13 kernel: Lustre: 31993:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 22 04:47:00 lfs-oss-1-13 kernel: LustreError: 32117:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 696+0s req@ffff810b1b811800 x1398900875234395/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070020 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:47:00 lfs-oss-1-13 kernel: LustreError: 32117:0:(ost_handler.c:822:ost_brw_read()) Skipped 5 previous similar messages Apr 22 04:47:01 lfs-oss-1-13 kernel: Lustre: 31779:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff810c0f8a6c00; still busy with 1 active RPCs Apr 22 04:47:01 lfs-oss-1-13 kernel: Lustre: 31779:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 36 previous similar messages Apr 22 04:47:01 lfs-oss-1-13 kernel: LustreError: 31779:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81094ba04000 x1399132034008556/t0 o8->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 368/264 e 0 to 0 dl 
1335070121 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:47:01 lfs-oss-1-13 kernel: LustreError: 31779:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 36 previous similar messages Apr 22 04:47:02 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e40f7000 Apr 22 04:47:02 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 04:47:02 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 23 previous similar messages Apr 22 04:47:21 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810399266000 Apr 22 04:47:29 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810aca263c00 x1398900875234394/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070079 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:47:29 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:829:ost_brw_read()) Skipped 21 previous similar messages Apr 22 04:47:30 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015f48b000 Apr 22 04:48:16 lfs-oss-1-13 kernel: Lustre: 31786:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897314950625 sent from scratch1-OST0084 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 04:48:16 lfs-oss-1-13 kernel: req@ffff8105de270c00 x1398897314950625/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335070096 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 04:48:16 lfs-oss-1-13 kernel: Lustre: 31786:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Apr 22 04:48:34 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810122256000 Apr 22 04:48:34 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103efa02000 Apr 22 04:48:34 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103efa02000 Apr 22 04:48:38 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810145334000 Apr 22 04:49:16 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 04:49:16 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 5 previous similar messages Apr 22 04:49:16 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.49@o2ib (30) Apr 22 04:49:16 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 5 previous similar messages Apr 22 04:49:16 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810443772000 Apr 22 04:49:16 lfs-oss-1-13 kernel: Lustre: 32115:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900875246002 took longer than estimated (100+1s); client may timeout. 
req@ffff810a14c5e400 x1398900875246002/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070155 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 04:49:16 lfs-oss-1-13 kernel: Lustre: 32115:0:(service.c:1434:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 22 04:49:16 lfs-oss-1-13 kernel: Lustre: 31713:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST008c: exp ffff8105f4203400 already connecting Apr 22 04:49:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810689962000 Apr 22 04:49:28 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81081cc68000 Apr 22 04:49:41 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 194 seconds. I think it's dead, and I am evicting it. Apr 22 04:49:41 lfs-oss-1-13 kernel: Lustre: Skipped 8 previous similar messages Apr 22 04:49:59 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109dacea000 Apr 22 04:50:01 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810423097000 Apr 22 04:50:01 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109b51b4000 Apr 22 04:50:01 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8109b51b4000 Apr 22 04:50:01 lfs-oss-1-13 kernel: LustreError: 32024:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(933120) req@ffff810c2ea5d850 x1399132034010953/t0 o4->e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID:0/0 lens 448/416 e 0 to 0 dl 1335070410 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:50:01 lfs-oss-1-13 kernel: LustreError: 
32024:0:(ost_handler.c:1073:ost_brw_write()) Skipped 3 previous similar messages Apr 22 04:50:01 lfs-oss-1-13 kernel: Lustre: 32024:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 04:50:01 lfs-oss-1-13 kernel: Lustre: 32024:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 04:50:19 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b558d4000 Apr 22 04:50:19 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810097278000 Apr 22 04:50:19 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810069b86000 Apr 22 04:50:19 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075c6ee000 Apr 22 04:50:19 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105525e8000 Apr 22 04:51:08 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810402c42000 Apr 22 04:51:22 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d6b03000 Apr 22 04:51:29 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d38b2000 Apr 22 04:51:36 lfs-oss-1-13 kernel: Lustre: 31895:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897315030080 sent from scratch1-OST0088 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 04:51:36 lfs-oss-1-13 kernel: req@ffff810c158d6800 x1398897315030080/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335070296 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 04:51:36 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 04:51:36 lfs-oss-1-13 kernel: LustreError: 32198:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c22478800 x1398900876696713/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070347 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:51:42 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075c6ee000 Apr 22 04:51:42 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810069c64000 Apr 22 04:51:49 lfs-oss-1-13 kernel: LustreError: 11792:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335070309 with bad export cookie 14745250233752675445 Apr 22 04:52:33 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105e86d5000 Apr 22 04:52:33 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810323e49000 Apr 22 04:52:33 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810323e49000 Apr 22 04:52:43 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810123d1d000 Apr 22 04:52:43 lfs-oss-1-13 kernel: Lustre: 31833:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: bbf47932-5717-d616-b310-f6e93e74d9a1 reconnecting Apr 22 04:52:43 lfs-oss-1-13 kernel: Lustre: 31833:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 360 
previous similar messages Apr 22 04:52:53 lfs-oss-1-13 kernel: Lustre: 31760:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897315046135 sent from scratch1-OST008b to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 04:52:53 lfs-oss-1-13 kernel: req@ffff810812d84000 x1398897315046135/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335070373 ref 3 fl Rpc:/2/0 rc 0/0 Apr 22 04:52:53 lfs-oss-1-13 kernel: Lustre: 31760:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 22 04:52:57 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075c6ee000 Apr 22 04:52:57 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f327c000 Apr 22 04:53:12 lfs-oss-1-13 kernel: LustreError: 32153:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105bd314400 x1398900874077561/t0 o4->bba5ab1a-e6a7-99a6-3ef6-f62d345ecdcf@NET_0x500000aae048d_UUID:0/0 lens 448/416 e 0 to 0 dl 1335071147 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 04:54:06 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b348c000 Apr 22 04:54:08 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0085: ignoring bulk IO comm error with e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@NET_0x500000aae0044_UUID id 12345-10.174.0.68@o2ib - client will retry Apr 22 04:54:08 lfs-oss-1-13 kernel: Lustre: 32079:0:(ost_handler.c:887:ost_brw_read()) Skipped 51 previous similar messages Apr 22 04:54:38 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107bc56c000 Apr 22 04:54:38 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007536e000 Apr 22 04:54:38 lfs-oss-1-13 kernel: LustreError: 
21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810207008000 Apr 22 04:54:38 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81001b494000 Apr 22 04:54:38 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108aeb16000 Apr 22 04:54:54 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015b158000 Apr 22 04:55:05 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105525e8000 Apr 22 04:55:05 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1e3cd480 Apr 22 04:55:05 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a10d8e000 Apr 22 04:55:05 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81002c642000 Apr 22 04:56:07 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106ebd7c000 Apr 22 04:56:07 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810875490000 Apr 22 04:56:07 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810875490000 Apr 22 04:57:33 lfs-oss-1-13 kernel: Lustre: 31724:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff8105d744a600; still busy with 6 active RPCs Apr 22 04:57:33 lfs-oss-1-13 kernel: LustreError: 31727:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105ee4c8c00 x1398900876735134/t0 
o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335070753 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 04:57:33 lfs-oss-1-13 kernel: Lustre: 31724:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 42 previous similar messages Apr 22 04:57:33 lfs-oss-1-13 kernel: LustreError: 31727:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 43 previous similar messages Apr 22 04:57:34 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105a43f0400 x1398900876715495/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071166 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 04:57:34 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:829:ost_brw_read()) Skipped 41 previous similar messages Apr 22 04:57:40 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81052a350000 Apr 22 04:57:40 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 04:57:40 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 21 previous similar messages Apr 22 04:57:56 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 (at 10.174.0.68@o2ib) in 153 seconds. I think it's dead, and I am evicting it. 
Apr 22 04:57:56 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81059f6f9800 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104692b4000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006a0b4000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090ac68000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a7dc08000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107bc56c000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e8692000 Apr 22 04:58:27 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093848c000 Apr 22 04:58:43 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107b665c000 Apr 22 04:59:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 4 seconds Apr 22 04:59:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages Apr 22 04:59:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (32) Apr 22 04:59:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages Apr 22 04:59:21 lfs-oss-1-13 kernel: LustreError: 
21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810080290000 Apr 22 04:59:41 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81002c642000 Apr 22 04:59:41 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109d784a000 Apr 22 04:59:41 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81028d0ea000 Apr 22 04:59:41 lfs-oss-1-13 kernel: LustreError: 32087:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105ab5ad000 x1398900886130913/t0 o3->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/400 e 0 to 0 dl 1335070781 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 04:59:41 lfs-oss-1-13 kernel: LustreError: 32087:0:(ost_handler.c:822:ost_brw_read()) Skipped 4 previous similar messages Apr 22 04:59:46 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae9d1a000 Apr 22 05:00:24 lfs-oss-1-13 kernel: LustreError: 32206:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105abfe8000 x1398900873861582/t0 o4->4db51705-0d59-6cfa-d515-bcf3e4b99e08@NET_0x500000aae02ef_UUID:0/0 lens 448/416 e 0 to 0 dl 1335071012 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:00:24 lfs-oss-1-13 kernel: Lustre: 32206:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008d: ignoring bulk IO comm error with 4db51705-0d59-6cfa-d515-bcf3e4b99e08@NET_0x500000aae02ef_UUID id 12345-10.174.2.239@o2ib - client will retry Apr 22 05:00:24 lfs-oss-1-13 kernel: Lustre: 32206:0:(ost_handler.c:1224:ost_brw_write()) Skipped 3 previous similar messages Apr 22 05:00:49 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a319e000 Apr 22 05:00:53 lfs-oss-1-13 kernel: Lustre: Service thread pid 32102 was 
inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 05:00:53 lfs-oss-1-13 kernel: Pid: 32102, comm: ll_ost_io_109 Apr 22 05:00:53 lfs-oss-1-13 kernel: Apr 22 05:00:53 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:00:53 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:00:53 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:00:53 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 05:00:53 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:00:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:00:53 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:00:53 lfs-oss-1-13 kernel: Apr 22 05:01:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32102 completed after 218.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 05:01:11 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 05:01:21 lfs-oss-1-13 kernel: Lustre: Service thread pid 32185 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 05:01:21 lfs-oss-1-13 kernel: Pid: 32185, comm: ll_ost_io_192 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:21 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:01:21 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:01:21 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:01:21 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:21 lfs-oss-1-13 kernel: Pid: 32166, comm: ll_ost_io_173 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:21 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:01:21 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:01:21 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:01:21 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 
05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:21 lfs-oss-1-13 kernel: Pid: 32192, comm: ll_ost_io_199 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:21 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:01:21 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:01:21 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:01:21 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:01:21 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:01:21 lfs-oss-1-13 kernel: Apr 22 05:01:31 lfs-oss-1-13 kernel: Lustre: 31877:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request 
x1398897315254747 sent from scratch1-OST008d to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 05:01:31 lfs-oss-1-13 kernel: req@ffff8105cdcc4c00 x1398897315254747/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335070891 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 05:01:31 lfs-oss-1-13 kernel: Lustre: 31877:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Apr 22 05:01:31 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 05:01:31 lfs-oss-1-13 kernel: LustreError: 32060:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff8105aeb5dc00 x1398900876796353/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335071076 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:01:32 lfs-oss-1-13 kernel: LustreError: 31992:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c14c80c00 x1398900876786779/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071072 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:02:14 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client f173065c-735f-3a51-420b-55cb767a3a60 (at 10.174.0.202@o2ib) in 153 seconds. I think it's dead, and I am evicting it. 
Apr 22 05:02:14 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 05:02:14 lfs-oss-1-13 kernel: LustreError: 32185:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c352dcc50 x1398900886130915/t0 o3->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071436 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:02:14 lfs-oss-1-13 kernel: LustreError: 32185:0:(ost_handler.c:825:ost_brw_read()) Skipped 9 previous similar messages Apr 22 05:02:14 lfs-oss-1-13 kernel: Lustre: Service thread pid 32185 completed after 253.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 05:02:17 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810032fa6000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020d76e000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810868e7e000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090ac68000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810078c3c000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810407640000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810777df8000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b1660000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff810b5ec4a000 Apr 22 05:02:18 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810c1aede780 Apr 22 05:02:49 lfs-oss-1-13 kernel: Lustre: 31966:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 05:02:49 lfs-oss-1-13 kernel: Lustre: 31966:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 202 previous similar messages Apr 22 05:03:17 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007d272000 Apr 22 05:03:17 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81071184a000 Apr 22 05:03:58 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ffcee000 Apr 22 05:04:17 lfs-oss-1-13 kernel: Lustre: 32210:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 05:04:17 lfs-oss-1-13 kernel: req@ffff8105a96a2800 x1398900875250172/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071062 ref 2 fl Interpret:/0/0 rc 0/0 Apr 22 05:04:22 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 05:04:22 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:887:ost_brw_read()) Skipped 45 previous similar messages Apr 22 05:04:23 lfs-oss-1-13 kernel: Lustre: 32078:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-72), not sending early reply Apr 22 05:04:23 lfs-oss-1-13 kernel: req@ffff8107f612d000 x1399132013338323/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071068 ref 2 fl 
Interpret:/2/0 rc 0/0 Apr 22 05:04:23 lfs-oss-1-13 kernel: Lustre: 32078:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106e7c1c000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105f6b52000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81039e190000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ffcee000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810714b68000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810135264000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810230dc8000 Apr 22 05:05:36 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106c369a000 Apr 22 05:05:40 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ed2ae000 Apr 22 05:06:06 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093c472000 Apr 22 05:06:06 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810635326000 Apr 22 05:06:23 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.14.43@o2ib was evicted due to a lock completion callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 05:06:23 lfs-oss-1-13 kernel: LustreError: 
32012:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105a0e70000 x1398900876805229/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071925 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:06:23 lfs-oss-1-13 kernel: LustreError: 32012:0:(ost_handler.c:825:ost_brw_read()) Skipped 2 previous similar messages Apr 22 05:06:24 lfs-oss-1-13 kernel: LustreError: 32157:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff8105dfa79400 x1398900876805599/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335071926 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:06:37 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a352b9000 Apr 22 05:06:55 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101ffbaa000 Apr 22 05:07:04 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810802caa000 Apr 22 05:07:04 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81091620a000 Apr 22 05:07:04 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81091620a000 Apr 22 05:07:40 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810602783000 Apr 22 05:07:40 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.178@o2ib Apr 22 05:07:40 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 22 05:07:45 lfs-oss-1-13 kernel: Lustre: scratch1-OST008b: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 05:07:45 lfs-oss-1-13 kernel: Lustre: Skipped 8 previous similar messages Apr 22 05:07:53 lfs-oss-1-13 kernel: Lustre: 31912:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from bbf47932-5717-d616-b310-f6e93e74d9a1@10.174.6.178@o2ib to 0xffff8105aacd0400; still busy with 1 active RPCs Apr 22 05:07:53 lfs-oss-1-13 kernel: Lustre: 31912:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 30 previous similar messages Apr 22 05:07:53 lfs-oss-1-13 kernel: LustreError: 31912:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105aeb4f000 x1399132013353290/t0 o8->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 368/264 e 0 to 0 dl 1335071373 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 05:07:53 lfs-oss-1-13 kernel: LustreError: 31912:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 05:07:53 lfs-oss-1-13 kernel: LustreError: 32017:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105ceebb000 x1399132013352388/t0 o3->bbf47932-5717-d616-b310-f6e93e74d9a1@NET_0x500000aae06b2_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071306 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:07:53 lfs-oss-1-13 kernel: LustreError: 32017:0:(ost_handler.c:829:ost_brw_read()) Skipped 37 previous similar messages Apr 22 05:07:55 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f8e42000 Apr 22 05:07:55 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b07584000 Apr 22 05:07:55 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810714b68000 Apr 22 05:07:55 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107346f2000 Apr 22 05:07:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff8109ce57c000 Apr 22 05:08:11 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810adf27c000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81042e928000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810078c3c000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810415c8e000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108db22a000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff810888706000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810802caa000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c12f4000 Apr 22 05:08:12 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aeca58000 Apr 22 05:08:55 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810602783000 Apr 22 05:09:27 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810039370000 Apr 22 05:09:38 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.2.239@o2ib ns: filter-scratch1-OST008a_UUID lock: ffff810842f32000/0xcca1a6f6c3a81a3a lrc: 3/0,0 mode: PR/PR res: 32925292/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 
remote: 0xbdd7b40837a7796e expref: 15 pid: 31840 timeout 5277863640 Apr 22 05:09:38 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c16b1b400 x1398897315435086/t0 o105->@NET_0x500000aae02ef_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 05:09:38 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 05:09:38 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.2.239@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008a_UUID lock: ffff8103ae487000/0xcca1a6f6c3ae16a0 lrc: 3/0,0 mode: PW/PW res: 32925292/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->12287) flags: 0x0 remote: 0xbdd7b40837a90676 expref: 9 pid: 31819 timeout 0 Apr 22 05:09:38 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 05:09:43 lfs-oss-1-13 kernel: LustreError: 32102:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810589b90400 x1398900876809850/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071383 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:09:43 lfs-oss-1-13 kernel: LustreError: 32102:0:(ost_handler.c:822:ost_brw_read()) Skipped 4 previous similar messages Apr 22 05:09:47 lfs-oss-1-13 kernel: Lustre: 32247:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900876809849 took longer than estimated (100+4s); client may timeout. 
req@ffff8108506e0800 x1398900876809849/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335071383 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 05:09:48 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a88640000 Apr 22 05:09:48 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab7334000 Apr 22 05:10:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 05:10:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages Apr 22 05:10:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (44) Apr 22 05:10:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages Apr 22 05:10:06 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dfb0c000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81046327a000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c9dd6000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a7bd00000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81087f13a000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810065dec000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff81050838a000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a30eae000 Apr 22 05:10:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810adf27c000 Apr 22 05:10:42 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81004af22000 Apr 22 05:11:09 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014e471000 Apr 22 05:12:13 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8107db1f1a00/0xcca1a6f6c3a76115 lrc: 3/0,0 mode: PR/PR res: 32923900/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b7a5b00b expref: 8 pid: 31782 timeout 5278018747 Apr 22 05:12:13 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c147b3400 x1398897315488630/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 05:12:13 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0084_UUID lock: ffff8101c5840400/0xcca1a6f6c3c12fc5 lrc: 3/0,0 mode: PW/PW res: 32923900/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b7a784f6 expref: 6 pid: 31851 timeout 0 Apr 22 05:12:19 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810901976000 Apr 22 05:12:19 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037a1e6000 Apr 22 05:12:19 
lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c20a4000 Apr 22 05:12:19 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b5f6d4000 Apr 22 05:12:24 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810086b1e000 Apr 22 05:12:24 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81079db74000 Apr 22 05:12:36 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c12f4000 Apr 22 05:13:00 lfs-oss-1-13 kernel: Lustre: 31939:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 9c5b2e9f-7628-0434-e306-fe0f7f302694 reconnecting Apr 22 05:13:00 lfs-oss-1-13 kernel: Lustre: 31939:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 287 previous similar messages Apr 22 05:13:31 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092a3f6000 Apr 22 05:13:40 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c20a4000 Apr 22 05:13:40 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab39a8000 Apr 22 05:13:40 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033cb37000 Apr 22 05:13:40 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016c568000 Apr 22 05:14:22 lfs-oss-1-13 kernel: Lustre: 32155:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 05:14:22 lfs-oss-1-13 kernel: 
Lustre: 32155:0:(ost_handler.c:887:ost_brw_read()) Skipped 61 previous similar messages Apr 22 05:14:42 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf4386000 Apr 22 05:15:02 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023a998000 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f0182000 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1b144580 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096015e000 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81096015e000 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 32111:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff81061534c400 x1398900876824025/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 1 to 0 dl 1335071776 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:15:04 lfs-oss-1-13 kernel: LustreError: 32111:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 05:15:04 lfs-oss-1-13 kernel: Lustre: 32111:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008b: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 05:15:04 lfs-oss-1-13 kernel: Lustre: 32111:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 05:16:23 lfs-oss-1-13 kernel: Lustre: 31993:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897315490712 sent from scratch1-OST0085 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 05:16:23 lfs-oss-1-13 kernel: req@ffff8105ba23b800 x1398897315490712/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335071783 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 05:16:23 lfs-oss-1-13 kernel: Lustre: 31993:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 05:16:23 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 05:16:31 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c12f4000 Apr 22 05:16:31 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81056563a000 Apr 22 05:16:36 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81087b5bc000 Apr 22 05:16:45 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c20a4000 Apr 22 05:17:02 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810586c5d000 Apr 22 05:17:39 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a616d6000 Apr 22 05:17:39 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106a06b6000 Apr 22 05:18:32 lfs-oss-1-13 kernel: Lustre: 31899:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff8105a163dc00; still busy with 2 active RPCs Apr 22 05:18:32 lfs-oss-1-13 kernel: Lustre: 31899:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 32 previous similar messages Apr 22 05:18:32 lfs-oss-1-13 kernel: LustreError: 31899:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing 
error (-16) req@ffff8105a8b50400 x1398900876829409/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335072012 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 05:18:32 lfs-oss-1-13 kernel: LustreError: 31899:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 05:18:33 lfs-oss-1-13 kernel: LustreError: 32146:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c1acbb000 x1398900876827639/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 1 to 0 dl 1335071932 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:18:33 lfs-oss-1-13 kernel: LustreError: 32146:0:(ost_handler.c:829:ost_brw_read()) Skipped 49 previous similar messages Apr 22 05:18:42 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810119842000 Apr 22 05:18:42 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 05:18:42 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 26 previous similar messages Apr 22 05:18:43 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81053182d000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b95ba0000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810534ef6000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a30246000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810635488000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff81092b45a000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b20320000 Apr 22 05:19:16 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff810ab39a8000 Apr 22 05:19:54 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106790ea000 Apr 22 05:19:54 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015bbb6000 Apr 22 05:20:11 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c12f4000 Apr 22 05:20:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 05:20:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 8 previous similar messages Apr 22 05:20:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.178@o2ib (30) Apr 22 05:20:24 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 8 previous similar messages Apr 22 05:20:24 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81076815d000 Apr 22 05:21:29 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client bbf47932-5717-d616-b310-f6e93e74d9a1 (at 10.174.6.178@o2ib) in 211 seconds. I think it's dead, and I am evicting it. 
Apr 22 05:21:29 lfs-oss-1-13 kernel: Lustre: Skipped 9 previous similar messages Apr 22 05:21:41 lfs-oss-1-13 kernel: LustreError: 32011:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c15a08c00 x1398900875377152/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072101 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:21:41 lfs-oss-1-13 kernel: LustreError: 32011:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 05:21:47 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034f4be000 Apr 22 05:21:47 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092b45a000 Apr 22 05:21:47 lfs-oss-1-13 kernel: Lustre: 32011:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900875377152 took longer than estimated (100+6s); client may timeout. req@ffff810c15a08c00 x1398900875377152/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072101 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 05:21:47 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81097ca16000 Apr 22 05:21:47 lfs-oss-1-13 kernel: Lustre: 32183:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900875377151 took longer than estimated (100+6s); client may timeout. 
req@ffff8105e7730800 x1398900875377151/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072101 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 05:21:47 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b20320000 Apr 22 05:21:47 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810876a0e000 Apr 22 05:21:48 lfs-oss-1-13 kernel: Lustre: 32119:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 05:21:48 lfs-oss-1-13 kernel: req@ffff810c35a9bc50 x1398900886160028/t0 o3->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072113 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 05:22:04 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a719c0000 Apr 22 05:22:17 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81058a3f8000 Apr 22 05:22:40 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 05:22:41 lfs-oss-1-13 kernel: LustreError: 32087:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105f199b000 x1398900876833592/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072342 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:23:02 lfs-oss-1-13 kernel: Lustre: 31736:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 05:23:02 lfs-oss-1-13 kernel: Lustre: 31785:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 05:23:02 lfs-oss-1-13 kernel: Lustre: 
31785:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 212 previous similar messages Apr 22 05:23:02 lfs-oss-1-13 kernel: Lustre: 31736:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 212 previous similar messages Apr 22 05:23:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102394d4000 Apr 22 05:23:20 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810937d98000 Apr 22 05:23:20 lfs-oss-1-13 kernel: LustreError: 32229:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from blocking AST ns: filter-scratch1-OST008c_UUID lock: ffff810bd7d7e800/0xcca1a6f6c3c4a537 lrc: 4/0,0 mode: PR/PR res: 32886290/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x10020 remote: 0x382dcc42b7a7cb58 expref: 4 pid: 31963 timeout 5278779352 Apr 22 05:23:20 lfs-oss-1-13 kernel: LustreError: 32229:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b53da2000 Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015bbb6000 Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810479b80000 Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810997c60000 Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b77fa8000 Apr 22 05:23:34 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a4a8c000 Apr 22 05:23:53 lfs-oss-1-13 kernel: LustreError: 
21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b20320000 Apr 22 05:23:53 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c9dd6000 Apr 22 05:24:31 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d4012000 Apr 22 05:24:31 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bea9ec000 Apr 22 05:24:31 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b53da2000 Apr 22 05:24:31 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d4c00000 Apr 22 05:24:31 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109b61f6000 Apr 22 05:24:31 lfs-oss-1-13 kernel: Lustre: 32156:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008e: ignoring bulk IO comm error with f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID id 12345-10.174.0.202@o2ib - client will retry Apr 22 05:24:31 lfs-oss-1-13 kernel: Lustre: 32156:0:(ost_handler.c:887:ost_brw_read()) Skipped 52 previous similar messages Apr 22 05:25:14 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810adf27c000 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810134a8c000 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d4012000 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102394d4000 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event 
type 4, status -5, desc ffff81068b2c0000 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 32045:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c15392400 x1398900876837648/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335073031 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:25:26 lfs-oss-1-13 kernel: LustreError: 32045:0:(ost_handler.c:1064:ost_brw_write()) Skipped 4 previous similar messages Apr 22 05:25:26 lfs-oss-1-13 kernel: Lustre: 32045:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008d: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 05:26:25 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105458ff000 Apr 22 05:26:25 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81083da4c000 Apr 22 05:26:25 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81083da4c000 Apr 22 05:26:25 lfs-oss-1-13 kernel: LustreError: 32062:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(937440) req@ffff8105a870a000 x1398900876840905/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335072408 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:26:30 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810249eb4000 Apr 22 05:26:30 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810534ef6000 Apr 22 05:26:30 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109b98be000 Apr 22 05:27:59 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event 
type 4, status -5, desc ffff810ad2f48000 Apr 22 05:27:59 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102beb98000 Apr 22 05:27:59 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9e5a2000 Apr 22 05:27:59 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c4d60000 Apr 22 05:28:48 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81028f0ca000 Apr 22 05:28:48 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 05:28:48 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 22 05:29:19 lfs-oss-1-13 kernel: Lustre: 31738:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0087: refuse reconnection from ac6d63f1-a9ff-ca88-174b-46e023f19123@10.174.0.206@o2ib to 0xffff81065fa4dc00; still busy with 1 active RPCs Apr 22 05:29:19 lfs-oss-1-13 kernel: Lustre: 31738:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 26 previous similar messages Apr 22 05:29:19 lfs-oss-1-13 kernel: LustreError: 31738:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810605146c00 x1398900895442062/t0 o8->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 368/264 e 0 to 0 dl 1335072659 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 05:29:19 lfs-oss-1-13 kernel: LustreError: 31738:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 27 previous similar messages Apr 22 05:29:19 lfs-oss-1-13 kernel: LustreError: 32228:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105e8ba0800 x1398900895441141/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335073268 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 
05:29:19 lfs-oss-1-13 kernel: LustreError: 32228:0:(ost_handler.c:829:ost_brw_read()) Skipped 47 previous similar messages Apr 22 05:29:22 lfs-oss-1-13 kernel: Lustre: 31793:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897315677063 sent from scratch1-OST008b to NID 10.174.14.43@o2ib 11s ago has timed out (11s prior to deadline). Apr 22 05:29:22 lfs-oss-1-13 kernel: req@ffff8105aaa3f000 x1398897315677063/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335072562 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 05:29:22 lfs-oss-1-13 kernel: Lustre: 31793:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Apr 22 05:29:22 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008b: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 05:29:22 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c20a4000 Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090deac000 Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810324244000 Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a7bd00000 Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 31773:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from blocking AST ns: filter-scratch1-OST0084_UUID lock: ffff810945223000/0xcca1a6f6c3e0f364 lrc: 4/0,0 mode: PW/PW res: 32941730/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->229375) flags: 0x10020 remote: 0x382dcc42b7ad3814 expref: 6 pid: 31965 timeout 5279168061 Apr 22 05:29:52 lfs-oss-1-13 kernel: LustreError: 
31773:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 2 previous similar messages Apr 22 05:30:18 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810adf27c000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034e230000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f68be000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a73c12000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b00e6000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108541ea000 Apr 22 05:30:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015bbb6000 Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: tx_queue, 2 seconds Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 8 previous similar messages Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.202@o2ib (22) Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 8 previous similar messages Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107bf536000 Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c4b7a000 Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 
21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050838a000 Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810415c8e000 Apr 22 05:30:56 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c4d60000 Apr 22 05:32:18 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b68de000 Apr 22 05:32:32 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020bf7a000 Apr 22 05:32:32 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81030ea66000 Apr 22 05:32:32 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf9a38000 Apr 22 05:32:32 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102af5d2000 Apr 22 05:32:49 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012d8d6000 Apr 22 05:32:49 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810134a8c000 Apr 22 05:32:49 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105ee0c0000 Apr 22 05:32:49 lfs-oss-1-13 kernel: LustreError: 32039:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810954c1f680 Apr 22 05:32:50 lfs-oss-1-13 kernel: LustreError: 32039:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810b03132000 x1398900886234721/t0 o4->f173065c-735f-3a51-420b-55cb767a3a60@NET_0x500000aae00ca_UUID:0/0 lens 448/416 e 0 to 0 dl 1335072812 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:33:07 lfs-oss-1-13 
kernel: Lustre: 31879:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: f173065c-735f-3a51-420b-55cb767a3a60 reconnecting Apr 22 05:33:07 lfs-oss-1-13 kernel: Lustre: 31879:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 263 previous similar messages Apr 22 05:33:18 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c2ea5fc50 x1398900876846815/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335072798 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:33:18 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) Skipped 8 previous similar messages Apr 22 05:33:39 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 05:33:39 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 05:34:16 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105ee0c0000 Apr 22 05:34:37 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0089_UUID lock: ffff810915289800/0xcca1a6f6c3c557ef lrc: 3/0,0 mode: PR/PR res: 32929688/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x10020 remote: 0x382dcc42b7abe7c0 expref: 4 pid: 31738 timeout 5279362311 Apr 22 05:34:55 lfs-oss-1-13 kernel: Lustre: 31992:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 05:34:55 lfs-oss-1-13 kernel: Lustre: 31992:0:(ost_handler.c:887:ost_brw_read()) Skipped 38 previous similar messages Apr 22 05:35:16 lfs-oss-1-13 kernel: 
LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb2fea000 Apr 22 05:35:40 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81097de36000 Apr 22 05:35:46 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81032e720000 Apr 22 05:35:46 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a4094000 Apr 22 05:35:46 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109535ae000 Apr 22 05:37:01 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a4094000 Apr 22 05:37:18 lfs-oss-1-13 kernel: Lustre: Service thread pid 32130 was inactive for 436.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 05:37:18 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 05:37:18 lfs-oss-1-13 kernel: Pid: 32130, comm: ll_ost_io_137 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:37:18 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:37:18 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 
[ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Pid: 32248, comm: ll_ost_io_255 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:37:18 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:37:18 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Pid: 31996, comm: ll_ost_io_06 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:37:18 lfs-oss-1-13 kernel: 
[] schedule_timeout+0x8a/0xad Apr 22 05:37:18 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Pid: 32058, comm: ll_ost_io_66 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:37:18 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lock_timer_base+0x1b/0x3c Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __mod_timer+0x100/0x10f Apr 22 05:37:18 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:37:18 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] 
ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Pid: 32061, comm: ll_ost_io_69 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 05:37:18 lfs-oss-1-13 kernel: Call Trace: Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lock_timer_base+0x1b/0x3c Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __mod_timer+0x100/0x10f Apr 22 05:37:18 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 05:37:18 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] class_handle2object+0xe0/0x170 [obdclass] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] lock_res_and_lock+0xba/0xd0 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __ldlm_handle2lock+0x2f8/0x360 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 05:37:18 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 05:37:18 lfs-oss-1-13 kernel: Apr 22 
05:37:18 lfs-oss-1-13 kernel: Lustre: Service thread pid 32217 was inactive for 436.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 22 05:37:18 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 05:38:00 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107264be000 Apr 22 05:38:00 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81097de36000 Apr 22 05:38:37 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810412f12000 Apr 22 05:38:37 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81068f8d0000 Apr 22 05:38:54 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515e96000 Apr 22 05:38:54 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a0cfb6000 Apr 22 05:38:54 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e0cb0000 Apr 22 05:39:20 lfs-oss-1-13 kernel: LustreError: 32182:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c367b9850 x1398900876853407/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335073836 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:39:20 lfs-oss-1-13 kernel: LustreError: 32182:0:(ost_handler.c:829:ost_brw_read()) Skipped 30 previous similar messages Apr 22 05:39:40 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015b29e000 Apr 22 05:39:57 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103f38b6000 Apr 22 05:39:57 lfs-oss-1-13 kernel: 
Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 05:39:57 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 05:39:57 lfs-oss-1-13 kernel: Lustre: 31824:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 05:39:57 lfs-oss-1-13 kernel: Lustre: 31824:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 39 previous similar messages Apr 22 05:39:57 lfs-oss-1-13 kernel: LustreError: 31824:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105c2053000 x1398901148639250/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335073297 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 05:39:57 lfs-oss-1-13 kernel: LustreError: 31824:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 43 previous similar messages Apr 22 05:40:32 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054aec6000 Apr 22 05:40:32 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515e96000 Apr 22 05:40:32 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e9cd4000 Apr 22 05:41:00 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034d7a8000 Apr 22 05:42:00 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f982a000 Apr 22 05:42:00 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7b9d6000 Apr 22 05:42:16 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff8102075fa000 Apr 22 05:42:32 lfs-oss-1-13 kernel: Lustre: 32124:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 05:42:32 lfs-oss-1-13 kernel: req@ffff810c3e439800 x1398900875387399/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 0 to 0 dl 1335073357 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 05:42:32 lfs-oss-1-13 kernel: Lustre: 32124:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 22 05:42:37 lfs-oss-1-13 kernel: Lustre: Service thread pid 32058 completed after 755.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 05:42:37 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 05:43:02 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088e666000 Apr 22 05:43:06 lfs-oss-1-13 kernel: Lustre: 31731:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897315975034 sent from scratch1-OST0088 to NID 10.174.0.206@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 05:43:06 lfs-oss-1-13 kernel: req@ffff810c29865c00 x1398897315975034/t0 o106->@NET_0x500000aae00ce_UUID:15/16 lens 296/424 e 0 to 1 dl 1335073386 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 05:43:06 lfs-oss-1-13 kernel: Lustre: 31731:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 05:43:32 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b00e6000 Apr 22 05:43:32 lfs-oss-1-13 kernel: Lustre: 31847:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0086: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 05:43:32 lfs-oss-1-13 kernel: Lustre: 31847:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 170 previous similar messages Apr 22 05:43:49 lfs-oss-1-13 kernel: Lustre: 32231:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-227), not sending early reply Apr 22 05:43:49 lfs-oss-1-13 kernel: req@ffff810be4b84400 x1398900875387397/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 2 to 0 dl 1335073434 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 05:43:49 lfs-oss-1-13 kernel: Lustre: 32231:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 22 05:43:54 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e09ce000 Apr 22 05:43:54 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 832+0s req@ffff810c352ef450 x1398900875387396/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 2 to 0 dl 1335073434 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:43:54 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 05:43:54 lfs-oss-1-13 kernel: Lustre: Service thread pid 32217 completed after 832.01s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 05:43:54 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 05:45:00 lfs-oss-1-13 kernel: Lustre: 32044:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0087: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 05:45:00 lfs-oss-1-13 kernel: Lustre: 32044:0:(ost_handler.c:887:ost_brw_read()) Skipped 35 previous similar messages Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (21) Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6b53a000 Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103291e0000 Apr 22 05:45:01 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c06308000 Apr 22 05:45:13 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107264be000 Apr 22 05:45:46 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa63a0000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085e8f2000 Apr 22 05:46:04 lfs-oss-1-13 
kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81056563a000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515e96000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102af5d2000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81077db66000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81077b0be000 Apr 22 05:46:04 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f6b16000 Apr 22 05:46:50 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810412f12000 Apr 22 05:46:50 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf9a38000 Apr 22 05:46:50 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a52f0000 Apr 22 05:47:07 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034d7a8000 Apr 22 05:48:10 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aaa5b2000 Apr 22 05:48:44 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810175bf4000 Apr 22 05:49:00 lfs-oss-1-13 kernel: Lustre: scratch1-OST008a: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 05:49:00 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 05:50:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088a696000 Apr 22 05:50:03 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 05:50:03 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 10 previous similar messages Apr 22 05:50:03 lfs-oss-1-13 kernel: Lustre: 31767:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 05:50:03 lfs-oss-1-13 kernel: Lustre: 31767:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 17 previous similar messages Apr 22 05:50:03 lfs-oss-1-13 kernel: LustreError: 31767:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810884dcf000 x1398901148648919/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335073903 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 05:50:03 lfs-oss-1-13 kernel: LustreError: 31767:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 18 previous similar messages Apr 22 05:50:03 lfs-oss-1-13 kernel: LustreError: 32170:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810982090400 x1398901148648115/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335074010 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:50:03 lfs-oss-1-13 kernel: LustreError: 32170:0:(ost_handler.c:829:ost_brw_read()) Skipped 32 previous similar messages Apr 22 05:50:50 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81025159e000 Apr 22 05:50:50 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff810a4c632000 Apr 22 05:51:06 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e1f3c000 Apr 22 05:52:44 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be85b0000 Apr 22 05:52:44 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ddcfc000 Apr 22 05:52:44 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bdce14000 Apr 22 05:53:00 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102ccda6000 Apr 22 05:53:40 lfs-oss-1-13 kernel: Lustre: 31763:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 05:53:40 lfs-oss-1-13 kernel: Lustre: 31763:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 173 previous similar messages Apr 22 05:53:51 lfs-oss-1-13 kernel: LustreError: 32225:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105dc843000 x1398900876863894/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335074501 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 05:53:51 lfs-oss-1-13 kernel: LustreError: 32225:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 05:55:06 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810980b44000 Apr 22 05:55:07 lfs-oss-1-13 kernel: Lustre: 32180:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 05:55:07 lfs-oss-1-13 kernel: Lustre: 32180:0:(ost_handler.c:887:ost_brw_read()) Skipped 23 previous similar messages Apr 22 05:55:07 lfs-oss-1-13 
kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 05:55:07 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 1 previous similar message Apr 22 05:55:07 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (27) Apr 22 05:55:07 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 1 previous similar message Apr 22 05:55:07 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f37fa000 Apr 22 05:56:19 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107aa38c000 Apr 22 05:56:35 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810408e4c000 Apr 22 05:57:18 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff8109fbeb7800 x1398900876874454/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335074993 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:57:18 lfs-oss-1-13 kernel: LustreError: 32162:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c35a9c450 x1398900876874450/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335074993 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:57:18 lfs-oss-1-13 kernel: Lustre: 32162:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0084: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 05:57:18 lfs-oss-1-13 kernel: Lustre: 32162:0:(ost_handler.c:1224:ost_brw_write()) Skipped 3 previous similar messages Apr 22 05:57:23 lfs-oss-1-13 kernel: Lustre: 
31990:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 05:57:23 lfs-oss-1-13 kernel: req@ffff810c248b9c00 x1398900877760359/t0 o3->d43d5b1a-2631-5ef2-250b-610f6d3fb2da@NET_0x500000aae0fc5_UUID:0/0 lens 448/400 e 0 to 0 dl 1335074248 ref 2 fl Interpret:/0/0 rc 0/0 Apr 22 05:57:23 lfs-oss-1-13 kernel: Lustre: 31990:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Apr 22 05:57:28 lfs-oss-1-13 kernel: LustreError: 32072:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 755+0s req@ffff8105e4861800 x1398900877760317/t0 o3->d43d5b1a-2631-5ef2-250b-610f6d3fb2da@NET_0x500000aae0fc5_UUID:0/0 lens 448/400 e 0 to 0 dl 1335074248 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 05:57:28 lfs-oss-1-13 kernel: LustreError: 32072:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 05:57:50 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810166cc2000 Apr 22 05:58:46 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.2.101@o2ib ns: filter-scratch1-OST0085_UUID lock: ffff810a66304000/0xcca1a6f6c45f4de6 lrc: 3/0,0 mode: PR/PR res: 32980489/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xb3f1db8d2f785fc3 expref: 12 pid: 31882 timeout 5280811674 Apr 22 05:58:46 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 1 previous similar message Apr 22 05:58:46 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c233ed400 x1398897316364903/t0 o105->@NET_0x500000aae0265_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 05:58:46 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 
10.174.2.101@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff8109f5d62200/0xcca1a6f6c461b4ff lrc: 3/0,0 mode: PW/PW res: 32977572/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->315391) flags: 0x0 remote: 0xb3f1db8d2f7872a1 expref: 11 pid: 31765 timeout 0 Apr 22 05:58:58 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0087_UUID lock: ffff81088f58a000/0xcca1a6f6c45e4528 lrc: 3/0,0 mode: PW/PW res: 32981231/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b7b55577 expref: 5 pid: 31824 timeout 5280823900 Apr 22 05:58:58 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 2 previous similar messages Apr 22 05:59:15 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ab282000 Apr 22 05:59:22 lfs-oss-1-13 kernel: Lustre: 31879:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897316377969 sent from scratch1-OST0084 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 05:59:22 lfs-oss-1-13 kernel: req@ffff8105b39d7000 x1398897316377969/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335074362 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 05:59:32 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a29fd2000 Apr 22 05:59:32 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ddcfc000 Apr 22 05:59:32 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810980b44000 Apr 22 05:59:32 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109b2676000 Apr 22 05:59:57 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107264be000 Apr 22 06:00:12 lfs-oss-1-13 kernel: Lustre: 31754:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c22028200; still busy with 1 active RPCs Apr 22 06:00:12 lfs-oss-1-13 kernel: Lustre: 31754:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 19 previous similar messages Apr 22 06:00:12 lfs-oss-1-13 kernel: LustreError: 31754:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105dd084800 x1398900876879615/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335074512 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:00:12 lfs-oss-1-13 kernel: LustreError: 31754:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 24 previous similar messages Apr 22 06:00:45 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008a: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:01:01 lfs-oss-1-13 kernel: LustreError: 
21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065c9c2000 Apr 22 06:01:01 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810922a7c000 Apr 22 06:01:01 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810626c9e000 Apr 22 06:01:01 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810579982000 Apr 22 06:01:11 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 06:01:11 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 06:01:12 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e1f3c000 Apr 22 06:01:12 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 06:01:12 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 7 previous similar messages Apr 22 06:02:00 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c1a4cd800 x1398901148658969/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335074683 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:02:00 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:829:ost_brw_read()) Skipped 22 previous similar messages Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810886963400 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515e96000 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 
21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81059dbac000 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810685d64000 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810886963400 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81059dbac000 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810685d64000 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 31999:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(49152) req@ffff8105d1ecc800 x1398900876880132/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335075189 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:02:16 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008d: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 06:02:16 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 23519:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335074536 with bad export cookie 14745250233766367165 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 32197:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff810c30b25450 x1398900876881081/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335075291 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:02:16 lfs-oss-1-13 kernel: LustreError: 32197:0:(ost_handler.c:1078:ost_brw_write()) Skipped 1 previous similar message Apr 
22 06:03:19 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515e96000 Apr 22 06:04:07 lfs-oss-1-13 kernel: Lustre: 31773:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 06:04:07 lfs-oss-1-13 kernel: Lustre: 31773:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 191 previous similar messages Apr 22 06:05:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 06:05:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 06:05:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (40) Apr 22 06:05:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 06:05:25 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d137a000 Apr 22 06:05:26 lfs-oss-1-13 kernel: Lustre: 32006:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 06:05:26 lfs-oss-1-13 kernel: Lustre: 32006:0:(ost_handler.c:887:ost_brw_read()) Skipped 32 previous similar messages Apr 22 06:06:50 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f37fa000 Apr 22 06:08:09 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107f37fa000 Apr 22 06:10:28 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100a162c000 Apr 22 06:11:15 lfs-oss-1-13 kernel: LustreError: 
21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810336dcd000 Apr 22 06:11:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c02cc0880 Apr 22 06:11:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c138e73c0 Apr 22 06:11:15 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810503498cc0 Apr 22 06:11:15 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 06:11:15 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 7 previous similar messages Apr 22 06:11:22 lfs-oss-1-13 kernel: Lustre: 31888:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 06:11:22 lfs-oss-1-13 kernel: Lustre: 31888:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 9 previous similar messages Apr 22 06:11:22 lfs-oss-1-13 kernel: LustreError: 31888:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105c2382400 x1398901148669802/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335075182 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:11:22 lfs-oss-1-13 kernel: LustreError: 31888:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 12 previous similar messages Apr 22 06:12:47 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810340e48000 Apr 22 06:13:09 lfs-oss-1-13 kernel: LustreError: 32076:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c367ad050 x1398900877015730/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 1 to 0 dl 
1335075309 ref 1 fl Interpret:/0/0 rc 0/0
Apr 22 06:13:09 lfs-oss-1-13 kernel: LustreError: 32076:0:(ost_handler.c:829:ost_brw_read()) Skipped 7 previous similar messages
Apr 22 06:13:49 lfs-oss-1-13 kernel: Lustre: Service thread pid 32140 was inactive for 206.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 22 06:13:49 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages
Apr 22 06:13:49 lfs-oss-1-13 kernel: Pid: 32140, comm: ll_ost_io_147
Apr 22 06:13:49 lfs-oss-1-13 kernel:
Apr 22 06:13:49 lfs-oss-1-13 kernel: Call Trace:
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 06:13:49 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 06:13:49 lfs-oss-1-13 kernel:
Apr 22 06:14:07 lfs-oss-1-13 kernel: Lustre: 31923:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting
Apr 22 06:14:07 lfs-oss-1-13 kernel: 
Lustre: 31923:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 70 previous similar messages Apr 22 06:14:13 lfs-oss-1-13 kernel: Lustre: 11774:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897316590703 sent from scratch1-OST008c to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 06:14:13 lfs-oss-1-13 kernel: req@ffff810c21987c00 x1398897316590703/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 1335075253 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 06:14:13 lfs-oss-1-13 kernel: Lustre: 11774:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Apr 22 06:14:13 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.14.43@o2ib was evicted due to a lock completion callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:14:16 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100584aa000 Apr 22 06:14:16 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101af3fe000 Apr 22 06:15:06 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b13960000 Apr 22 06:15:42 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106da846000 Apr 22 06:15:42 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810143dde000 Apr 22 06:15:42 lfs-oss-1-13 kernel: Lustre: 32207:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0086: ignoring bulk IO comm error with d43d5b1a-2631-5ef2-250b-610f6d3fb2da@NET_0x500000aae0fc5_UUID id 12345-10.174.15.197@o2ib - client will retry Apr 22 06:15:42 lfs-oss-1-13 kernel: Lustre: 32207:0:(ost_handler.c:887:ost_brw_read()) Skipped 10 previous similar messages Apr 22 06:17:20 lfs-oss-1-13 kernel: LustreError: 
21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100584aa000 Apr 22 06:17:21 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109cb255000 Apr 22 06:17:25 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81030629e000 Apr 22 06:19:52 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108b0abe000 Apr 22 06:19:56 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102fe6a4000 Apr 22 06:21:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 06:21:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 1 previous similar message Apr 22 06:21:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (41) Apr 22 06:21:13 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 1 previous similar message Apr 22 06:21:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ac46a9000 Apr 22 06:21:25 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093078e000 Apr 22 06:21:25 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81028c33a000 Apr 22 06:21:25 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb0e64000 Apr 22 06:21:25 lfs-oss-1-13 kernel: Lustre: 31847:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0088: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c28d10600; still 
busy with 1 active RPCs Apr 22 06:21:25 lfs-oss-1-13 kernel: Lustre: 31847:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 23 previous similar messages Apr 22 06:21:25 lfs-oss-1-13 kernel: LustreError: 31847:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105cdcc4c00 x1398900877116873/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335075785 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:21:25 lfs-oss-1-13 kernel: LustreError: 31847:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 25 previous similar messages Apr 22 06:21:26 lfs-oss-1-13 kernel: Lustre: Service thread pid 32140 completed after 663.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 06:21:26 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 06:22:28 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a1732000 Apr 22 06:22:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 06:22:28 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 22 06:23:07 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81078ebcd000 Apr 22 06:23:44 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d640c000 Apr 22 06:23:56 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810431c6a000 Apr 22 06:24:19 lfs-oss-1-13 kernel: Lustre: 31971:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0088: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 06:24:19 lfs-oss-1-13 kernel: Lustre: 31971:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 175 
previous similar messages Apr 22 06:24:27 lfs-oss-1-13 kernel: LustreError: 32007:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105ba251000 x1398901148681653/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335075879 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:24:27 lfs-oss-1-13 kernel: LustreError: 32007:0:(ost_handler.c:829:ost_brw_read()) Skipped 20 previous similar messages Apr 22 06:24:29 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810acf2fb000 Apr 22 06:25:18 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c20677c00 x1398900888049688/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335075918 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:25:18 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) Skipped 11 previous similar messages Apr 22 06:25:50 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106a9e3a000 Apr 22 06:25:50 lfs-oss-1-13 kernel: Lustre: 32203:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 06:25:50 lfs-oss-1-13 kernel: Lustre: 32203:0:(ost_handler.c:887:ost_brw_read()) Skipped 18 previous similar messages Apr 22 06:26:23 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf0942000 Apr 22 06:27:14 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026a890000 Apr 22 06:27:14 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810865e3a000 Apr 22 06:27:18 
lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810558dec000 Apr 22 06:28:04 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103a7c05000 Apr 22 06:29:08 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106cccf7000 Apr 22 06:29:26 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085c2fd000 Apr 22 06:29:26 lfs-oss-1-13 kernel: LustreError: 31936:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 18 < 19 req@ffff810812d84000 x1398900888055046/t0 o400->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 06:29:37 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f66d8000 Apr 22 06:30:11 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81071fadd000 Apr 22 06:30:29 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a48266000 Apr 22 06:31:18 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81047250e000 Apr 22 06:31:26 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81040b58b000 Apr 22 06:31:26 lfs-oss-1-13 kernel: Lustre: 31724:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from c49d8140-06a7-779c-f541-694bd8aab9b4@10.174.0.204@o2ib to 0xffff8105eb2f0200; still busy with 1 active RPCs Apr 22 06:31:26 lfs-oss-1-13 kernel: Lustre: 31724:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 17 previous similar messages Apr 22 06:31:26 lfs-oss-1-13 kernel: LustreError: 
31724:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810617047c50 x1398900888056940/t0 o8->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 368/264 e 0 to 0 dl 1335076386 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:31:26 lfs-oss-1-13 kernel: LustreError: 31724:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 17 previous similar messages Apr 22 06:31:40 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d8672000 Apr 22 06:31:57 lfs-oss-1-13 kernel: Lustre: scratch1-OST0087: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 06:32:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 06:32:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 5 previous similar messages Apr 22 06:32:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (36) Apr 22 06:32:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 5 previous similar messages Apr 22 06:32:23 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81058d1cf000 Apr 22 06:32:59 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e2aa8000 Apr 22 06:32:59 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 06:32:59 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 06:33:46 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810386c04000 Apr 22 06:33:57 
lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810477089000 Apr 22 06:34:02 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b3a57c000 Apr 22 06:34:21 lfs-oss-1-13 kernel: Lustre: 31843:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 06:34:21 lfs-oss-1-13 kernel: Lustre: 31843:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 199 previous similar messages Apr 22 06:35:13 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810523f5f000 Apr 22 06:35:13 lfs-oss-1-13 kernel: LustreError: 32201:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c217ab400 x1398900888060167/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335076557 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:35:13 lfs-oss-1-13 kernel: LustreError: 32201:0:(ost_handler.c:829:ost_brw_read()) Skipped 23 previous similar messages Apr 22 06:35:16 lfs-oss-1-13 kernel: Lustre: scratch1-OST008a: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 06:35:18 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c39b6000 Apr 22 06:35:46 lfs-oss-1-13 kernel: Lustre: Service thread pid 32139 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 06:35:46 lfs-oss-1-13 kernel: Pid: 32139, comm: ll_ost_io_146 Apr 22 06:35:46 lfs-oss-1-13 kernel: Apr 22 06:35:46 lfs-oss-1-13 kernel: Call Trace: Apr 22 06:35:46 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 06:35:46 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 06:35:46 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 06:35:46 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 06:35:46 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 06:35:46 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 06:35:46 lfs-oss-1-13 kernel: Apr 22 06:35:56 lfs-oss-1-13 kernel: Lustre: 32145:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 06:35:56 lfs-oss-1-13 kernel: Lustre: 32145:0:(ost_handler.c:887:ost_brw_read()) Skipped 20 previous similar messages Apr 22 06:36:07 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0087_UUID lock: ffff8104d41df200/0xcca1a6f6c4f249fd 
lrc: 3/0,0 mode: PW/PW res: 33026655/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->106495) flags: 0x20 remote: 0x382dcc42b7f4f30c expref: 5 pid: 31939 timeout 5283052918 Apr 22 06:36:22 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096b893000 Apr 22 06:37:11 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069b180000 Apr 22 06:37:51 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810425805000 Apr 22 06:38:52 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8b2fa000 Apr 22 06:39:00 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a7c26000 Apr 22 06:40:54 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810189958000 Apr 22 06:41:24 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105805b4000 Apr 22 06:41:49 lfs-oss-1-13 kernel: Lustre: 31965:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008e: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff8105d6f75e00; still busy with 1 active RPCs Apr 22 06:41:49 lfs-oss-1-13 kernel: Lustre: 31965:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 19 previous similar messages Apr 22 06:41:49 lfs-oss-1-13 kernel: LustreError: 31965:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810afd4b8000 x1398900887596374/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335077009 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:41:49 lfs-oss-1-13 kernel: LustreError: 31965:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 20 previous similar messages 
Apr 22 06:41:58 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075ab24000 Apr 22 06:42:06 lfs-oss-1-13 kernel: Lustre: 31907:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897317216227 sent from scratch1-OST008e to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 06:42:06 lfs-oss-1-13 kernel: req@ffff810c18f2c000 x1398897317216227/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335076926 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 06:42:06 lfs-oss-1-13 kernel: Lustre: 31907:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Apr 22 06:42:06 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:42:06 lfs-oss-1-13 kernel: LustreError: 31907:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff81084baa7c00 ns: filter-scratch1-OST008e_UUID lock: ffff810877858800/0xcca1a6f6c515c6c2 lrc: 3/0,0 mode: --/PW res: 33038337/0 rrc: 3 type: EXT [0->4095] (req 0->4095) flags: 0x0 remote: 0x382dcc42b80429c8 expref: 22 pid: 31907 timeout 0 Apr 22 06:42:24 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client c49d8140-06a7-779c-f541-694bd8aab9b4 (at 10.174.0.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 06:42:24 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 2 previous similar messages Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (33) Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 2 previous similar messages Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810992c6a000 Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109dd6ee000 Apr 22 06:42:29 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810879f5e000 Apr 22 06:42:49 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81083d076000 Apr 22 06:42:49 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa808e000 Apr 22 06:42:52 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c39b6000 Apr 22 06:43:43 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810352b1c000 Apr 22 06:43:43 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 06:43:43 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 06:43:44 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, 
status -5, desc ffff81065f9cc000 Apr 22 06:44:21 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106898d4000 Apr 22 06:44:22 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810396216000 Apr 22 06:44:22 lfs-oss-1-13 kernel: Lustre: 31905:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: ac6d63f1-a9ff-ca88-174b-46e023f19123 reconnecting Apr 22 06:44:22 lfs-oss-1-13 kernel: Lustre: 31905:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 240 previous similar messages Apr 22 06:44:42 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b96658000 Apr 22 06:44:55 lfs-oss-1-13 kernel: Lustre: Service thread pid 32139 completed after 749.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 06:45:00 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81044ad78000 Apr 22 06:45:17 lfs-oss-1-13 kernel: LustreError: 32219:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105cd935c00 x1398900877217122/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 1 to 0 dl 1335077229 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:45:17 lfs-oss-1-13 kernel: LustreError: 32219:0:(ost_handler.c:829:ost_brw_read()) Skipped 22 previous similar messages Apr 22 06:45:20 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810765c9c000 Apr 22 06:45:33 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.0.200@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff81008da51a00/0xcca1a6f6c515c10b lrc: 3/0,0 mode: PW/PW res: 33039097/0 rrc: 2 type: EXT 
[0->18446744073709551615] (req 0->540671) flags: 0x20 remote: 0x6de46e971d25944b expref: 25 pid: 31727 timeout 5283618917 Apr 22 06:45:33 lfs-oss-1-13 kernel: LustreError: 32102:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c3536fc50 x1398900887600088/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077280 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:45:33 lfs-oss-1-13 kernel: LustreError: 32102:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 06:45:58 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810074945000 Apr 22 06:46:16 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810356b9c000 Apr 22 06:46:16 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034a814000 Apr 22 06:46:16 lfs-oss-1-13 kernel: Lustre: 31991:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID id 12345-10.174.0.206@o2ib - client will retry Apr 22 06:46:16 lfs-oss-1-13 kernel: Lustre: 31991:0:(ost_handler.c:887:ost_brw_read()) Skipped 26 previous similar messages Apr 22 06:46:27 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0088_UUID lock: ffff810556ef1200/0xcca1a6f6c515cadc lrc: 3/0,0 mode: PW/PW res: 33037615/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b804b018 expref: 20 pid: 31898 timeout 5283672620 Apr 22 06:46:28 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054b69c000 Apr 22 06:46:28 lfs-oss-1-13 kernel: LustreError: 
21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81077d8a6000 Apr 22 06:46:52 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c8d5c000 Apr 22 06:46:52 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810638d64000 Apr 22 06:46:52 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810087ff0000 Apr 22 06:47:05 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0ab4c000 Apr 22 06:47:38 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103b1d4f000 Apr 22 06:47:39 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff810168b7e800/0xcca1a6f6c515d700 lrc: 3/0,0 mode: PW/PW res: 33038337/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b804b69a expref: 5 pid: 31823 timeout 5283744901 Apr 22 06:48:04 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c5eec000 Apr 22 06:48:04 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109aa49e000 Apr 22 06:48:35 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf1252000 Apr 22 06:48:35 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81089c8e2000 Apr 22 06:48:38 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: 
filter-scratch1-OST008c_UUID lock: ffff810a37d24a00/0xcca1a6f6c515d7a8 lrc: 3/0,0 mode: PW/PW res: 33033581/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b805363e expref: 16 pid: 31773 timeout 5283803898 Apr 22 06:48:52 lfs-oss-1-13 kernel: LustreError: 32095:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105e3a33800 x1398900887602137/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077332 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:48:55 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ed7e4000 Apr 22 06:48:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c8d5c000 Apr 22 06:48:55 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a57f44000 Apr 22 06:49:24 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108afef6000 Apr 22 06:49:50 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab50c5000 Apr 22 06:50:27 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100aeed6000 Apr 22 06:50:36 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b1a9ac000 Apr 22 06:50:36 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105d69908c0 Apr 22 06:50:36 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105d69908c0 Apr 22 06:50:36 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810365bbc000 Apr 22 
06:50:36 lfs-oss-1-13 kernel: LustreError: 32225:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(4096) req@ffff8105a3b3c800 x1398900887604151/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335077531 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:50:36 lfs-oss-1-13 kernel: LustreError: 32225:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 06:50:36 lfs-oss-1-13 kernel: Lustre: 32225:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 06:50:36 lfs-oss-1-13 kernel: Lustre: 32225:0:(ost_handler.c:1224:ost_brw_write()) Skipped 13 previous similar messages Apr 22 06:50:40 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106891f6000 Apr 22 06:51:13 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810475129000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104abe84000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b6766000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810707f54000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101d21aa000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102dd2be000 Apr 22 06:51:14 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cedde000 Apr 22 06:51:50 lfs-oss-1-13 kernel: 
Lustre: 31971:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008e: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff810833651400; still busy with 2 active RPCs Apr 22 06:51:50 lfs-oss-1-13 kernel: Lustre: 31971:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 37 previous similar messages Apr 22 06:51:50 lfs-oss-1-13 kernel: LustreError: 31971:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8108d327d000 x1398900887606547/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335077610 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 06:51:50 lfs-oss-1-13 kernel: LustreError: 31971:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages Apr 22 06:51:55 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c55d6000 Apr 22 06:51:55 lfs-oss-1-13 kernel: LustreError: 31735:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 523 < 524 req@ffff8105b978b400 x1398901148709081/t0 o400->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 06:52:17 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81054b69c000 Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a1538000 Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101987ae000 Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810534d3e000 Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810534d3e000 Apr 22 06:52:17 lfs-oss-1-13 kernel: LustreError: 32066:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1045440) req@ffff810a7f4c9c00 x1398900877242707/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335077613 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:52:17 lfs-oss-1-13 kernel: Lustre: 32066:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0089: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 06:52:58 lfs-oss-1-13 kernel: Lustre: 31844:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897317411239 sent from scratch1-OST008b to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 06:52:58 lfs-oss-1-13 kernel: req@ffff8105bd66d000 x1398897317411239/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335077578 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 06:52:58 lfs-oss-1-13 kernel: Lustre: 31844:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 06:52:58 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008b: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:52:58 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 06:53:11 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810952112000 Apr 22 06:53:14 lfs-oss-1-13 kernel: LustreError: 781:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335077594 with bad export cookie 14745250233767166978 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b65d4000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810812fc6000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b3cdc4000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b3cdc4000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810520552000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 32058:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1044480) req@ffff8105c2793400 x1398900887606956/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335077698 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:53:20 lfs-oss-1-13 kernel: 
LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810520552000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085593a000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c22aaeec0 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81085593a000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810c22aaeec0 Apr 22 06:53:20 lfs-oss-1-13 kernel: Lustre: 32058:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81009c2c4000 Apr 22 06:53:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81009c2c4000 Apr 22 06:53:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 06:53:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 13 previous similar messages Apr 22 06:53:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (47) Apr 22 06:53:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 13 previous similar messages Apr 22 06:53:25 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b71d83000 Apr 22 06:54:14 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff8109da3c6000 Apr 22 06:54:23 lfs-oss-1-13 kernel: Lustre: 31761:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0088: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 06:54:23 lfs-oss-1-13 kernel: Lustre: 31891:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 06:54:23 lfs-oss-1-13 kernel: Lustre: 31891:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 266 previous similar messages Apr 22 06:54:23 lfs-oss-1-13 kernel: Lustre: 31761:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 266 previous similar messages Apr 22 06:54:28 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810874198000 Apr 22 06:54:28 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.206@o2ib Apr 22 06:54:28 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 28 previous similar messages Apr 22 06:54:48 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c281a000 Apr 22 06:54:48 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810534d3e000 Apr 22 06:54:48 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810534d3e000 Apr 22 06:54:48 lfs-oss-1-13 kernel: LustreError: 32128:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff8105ae59c000 x1398900887608585/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335077787 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:54:48 lfs-oss-1-13 kernel: LustreError: 32128:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 06:54:48 lfs-oss-1-13 kernel: Lustre: 
32128:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 06:54:48 lfs-oss-1-13 kernel: Lustre: 32128:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous similar messages Apr 22 06:54:52 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c6557000 Apr 22 06:54:52 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81077d8a6000 Apr 22 06:54:52 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810520552000 Apr 22 06:54:52 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cedde000 Apr 22 06:55:18 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104379a2000 Apr 22 06:55:18 lfs-oss-1-13 kernel: LustreError: 32043:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c1a9ab000 x1398901148711757/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077767 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:55:18 lfs-oss-1-13 kernel: LustreError: 32043:0:(ost_handler.c:829:ost_brw_read()) Skipped 51 previous similar messages Apr 22 06:55:19 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a200ac000 Apr 22 06:55:36 lfs-oss-1-13 kernel: LustreError: 32184:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105ced2b400 x1398900887608581/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077736 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:55:36 lfs-oss-1-13 kernel: LustreError: 
32184:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 06:56:20 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a51b2000 Apr 22 06:56:41 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be1b33000 Apr 22 06:56:42 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b01152000 Apr 22 06:56:59 lfs-oss-1-13 kernel: Lustre: 32076:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 06:56:59 lfs-oss-1-13 kernel: Lustre: 32076:0:(ost_handler.c:887:ost_brw_read()) Skipped 53 previous similar messages Apr 22 06:57:00 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81041ca7c000 Apr 22 06:57:00 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810945cbe000 Apr 22 06:57:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810874198000 Apr 22 06:57:25 lfs-oss-1-13 kernel: LustreError: 32082:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c244f4000 x1398900877250687/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077845 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:57:26 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff810b24c2e400/0xcca1a6f6c5344910 lrc: 3/0,0 mode: PR/PR res: 33044793/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 
remote: 0x382dcc42b80702cb expref: 28 pid: 31865 timeout 5284331640 Apr 22 06:57:42 lfs-oss-1-13 kernel: Lustre: 31916:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897317529429 sent from scratch1-OST0085 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 06:57:42 lfs-oss-1-13 kernel: req@ffff810af8d30400 x1398897317529429/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335077862 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 06:57:42 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:57:42 lfs-oss-1-13 kernel: LustreError: 32134:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c289e1400 x1398900877252841/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078609 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:58:06 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105bd66d000 x1398900896169612/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335077886 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 06:58:26 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ba082000 Apr 22 06:58:27 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b65d4000 Apr 22 06:59:06 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108bdbc9000 Apr 22 06:59:09 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008a: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 06:59:30 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 
4, status -5, desc ffff810b1a9ac000 Apr 22 06:59:30 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81003d8a2000 Apr 22 06:59:30 lfs-oss-1-13 kernel: LustreError: 32185:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c17b81c00 x1398900877256130/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335078163 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 06:59:30 lfs-oss-1-13 kernel: Lustre: 32185:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 06:59:57 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810517cdc000 Apr 22 07:00:08 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008b_UUID lock: ffff81080168a400/0xcca1a6f6c543df1e lrc: 3/0,0 mode: PW/PW res: 33047935/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b80a0423 expref: 5 pid: 31950 timeout 5284493158 Apr 22 07:00:08 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 1 previous similar message Apr 22 07:00:28 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bf32f000 Apr 22 07:01:11 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106ffcb4000 Apr 22 07:01:20 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d4388000 Apr 22 07:01:45 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff8108dc203000 Apr 22 07:01:58 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810927f40000 Apr 22 07:01:58 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81044d44a000 Apr 22 07:02:21 lfs-oss-1-13 kernel: Lustre: 31930:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 07:02:21 lfs-oss-1-13 kernel: Lustre: 31930:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 29 previous similar messages Apr 22 07:02:21 lfs-oss-1-13 kernel: LustreError: 31930:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810946c6ac00 x1398901148719383/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335078241 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 07:02:21 lfs-oss-1-13 kernel: LustreError: 31930:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 07:02:53 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810827895000 Apr 22 07:03:13 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810819404000 Apr 22 07:03:13 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103fd37c000 Apr 22 07:03:13 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810338684000 Apr 22 07:03:55 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107b4192000 Apr 22 07:04:15 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072f0ce000 Apr 22 07:04:40 
lfs-oss-1-13 kernel: Lustre: 31949:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008b: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 07:04:40 lfs-oss-1-13 kernel: Lustre: 31949:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 210 previous similar messages Apr 22 07:05:05 lfs-oss-1-13 kernel: LustreError: 31802:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 187 < 188 req@ffff810c1b45fc00 x1398900877265552/t0 o101->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 296/0 e 0 to 0 dl 1335078318 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:05:07 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c20c0000 Apr 22 07:05:07 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810819404000 Apr 22 07:05:07 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810702698000 Apr 22 07:05:07 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072aad6000 Apr 22 07:05:07 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ddf06000 Apr 22 07:05:07 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.206@o2ib Apr 22 07:05:07 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 20 previous similar messages Apr 22 07:05:07 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: denying duplicate export for ac6d63f1-a9ff-ca88-174b-46e023f19123, -114 Apr 22 07:05:07 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 07:05:10 lfs-oss-1-13 kernel: LustreError: 32046:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c352e4450 x1398900896182020/t0 
o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078310 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:05:13 lfs-oss-1-13 kernel: Lustre: 31805:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897317686489 sent from scratch1-OST0084 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 07:05:13 lfs-oss-1-13 kernel: req@ffff8108f72e6800 x1398897317686489/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335078313 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 07:05:13 lfs-oss-1-13 kernel: Lustre: 31805:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 07:05:14 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.0.206@o2ib was evicted due to a lock blocking callback to 10.174.0.206@o2ib timed out: rc -107 Apr 22 07:05:14 lfs-oss-1-13 kernel: LustreError: 31987:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810b0cdeac00 ns: filter-scratch1-OST008d_UUID lock: ffff810261911200/0xcca1a6f6c571b0a1 lrc: 3/0,0 mode: --/PW res: 33064944/0 rrc: 3 type: EXT [0->1048575] (req 0->1048575) flags: 0x0 remote: 0xee91158efee7a811 expref: 20 pid: 31987 timeout 0 Apr 22 07:05:16 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 07:05:16 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 07:05:17 lfs-oss-1-13 kernel: LustreError: 32122:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c2499ec00 x1398900877266872/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078530 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:05:20 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105bd83f800 x1398900896182025/t0 
o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078435 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:05:20 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:829:ost_brw_read()) Skipped 30 previous similar messages Apr 22 07:05:22 lfs-oss-1-13 kernel: LustreError: 32504:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335078322 with bad export cookie 14745250233790585324 Apr 22 07:05:31 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 07:05:48 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 07:05:49 lfs-oss-1-13 kernel: LustreError: 32029:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810b0e74bc00 x1398900887619668/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078556 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:05:53 lfs-oss-1-13 kernel: LustreError: 32033:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff81066344e000 x1398900887619660/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078547 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:05:53 lfs-oss-1-13 kernel: LustreError: 32033:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 07:06:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 07:06:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 07:06:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.206@o2ib (43) Apr 22 
07:06:03 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 07:06:03 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107be63d000 Apr 22 07:06:40 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ecb71000 Apr 22 07:06:48 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810abb9ec000 Apr 22 07:06:48 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101335a8000 Apr 22 07:06:48 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810891dda000 Apr 22 07:06:48 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810515ba8000 Apr 22 07:06:48 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101441be000 Apr 22 07:07:04 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105ea620080 Apr 22 07:07:04 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810927f40000 Apr 22 07:07:04 lfs-oss-1-13 kernel: Lustre: 32004:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:07:04 lfs-oss-1-13 kernel: Lustre: 32004:0:(ost_handler.c:887:ost_brw_read()) Skipped 45 previous similar messages Apr 22 07:07:29 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072aad6000 Apr 22 07:07:50 lfs-oss-1-13 kernel: LustreError: 
0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0086_UUID lock: ffff810acc612e00/0xcca1a6f6c571bfe3 lrc: 3/0,0 mode: PR/PR res: 33066980/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b81a87ba expref: 9 pid: 31815 timeout 5284955135 Apr 22 07:07:51 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103925d6000 Apr 22 07:08:03 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810457789000 Apr 22 07:08:25 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.0.206@o2ib was evicted due to a lock blocking callback to 10.174.0.206@o2ib timed out: rc -107 Apr 22 07:08:25 lfs-oss-1-13 kernel: LustreError: Skipped 1 previous similar message Apr 22 07:08:33 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810206e4e000 Apr 22 07:08:33 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81060671d6c0 Apr 22 07:08:34 lfs-oss-1-13 kernel: LustreError: 32197:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105d9d77c00 x1398900877307959/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335078577 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:08:34 lfs-oss-1-13 kernel: Lustre: 32197:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0089: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:08:42 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6ac84000 Apr 22 07:10:03 lfs-oss-1-13 kernel: LustreError: 
21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d38cc000 Apr 22 07:10:28 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client ac6d63f1-a9ff-ca88-174b-46e023f19123 (at 10.174.0.206@o2ib) in 157 seconds. I think it's dead, and I am evicting it. Apr 22 07:10:28 lfs-oss-1-13 kernel: Lustre: Skipped 6 previous similar messages Apr 22 07:10:28 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105aae46000 Apr 22 07:10:52 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81070f2c3000 Apr 22 07:10:52 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024b74e000 Apr 22 07:10:52 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101ace42000 Apr 22 07:11:18 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104fbcf9000 Apr 22 07:11:26 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109555f6ac0 Apr 22 07:11:26 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810beb4dc000 Apr 22 07:11:26 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb6ab0000 Apr 22 07:11:26 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bb6ab0000 Apr 22 07:11:26 lfs-oss-1-13 kernel: LustreError: 32243:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff81073b364800 x1398900877318448/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335078946 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:11:42 
lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026ad16000 Apr 22 07:11:46 lfs-oss-1-13 kernel: LustreError: 32015:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105d886fc00 x1398900896185831/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 1 to 0 dl 1335078848 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:11:46 lfs-oss-1-13 kernel: LustreError: 32015:0:(ost_handler.c:825:ost_brw_read()) Skipped 3 previous similar messages Apr 22 07:11:50 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810175fb3000 Apr 22 07:12:07 lfs-oss-1-13 kernel: LustreError: 32109:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105b3854000 x1398900877315517/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335078727 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:12:21 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810088b90000 Apr 22 07:12:21 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff810592dd1c00; still busy with 3 active RPCs Apr 22 07:12:21 lfs-oss-1-13 kernel: Lustre: 31733:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 29 previous similar messages Apr 22 07:12:21 lfs-oss-1-13 kernel: LustreError: 31839:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c2870cc00 x1398900887626673/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335078841 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 07:12:21 lfs-oss-1-13 kernel: LustreError: 31839:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 39 previous similar messages Apr 22 07:12:34 lfs-oss-1-13 
kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073c090000 Apr 22 07:12:59 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81081a22d000 Apr 22 07:13:20 lfs-oss-1-13 kernel: LustreError: 32008:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c2a755800 x1398900877315515/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 1 to 0 dl 1335078967 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:13:27 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 07:13:49 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810696004000 Apr 22 07:13:50 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6b491000 Apr 22 07:14:01 lfs-oss-1-13 kernel: LustreError: 11768:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335078841 with bad export cookie 14745250233788411537 Apr 22 07:14:01 lfs-oss-1-13 kernel: LustreError: 11768:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 07:14:02 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d347d000 Apr 22 07:14:07 lfs-oss-1-13 kernel: LustreError: 32178:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810849afe400 x1398900876154241/t0 o4->aa75375e-3b2e-e00f-af4e-4a886362fa75@NET_0x500000aae0af9_UUID:0/0 lens 448/416 e 0 to 0 dl 1335079164 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:14:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8107d65c7000 x1398897317882179/t0 
o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:14:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 07:14:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008a_UUID lock: ffff8107c77e9800/0xcca1a6f6c595177f lrc: 3/0,0 mode: PW/PW res: 33075260/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b834c64e expref: 10 pid: 31761 timeout 0 Apr 22 07:14:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 07:14:43 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008d: 1662f6a0-94ac-b558-ad6c-555bd1b705c9 reconnecting Apr 22 07:14:43 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 287 previous similar messages Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810beb4dc000 Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81043c87e000 Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109367f6000 Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf3904000 Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810bf3904000 Apr 22 07:15:00 lfs-oss-1-13 kernel: LustreError: 32027:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff8105e502b800 x1398900877324383/t0 
o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335079163 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:15:06 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810964967000 Apr 22 07:15:18 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101bba54000 Apr 22 07:15:18 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 07:15:18 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 21 previous similar messages Apr 22 07:15:42 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c9d0c000 Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 14 previous similar messages Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.200@o2ib (36) Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 14 previous similar messages Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa8752000 Apr 22 07:16:08 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024a63e000 Apr 22 07:16:15 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102319f0000 Apr 22 07:16:15 lfs-oss-1-13 kernel: LustreError: 32101:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c344b0050 x1398900888100162/t0 
o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335079313 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:16:15 lfs-oss-1-13 kernel: LustreError: 32101:0:(ost_handler.c:829:ost_brw_read()) Skipped 29 previous similar messages Apr 22 07:16:21 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102c4065000 Apr 22 07:17:18 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81004c440000 Apr 22 07:17:21 lfs-oss-1-13 kernel: Lustre: 32166:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 07:17:21 lfs-oss-1-13 kernel: Lustre: 32166:0:(ost_handler.c:887:ost_brw_read()) Skipped 31 previous similar messages Apr 22 07:17:34 lfs-oss-1-13 kernel: Lustre: 31887:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897317996344 sent from scratch1-OST008b to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 07:17:34 lfs-oss-1-13 kernel: req@ffff8105a9554400 x1398897317996344/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335079054 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 07:17:34 lfs-oss-1-13 kernel: Lustre: 31887:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 25 previous similar messages Apr 22 07:17:34 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008b: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 07:17:34 lfs-oss-1-13 kernel: LustreError: Skipped 3 previous similar messages Apr 22 07:17:35 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810909250c00 x1398900877333107/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335079364 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:17:35 lfs-oss-1-13 kernel: LustreError: 32196:0:(ost_handler.c:825:ost_brw_read()) Skipped 2 previous similar messages Apr 22 07:18:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6a65b000 Apr 22 07:18:26 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810840c52000 Apr 22 07:18:30 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008e_UUID lock: ffff8109f2c6b000/0xcca1a6f6c594bd3c lrc: 3/0,0 mode: PW/PW res: 33065446/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b834b013 expref: 9 pid: 31742 timeout 5285595406 Apr 22 07:18:59 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab90e0000 Apr 22 07:19:13 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 
4, status -5, desc ffff810b6d16c000 Apr 22 07:19:13 lfs-oss-1-13 kernel: Lustre: 32039:0:(service.c:1434:ptlrpc_server_handle_request()) @@@ Request x1398900877333106 took longer than estimated (100+6s); client may timeout. req@ffff810c247c6800 x1398900877333106/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335079147 ref 1 fl Complete:/2/0 rc 0/0 Apr 22 07:19:13 lfs-oss-1-13 kernel: LustreError: 11787:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335079153 with bad export cookie 14745250233791545472 Apr 22 07:19:29 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f57e4000 Apr 22 07:19:31 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81031a1ec000 Apr 22 07:20:07 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81073b364c00 Apr 22 07:20:07 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81073b364c00 Apr 22 07:20:07 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105ab480000 Apr 22 07:20:07 lfs-oss-1-13 kernel: LustreError: 32009:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(192960) req@ffff8105f6188400 x1398900877336782/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335079470 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:20:07 lfs-oss-1-13 kernel: Lustre: 32009:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:20:07 lfs-oss-1-13 kernel: Lustre: 32009:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar 
messages Apr 22 07:20:16 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810270704000 Apr 22 07:20:29 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024a61a000 Apr 22 07:20:30 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81043c87e000 Apr 22 07:20:32 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0086: A client on nid 10.174.10.245@o2ib was evicted due to a lock blocking callback to 10.174.10.245@o2ib timed out: rc -107 Apr 22 07:20:32 lfs-oss-1-13 kernel: LustreError: 31970:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff81059fae7200 ns: filter-scratch1-OST0086_UUID lock: ffff8106aac97200/0xcca1a6f6c5af9417 lrc: 3/0,0 mode: --/PW res: 33085817/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x57569c14c555b3f3 expref: 12 pid: 31970 timeout 0 Apr 22 07:20:32 lfs-oss-1-13 kernel: LustreError: 31871:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8107c867d800 ns: filter-scratch1-OST0088_UUID lock: ffff8106a7133200/0xcca1a6f6c5af9528 lrc: 3/0,0 mode: --/PW res: 33084648/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x57569c14c555b4e8 expref: 15 pid: 31871 timeout 0 Apr 22 07:20:32 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8107ac184400 x1398897318013817/t0 o105->@NET_0x500000aae0af5_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:20:32 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.10.245@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008d_UUID lock: ffff810658f53200/0xcca1a6f6c5af95a6 lrc: 3/0,0 mode: PW/PW res: 33083896/0 rrc: 3 type: EXT 
[0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x57569c14c555b5f9 expref: 8 pid: 31820 timeout 0 Apr 22 07:20:37 lfs-oss-1-13 kernel: LustreError: 11779:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.10.245@o2ib arrived at 1335079237 with bad export cookie 14745250233367612998 Apr 22 07:20:37 lfs-oss-1-13 kernel: LustreError: 11779:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 2 previous similar messages Apr 22 07:21:24 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810636e53000 Apr 22 07:21:48 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050b1a4000 Apr 22 07:21:50 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810936f1f000 Apr 22 07:22:13 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 07:22:13 lfs-oss-1-13 kernel: Lustre: Skipped 9 previous similar messages Apr 22 07:22:22 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810661ee8000 Apr 22 07:22:29 lfs-oss-1-13 kernel: Lustre: 31898:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 07:22:29 lfs-oss-1-13 kernel: Lustre: 31898:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 38 previous similar messages Apr 22 07:22:29 lfs-oss-1-13 kernel: LustreError: 31898:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8108ef2e9c00 x1398901148740206/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335079449 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 07:22:29 lfs-oss-1-13 kernel: LustreError: 31898:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 57 previous similar messages Apr 22 07:23:31 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.0.206@o2ib ns: filter-scratch1-OST0088_UUID lock: ffff81026b779400/0xcca1a6f6c5b63eab lrc: 3/0,0 mode: PW/PW res: 33086924/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->81919) flags: 0x20 remote: 0xee91158efeed5b28 expref: 6 pid: 31780 timeout 5285896159 Apr 22 07:23:37 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057fbe9000 Apr 22 07:23:55 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a02cc000 Apr 22 07:24:21 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c21d44800 Apr 22 07:24:47 lfs-oss-1-13 kernel: Lustre: 31786:0:(ldlm_lib.c:574:target_handle_reconnect()) 
scratch1-OST0088: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 07:24:47 lfs-oss-1-13 kernel: Lustre: 31786:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 257 previous similar messages Apr 22 07:24:54 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075b450000 Apr 22 07:24:54 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100441c8000 Apr 22 07:24:54 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810081b88000 Apr 22 07:25:04 lfs-oss-1-13 kernel: LustreError: 32065:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c352f3450 x1398900877349643/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335079657 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:25:04 lfs-oss-1-13 kernel: LustreError: 32065:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 07:25:50 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093facb000 Apr 22 07:25:56 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd61c5000 Apr 22 07:25:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 07:25:56 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 23 previous similar messages Apr 22 07:25:56 lfs-oss-1-13 kernel: LustreError: 31933:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 93 < 94 req@ffff8105ddc00400 x1398900888109946/t0 o400->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 192/0 e 0 to 0 dl 1335079582 ref 2 fl Interpret:H/0/0 rc 0/0 Apr 22 07:26:00 lfs-oss-1-13 kernel: LustreError: 
21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ce1e0000 Apr 22 07:27:02 lfs-oss-1-13 kernel: Lustre: Service thread pid 32181 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 07:27:02 lfs-oss-1-13 kernel: Pid: 32181, comm: ll_ost_io_188 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:02 lfs-oss-1-13 kernel: Call Trace: Apr 22 07:27:02 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 07:27:02 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 07:27:02 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:02 lfs-oss-1-13 kernel: Lustre: Service thread pid 32081 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 07:27:02 lfs-oss-1-13 kernel: Pid: 32081, comm: ll_ost_io_89 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:02 lfs-oss-1-13 kernel: Call Trace: Apr 22 07:27:02 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 07:27:02 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 07:27:02 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:02 lfs-oss-1-13 kernel: Pid: 32235, comm: ll_ost_io_242 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:02 lfs-oss-1-13 kernel: Call Trace: Apr 22 07:27:02 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 07:27:02 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 07:27:02 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] 
ost_handle+0x2e73/0x55b0 [ost] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 07:27:02 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 07:27:02 lfs-oss-1-13 kernel: Apr 22 07:27:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 07:27:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 07:27:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.206@o2ib (40) Apr 22 07:27:05 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 07:27:05 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109f170a000 Apr 22 07:27:12 lfs-oss-1-13 kernel: LustreError: 32081:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c2ea67050 x1398900887775426/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 2 to 0 dl 1335079784 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:27:12 lfs-oss-1-13 kernel: LustreError: 32081:0:(ost_handler.c:829:ost_brw_read()) Skipped 24 previous similar messages Apr 22 07:27:12 lfs-oss-1-13 kernel: Lustre: Service thread pid 32081 completed after 210.01s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 07:27:12 lfs-oss-1-13 kernel: Lustre: Service thread pid 32235 completed after 210.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 07:27:13 lfs-oss-1-13 kernel: Lustre: Service thread pid 32181 completed after 211.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 07:27:15 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 07:27:15 lfs-oss-1-13 kernel: LustreError: Skipped 3 previous similar messages Apr 22 07:27:15 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105e59ccc00 x1398897318154465/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:27:15 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0089_UUID lock: ffff8108d9e59e00/0xcca1a6f6c5cd0aac lrc: 3/0,0 mode: PW/PW res: 33089315/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b83a96c3 expref: 6 pid: 31795 timeout 0 Apr 22 07:27:17 lfs-oss-1-13 kernel: LustreError: 793:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335079637 with bad export cookie 14745250233798419773 Apr 22 07:27:29 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3cc9a000 Apr 22 07:27:29 lfs-oss-1-13 kernel: Lustre: 32035:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 
960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 07:27:29 lfs-oss-1-13 kernel: Lustre: 32035:0:(ost_handler.c:887:ost_brw_read()) Skipped 27 previous similar messages Apr 22 07:27:40 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0086_UUID lock: ffff810b7f036800/0xcca1a6f6c5b67776 lrc: 3/0,0 mode: PR/PR res: 33069021/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x10020 remote: 0x382dcc42b8379dde expref: 13 pid: 31800 timeout 5286145150 Apr 22 07:27:40 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 14 previous similar messages Apr 22 07:28:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105b3ec1000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b3bad8000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8674e000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a8674e000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 32240:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c2ef93050 x1398900877383622/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335079924 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810978140000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810978140000 Apr 22 
07:28:45 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ec786000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104ec786000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100282c0000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8100282c0000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81051ac0a000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81051ac0a000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810796450000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810796450000 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 32199:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff810c04467000 x1398900877383698/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335079975 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:28:45 lfs-oss-1-13 kernel: LustreError: 32199:0:(ost_handler.c:1078:ost_brw_write()) Skipped 8 previous similar messages Apr 22 07:28:46 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81040be7b000 Apr 22 07:28:57 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096530a000 Apr 22 07:29:44 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff8107b13d0000 Apr 22 07:29:44 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c0fbec000 Apr 22 07:29:44 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944da4000 Apr 22 07:29:44 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101dbe76000 Apr 22 07:29:49 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a753c000 Apr 22 07:30:01 lfs-oss-1-13 kernel: LustreError: 32075:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810b0bfe8800 x1398900887779853/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335079801 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:30:01 lfs-oss-1-13 kernel: LustreError: 32075:0:(ost_handler.c:822:ost_brw_read()) Skipped 2 previous similar messages Apr 22 07:30:47 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810833062000 Apr 22 07:31:03 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106a88ea000 Apr 22 07:32:08 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105eab44800 Apr 22 07:32:30 lfs-oss-1-13 kernel: Lustre: 31877:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008c: refuse reconnection from ac6d63f1-a9ff-ca88-174b-46e023f19123@10.174.0.206@o2ib to 0xffff8105bdd1f200; still busy with 1 active RPCs Apr 22 07:32:30 lfs-oss-1-13 kernel: Lustre: 31877:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 25 previous similar messages Apr 22 07:32:30 lfs-oss-1-13 kernel: LustreError: 31877:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8102eeab6800 
x1398900896214472/t0 o8->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 368/264 e 0 to 0 dl 1335080050 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 07:32:30 lfs-oss-1-13 kernel: LustreError: 31877:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 35 previous similar messages Apr 22 07:32:45 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3743e000 Apr 22 07:33:35 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109d9be4000 Apr 22 07:33:37 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106c8b61000 Apr 22 07:34:00 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810833062000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81002c5b6000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810946366000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81041fd8e000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bad588000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810946366000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106eec88000 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 32215:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(298522) req@ffff810c19fe2800 x1398900887785062/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 
1335080152 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:34:34 lfs-oss-1-13 kernel: LustreError: 32215:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 07:34:34 lfs-oss-1-13 kernel: Lustre: 32215:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008d: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 07:34:34 lfs-oss-1-13 kernel: Lustre: 32215:0:(ost_handler.c:1224:ost_brw_write()) Skipped 17 previous similar messages Apr 22 07:35:03 lfs-oss-1-13 kernel: Lustre: 31851:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 07:35:03 lfs-oss-1-13 kernel: Lustre: 31851:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 233 previous similar messages Apr 22 07:35:05 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ad81c000 Apr 22 07:35:25 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108103f4000 Apr 22 07:35:27 lfs-oss-1-13 kernel: Lustre: Service thread pid 32089 was inactive for 392.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 07:35:27 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 07:35:27 lfs-oss-1-13 kernel: Pid: 32089, comm: ll_ost_io_96 Apr 22 07:35:27 lfs-oss-1-13 kernel: Apr 22 07:35:27 lfs-oss-1-13 kernel: Call Trace: Apr 22 07:35:27 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 07:35:27 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 07:35:27 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 07:35:27 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 07:35:27 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 07:35:27 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 07:35:27 lfs-oss-1-13 kernel: Apr 22 07:35:41 lfs-oss-1-13 kernel: Lustre: 31750:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897318389915 sent from scratch1-OST008d to NID 10.174.14.43@o2ib 11s ago has timed out (11s prior to deadline). 
Apr 22 07:35:41 lfs-oss-1-13 kernel: req@ffff810c1a9ab000 x1398897318389915/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335080141 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 07:35:41 lfs-oss-1-13 kernel: Lustre: 31750:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 150929 previous similar messages Apr 22 07:36:03 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810952b16800 Apr 22 07:36:19 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102336fe000 Apr 22 07:36:44 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b351c2000 Apr 22 07:36:44 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 07:36:44 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 25 previous similar messages Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810754a8e000 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81038a30e000 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81051ac0a000 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81063a4aa800 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b2464000 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81063a4aa800 Apr 22 07:36:53 lfs-oss-1-13 kernel: LustreError: 32004:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(298522) 
req@ffff810c15760c00 x1398900887787471/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335080289 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:37:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 07:37:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 07:37:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.206@o2ib (35) Apr 22 07:37:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 07:37:11 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a2aec2000 Apr 22 07:37:17 lfs-oss-1-13 kernel: LustreError: 32154:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c1c93c000 x1398900877394164/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080375 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:37:17 lfs-oss-1-13 kernel: LustreError: 32154:0:(ost_handler.c:829:ost_brw_read()) Skipped 29 previous similar messages Apr 22 07:37:44 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0086: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:37:44 lfs-oss-1-13 kernel: Lustre: 32211:0:(ost_handler.c:887:ost_brw_read()) Skipped 28 previous similar messages Apr 22 07:38:00 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0089_UUID lock: ffff8101e965ac00/0xcca1a6f6c5dcf3d3 lrc: 3/0,0 mode: PR/PR res: 33059144/0 rrc: 2 type: EXT 
[0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b84a9163 expref: 8 pid: 31971 timeout 5286765194 Apr 22 07:38:27 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c17bb000 Apr 22 07:39:43 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810334287000 Apr 22 07:39:54 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101f6566000 Apr 22 07:40:03 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810331d0c000 Apr 22 07:40:03 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b121c000 Apr 22 07:40:03 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bad588000 Apr 22 07:40:03 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085f2c0000 Apr 22 07:40:03 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a8674e000 Apr 22 07:40:53 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c26c72000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026a892000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b57eac000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81017cfdc000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049ea64000 Apr 
22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c17e4c000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810c17e4c000 Apr 22 07:40:58 lfs-oss-1-13 kernel: LustreError: 32216:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(298522) req@ffff810a1e4a1000 x1398900887791070/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335080537 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:41:28 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client ac6d63f1-a9ff-ca88-174b-46e023f19123 (at 10.174.0.206@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 07:41:28 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 07:41:47 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.0.206@o2ib was evicted due to a lock blocking callback to 10.174.0.206@o2ib timed out: rc -107 Apr 22 07:41:47 lfs-oss-1-13 kernel: LustreError: 31956:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8105bdd1f200 ns: filter-scratch1-OST008c_UUID lock: ffff810333a85800/0xcca1a6f6c601d9ab lrc: 3/0,0 mode: --/PW res: 33099287/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x80000000 remote: 0xee91158efeeda7cd expref: 28 pid: 31956 timeout 0 Apr 22 07:41:47 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105a9306c00 x1398897318566671/t0 o105->@NET_0x500000aae00ce_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:41:47 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.206@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff810569146400/0xcca1a6f6c601d8ee lrc: 3/0,0 mode: PW/PW 
res: 33099352/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0xee91158efeeda6ed expref: 15 pid: 31931 timeout 0 Apr 22 07:41:48 lfs-oss-1-13 kernel: LustreError: 32109:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810aa71a8400 x1398900896225543/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080750 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:41:56 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810116cec000 Apr 22 07:41:56 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81051ac0a000 Apr 22 07:41:56 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810340744000 Apr 22 07:41:56 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e3936000 Apr 22 07:41:56 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104e3936000 Apr 22 07:42:16 lfs-oss-1-13 kernel: Lustre: 32155:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-201), not sending early reply Apr 22 07:42:16 lfs-oss-1-13 kernel: req@ffff810a00ec0c00 x1398900888114006/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 1 to 0 dl 1335080541 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 07:42:16 lfs-oss-1-13 kernel: Lustre: 32155:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Apr 22 07:42:21 lfs-oss-1-13 kernel: LustreError: 32089:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 806+0s req@ffff810a00ec0c00 x1398900888114006/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 1 to 0 dl 1335080541 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:42:21 
lfs-oss-1-13 kernel: Lustre: Service thread pid 32089 completed after 806.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 07:42:34 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810931e10000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102320da000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b57eac000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810833062000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae6a65000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810ae6a65000 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6684c000 Apr 22 07:42:39 lfs-oss-1-13 kernel: Lustre: 31770:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff810c23243400; still busy with 4 active RPCs Apr 22 07:42:39 lfs-oss-1-13 kernel: Lustre: 31770:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 31 previous similar messages Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 31770:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c232b4400 x1398900887794579/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335080659 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 07:42:39 lfs-oss-1-13 kernel: LustreError: 31770:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 34 previous 
similar messages Apr 22 07:43:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810891b5a000 Apr 22 07:43:25 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81074ba72000 Apr 22 07:43:25 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a82bdc000 Apr 22 07:43:25 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029a652000 Apr 22 07:43:25 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810340744000 Apr 22 07:44:06 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81068bc08000 Apr 22 07:44:14 lfs-oss-1-13 kernel: LustreError: 32027:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105b457b400 x1398900896226378/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080654 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81052c864000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034bf90000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050940a000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810799730000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810626f00000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) 
event type 2, status -5, desc ffff810626f00000 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 32149:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(298522) req@ffff8105f08a0400 x1398900887795384/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335080751 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:44:32 lfs-oss-1-13 kernel: LustreError: 32149:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 07:44:47 lfs-oss-1-13 kernel: LustreError: 32215:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105bff8d000 x1398900896231182/t0 o3->ac6d63f1-a9ff-ca88-174b-46e023f19123@NET_0x500000aae00ce_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080904 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:44:53 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104468b8000 Apr 22 07:44:53 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015d2d6000 Apr 22 07:44:53 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810923a2e000 Apr 22 07:44:53 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c587c000 Apr 22 07:44:53 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103c587c000 Apr 22 07:44:53 lfs-oss-1-13 kernel: Lustre: 32188:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0086: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:44:53 lfs-oss-1-13 kernel: Lustre: 32188:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 07:45:09 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status 
-5, desc ffff8106b2464000 Apr 22 07:45:09 lfs-oss-1-13 kernel: Lustre: 31878:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 07:45:09 lfs-oss-1-13 kernel: Lustre: 31878:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 199 previous similar messages Apr 22 07:45:31 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad453a000 Apr 22 07:45:36 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014ca03000 Apr 22 07:46:21 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101622e6000 Apr 22 07:46:21 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006007e000 Apr 22 07:46:21 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be4c30000 Apr 22 07:46:25 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bad588000 Apr 22 07:46:39 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c47a23c0 Apr 22 07:46:39 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093e0dc000 Apr 22 07:46:40 lfs-oss-1-13 kernel: LustreError: 32115:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c25934800 x1398900888364249/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335081509 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:46:40 lfs-oss-1-13 kernel: LustreError: 32115:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) 
Timed out tx: tx_queue, 1 seconds Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.200@o2ib (38) Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81022e0a4000 Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 32129:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105d2cd2800 x1398900887797388/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080986 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:47:17 lfs-oss-1-13 kernel: LustreError: 32129:0:(ost_handler.c:829:ost_brw_read()) Skipped 48 previous similar messages Apr 22 07:47:41 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81024fdc4000 Apr 22 07:47:41 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 07:47:41 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 17 previous similar messages Apr 22 07:47:42 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b93e142c0 Apr 22 07:47:42 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81019345a000 Apr 22 07:47:43 lfs-oss-1-13 kernel: LustreError: 32036:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105b00cbc00 x1398900888365937/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335080909 ref 1 fl Interpret:/2/0 rc 
0/0 Apr 22 07:47:43 lfs-oss-1-13 kernel: Lustre: 31875:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897318660109 sent from scratch1-OST008d to NID 10.174.0.200@o2ib 11s ago has timed out (11s prior to deadline). Apr 22 07:47:43 lfs-oss-1-13 kernel: req@ffff810849afec00 x1398897318660109/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335080863 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 07:47:43 lfs-oss-1-13 kernel: Lustre: 31875:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Apr 22 07:47:44 lfs-oss-1-13 kernel: LustreError: 32016:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff81060ab93800 x1398900887805121/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335080952 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:48:20 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.0.206@o2ib ns: filter-scratch1-OST0086_UUID lock: ffff810026de9000/0xcca1a6f6c60c92ca lrc: 3/0,0 mode: PR/PR res: 33111456/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xee91158efeef03e3 expref: 8 pid: 31743 timeout 5287385028 Apr 22 07:48:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105bd31b400 x1398897318660726/t0 o105->@NET_0x500000aae00ce_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:48:20 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.206@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0086_UUID lock: ffff810aaac6b200/0xcca1a6f6c61516f6 lrc: 3/0,0 mode: PW/PW res: 33111456/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xee91158efeef92fa expref: 5 pid: 31886 timeout 0 Apr 22 07:48:28 lfs-oss-1-13 kernel: 
LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075f75c000 Apr 22 07:48:28 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810188394000 Apr 22 07:48:28 lfs-oss-1-13 kernel: Lustre: 32016:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008d: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 07:48:28 lfs-oss-1-13 kernel: Lustre: 32016:0:(ost_handler.c:887:ost_brw_read()) Skipped 53 previous similar messages Apr 22 07:48:39 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104dc92a480 Apr 22 07:48:39 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7f692000 Apr 22 07:48:39 lfs-oss-1-13 kernel: LustreError: 32105:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c273c1800 x1398900888366770/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335080964 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:48:44 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100378e0000 Apr 22 07:49:31 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a38824000 Apr 22 07:49:31 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81080feba000 Apr 22 07:49:42 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c281c15c0 Apr 22 07:49:42 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81017a986000 Apr 22 07:49:42 lfs-oss-1-13 kernel: LustreError: 
32072:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8106050f8400 x1398900888367605/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335081026 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:50:00 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a3e38000 Apr 22 07:50:46 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1d028ac0 Apr 22 07:50:46 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016f9c6000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810af17b2000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810146cf6000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e9542000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108cf11d000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108cf11d000 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(622912) req@ffff810c16b1b400 x1398900877411805/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335081258 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:50:59 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:1073:ost_brw_write()) Skipped 1 previous similar message Apr 22 07:51:15 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a3e38000 Apr 22 07:51:27 
lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105c957c800 x1398897318662266/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:51:27 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff810b3676ce00/0xcca1a6f6c6159313 lrc: 3/0,0 mode: PW/PW res: 33108852/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x6de46e971d29e2cb expref: 4 pid: 31853 timeout 0 Apr 22 07:51:29 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81030f507000 Apr 22 07:51:29 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c139212c0 Apr 22 07:52:01 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107feb51000 Apr 22 07:52:44 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075f75e000 Apr 22 07:52:44 lfs-oss-1-13 kernel: Lustre: 31874:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 07:52:44 lfs-oss-1-13 kernel: Lustre: 31874:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 34 previous similar messages Apr 22 07:52:44 lfs-oss-1-13 kernel: LustreError: 31874:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105f199b000 x1398901148769532/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335081264 ref 2 fl Interpret:/0/0 rc -16/0 Apr 22 07:52:44 lfs-oss-1-13 kernel: LustreError: 31874:0:(ldlm_lib.c:1919:target_send_reply_msg()) 
Skipped 39 previous similar messages Apr 22 07:52:54 lfs-oss-1-13 kernel: LustreError: 31998:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c17b81c00 x1398900888368894/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335081746 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:53:17 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d9e63000 Apr 22 07:53:18 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810365543000 Apr 22 07:53:18 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090fe96000 Apr 22 07:53:18 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81090fe96000 Apr 22 07:53:22 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aebde6000 Apr 22 07:53:22 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b2464000 Apr 22 07:53:29 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client c49d8140-06a7-779c-f541-694bd8aab9b4 (at 10.174.0.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 07:53:29 lfs-oss-1-13 kernel: Lustre: Skipped 8 previous similar messages Apr 22 07:53:59 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102754be000 Apr 22 07:55:03 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6684c000 Apr 22 07:55:03 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108113d6000 Apr 22 07:55:11 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012105c000 Apr 22 07:55:14 lfs-oss-1-13 kernel: Lustre: 31807:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0086: 1662f6a0-94ac-b558-ad6c-555bd1b705c9 reconnecting Apr 22 07:55:14 lfs-oss-1-13 kernel: Lustre: 31807:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 331 previous similar messages Apr 22 07:55:41 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad2bec000 Apr 22 07:56:14 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bee777000 Apr 22 07:56:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a491e0000 Apr 22 07:56:15 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049e53e000 Apr 22 07:56:40 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 07:56:40 lfs-oss-1-13 kernel: LustreError: Skipped 9 previous similar messages Apr 22 07:56:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105ab22a400 x1398897318665282/t0 o105->@NET_0x500000aae00c8_UUID:15/16 
lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 07:56:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0088_UUID lock: ffff810784523000/0xcca1a6f6c616206a lrc: 3/0,0 mode: PW/PW res: 33113268/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1007615) flags: 0x0 remote: 0x6de46e971d3548ac expref: 10 pid: 31771 timeout 0 Apr 22 07:56:41 lfs-oss-1-13 kernel: LustreError: 32131:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c30b40850 x1398900887815197/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 1 to 0 dl 1335081675 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:56:41 lfs-oss-1-13 kernel: Lustre: 32131:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0088: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 07:56:41 lfs-oss-1-13 kernel: Lustre: 32131:0:(ost_handler.c:1224:ost_brw_write()) Skipped 7 previous similar messages Apr 22 07:56:56 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102862f8000 Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.200@o2ib (36) Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a36a14000 Apr 22 07:57:23 lfs-oss-1-13 
kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810281d32000 Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810281d32000 Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 32168:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c3052e450 x1398900887836252/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335081481 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 07:57:23 lfs-oss-1-13 kernel: LustreError: 32168:0:(ost_handler.c:829:ost_brw_read()) Skipped 27 previous similar messages Apr 22 07:57:30 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104f8372000 Apr 22 07:57:43 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b51c06000 Apr 22 07:57:43 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e075c000 Apr 22 07:57:43 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81094925c000 Apr 22 07:57:43 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034404a000 Apr 22 07:57:43 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81034404a000 Apr 22 07:58:05 lfs-oss-1-13 kernel: Lustre: 31780:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897318666422 sent from scratch1-OST0086 to NID 10.174.0.200@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 07:58:05 lfs-oss-1-13 kernel: req@ffff8109e4fea800 x1398897318666422/t0 o106->@NET_0x500000aae00c8_UUID:15/16 lens 296/424 e 0 to 1 dl 1335081485 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 07:58:05 lfs-oss-1-13 kernel: Lustre: 31780:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 07:58:18 lfs-oss-1-13 kernel: LustreError: 32185:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c1aa78c00 x1398900888375554/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335081498 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 07:58:39 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102fe74f000 Apr 22 07:58:50 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810411c54000 Apr 22 07:58:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 07:58:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 3099 previous similar messages Apr 22 07:58:51 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101337d6000 Apr 22 07:58:51 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105d7b18000 Apr 22 07:58:51 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105d7b18000 Apr 22 07:58:56 lfs-oss-1-13 kernel: Lustre: 32215:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0086: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 07:58:56 lfs-oss-1-13 kernel: Lustre: 32215:0:(ost_handler.c:887:ost_brw_read()) Skipped 30 previous similar messages Apr 22 07:59:04 lfs-oss-1-13 
kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.0.200@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff81030d71b400/0xcca1a6f6c6162071 lrc: 3/0,0 mode: PR/PR res: 33112541/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x6de46e971d354954 expref: 16 pid: 31811 timeout 5288029321 Apr 22 07:59:04 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 2 previous similar messages Apr 22 07:59:54 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810915710000 Apr 22 08:00:06 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aebde6000 Apr 22 08:00:06 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810afb8ee000 Apr 22 08:00:39 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810af82e7000 Apr 22 08:00:39 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810388e0c380 Apr 22 08:00:39 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810388e0c380 Apr 22 08:00:56 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106930fe000 Apr 22 08:01:09 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037aca6000 Apr 22 08:01:18 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b2132000 Apr 22 08:01:24 lfs-oss-1-13 kernel: LustreError: 32203:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810a14c5e800 
x1398900888378878/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335081684 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:01:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810bc7ffac00 x1398897318668423/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:01:35 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff810261660400/0xcca1a6f6c61692bd lrc: 3/0,0 mode: PW/PW res: 33109259/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->143359) flags: 0x0 remote: 0x6de46e971d3f78f7 expref: 7 pid: 31973 timeout 0 Apr 22 08:02:08 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c018c000 Apr 22 08:02:12 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d7b86000 Apr 22 08:02:19 lfs-oss-1-13 kernel: Lustre: Service thread pid 32027 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Apr 22 08:02:19 lfs-oss-1-13 kernel: Pid: 32027, comm: ll_ost_io_35
Apr 22 08:02:19 lfs-oss-1-13 kernel:
Apr 22 08:02:19 lfs-oss-1-13 kernel: Call Trace:
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 08:02:19 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 08:02:19 lfs-oss-1-13 kernel:
Apr 22 08:02:20 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b171c5000
Apr 22 08:02:32 lfs-oss-1-13 kernel: Lustre: Service thread pid 32027 completed after 213.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 22 08:02:51 lfs-oss-1-13 kernel: Lustre: 31822:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 08:02:51 lfs-oss-1-13 kernel: Lustre: 31822:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 27 previous similar messages Apr 22 08:02:51 lfs-oss-1-13 kernel: LustreError: 31822:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c228f0c00 x1398901148779161/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335081871 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 08:02:51 lfs-oss-1-13 kernel: LustreError: 31822:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 32 previous similar messages Apr 22 08:03:09 lfs-oss-1-13 kernel: LustreError: 32231:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff81060d2afc00 x1398900888380628/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335081789 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:03:11 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b261c6000 Apr 22 08:03:24 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810af5b40000 Apr 22 08:03:24 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b8a810000 Apr 22 08:03:24 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810411c54000 Apr 22 08:03:24 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006cd28000 Apr 22 08:03:24 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81095617e000 Apr 22 
08:03:25 lfs-oss-1-13 kernel: LustreError: 32060:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8106d44be000 x1398900888381034/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 1 to 0 dl 1335081877 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:04:06 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810506464000 Apr 22 08:04:07 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cd9dd000 Apr 22 08:04:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810bf73ce800 x1398897318670940/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:04:14 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff810942dd1800/0xcca1a6f6c61719d8 lrc: 3/0,0 mode: PW/PW res: 33114893/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x382dcc42b84eda79 expref: 5 pid: 31839 timeout 0 Apr 22 08:04:40 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023bb60000 Apr 22 08:05:16 lfs-oss-1-13 kernel: Lustre: 31732:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: 1662f6a0-94ac-b558-ad6c-555bd1b705c9 reconnecting Apr 22 08:05:16 lfs-oss-1-13 kernel: Lustre: 31732:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 234 previous similar messages Apr 22 08:05:21 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bba12c000 Apr 22 08:05:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c1f094c00 x1398897318673143/t0 
o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:05:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0088_UUID lock: ffff810150523800/0xcca1a6f6c61763eb lrc: 3/0,0 mode: PW/PW res: 33113412/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x6de46e971d6893ef expref: 8 pid: 31713 timeout 0 Apr 22 08:05:25 lfs-oss-1-13 kernel: LustreError: 32089:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c177a6800 x1398900887916658/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082672 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:05:29 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810724990000 Apr 22 08:05:33 lfs-oss-1-13 kernel: Lustre: scratch1-OST008a: haven't heard from client c49d8140-06a7-779c-f541-694bd8aab9b4 (at 10.174.0.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 08:05:33 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 08:05:56 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aebde6000 Apr 22 08:05:56 lfs-oss-1-13 kernel: LustreError: 793:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335081956 with bad export cookie 14745250233802476476 Apr 22 08:05:59 lfs-oss-1-13 kernel: LustreError: 23517:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335081959 with bad export cookie 14745250233803224846 Apr 22 08:06:51 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a186a000 Apr 22 08:06:51 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810481f5f000 Apr 22 08:07:12 lfs-oss-1-13 kernel: LustreError: 32016:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c2ca47850 x1398900888670111/t0 o4->80a70f4f-ada3-29a8-d275-fc40c0fcd93c@NET_0x500000aae0b83_UUID:0/0 lens 448/416 e 0 to 0 dl 1335082257 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:07:12 lfs-oss-1-13 kernel: Lustre: 32016:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with 80a70f4f-ada3-29a8-d275-fc40c0fcd93c@NET_0x500000aae0b83_UUID id 12345-10.174.11.131@o2ib - client will retry Apr 22 08:07:12 lfs-oss-1-13 kernel: Lustre: 32016:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 08:07:27 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bba12c000 Apr 22 08:07:28 lfs-oss-1-13 kernel: LustreError: 32040:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c34e96c50 x1398901148782768/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082256 ref 
1 fl Interpret:/2/0 rc 0/0 Apr 22 08:07:28 lfs-oss-1-13 kernel: LustreError: 32040:0:(ost_handler.c:829:ost_brw_read()) Skipped 34 previous similar messages Apr 22 08:08:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 08:08:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 8 previous similar messages Apr 22 08:08:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (29) Apr 22 08:08:06 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 8 previous similar messages Apr 22 08:08:06 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006cd28000 Apr 22 08:08:13 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a753c000 Apr 22 08:08:43 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bbf934000 Apr 22 08:08:52 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81021243c000 Apr 22 08:08:52 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023bb60000 Apr 22 08:08:52 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a186a000 Apr 22 08:08:52 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.200@o2ib Apr 22 08:08:52 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages Apr 22 08:08:58 lfs-oss-1-13 kernel: LustreError: 32056:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105e5a08800 x1398900888386413/t0 
o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082138 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:08:58 lfs-oss-1-13 kernel: Lustre: 32056:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 08:08:58 lfs-oss-1-13 kernel: Lustre: 32056:0:(ost_handler.c:887:ost_brw_read()) Skipped 40 previous similar messages Apr 22 08:09:47 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff81038d2cec00/0xcca1a6f6c61f52a1 lrc: 3/0,0 mode: PW/PW res: 33115354/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->49151) flags: 0x20 remote: 0x382dcc42b8569a90 expref: 5 pid: 31773 timeout 5288672180 Apr 22 08:09:47 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 3 previous similar messages Apr 22 08:09:59 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072e3c4000 Apr 22 08:10:08 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bcc518000 Apr 22 08:10:08 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a463fc000 Apr 22 08:10:08 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037aca6000 Apr 22 08:10:08 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a186a000 Apr 22 08:10:13 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101a753c000 Apr 22 08:10:50 lfs-oss-1-13 kernel: LustreError: 
21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102db78e000 Apr 22 08:11:16 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106cb774000 Apr 22 08:11:48 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a5a2c4000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a37832000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810438ab2000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810438ab2000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106dd75e000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8106dd75e000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81049f020000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81049f020000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105396ec000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105396ec000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 32162:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c1ab02000 x1398900877455213/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335082485 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:11:53 
lfs-oss-1-13 kernel: LustreError: 32162:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a7998c000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a7998c000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a53c8000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8102a53c8000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a463fc000 Apr 22 08:11:53 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a463fc000 Apr 22 08:12:06 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810029220000 Apr 22 08:12:14 lfs-oss-1-13 kernel: Lustre: 31772:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897318828176 sent from scratch1-OST008a to NID 10.174.0.200@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 08:12:14 lfs-oss-1-13 kernel: req@ffff810b772dbc00 x1398897318828176/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335082334 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 08:12:14 lfs-oss-1-13 kernel: Lustre: 31772:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 48 previous similar messages Apr 22 08:12:14 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008a: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 08:12:14 lfs-oss-1-13 kernel: LustreError: Skipped 8 previous similar messages Apr 22 08:12:15 lfs-oss-1-13 kernel: LustreError: 32237:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105d8ae4400 x1398900888120579/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082478 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:12:36 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c1b8b0000 x1398900888390146/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082356 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:13:05 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d3f5c000 Apr 22 08:13:05 lfs-oss-1-13 kernel: LustreError: 31876:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107) req@ffff8105d45c2000 x1398900888121479/t0 o400->@:0/0 lens 192/0 e 0 to 0 dl 1335082444 ref 1 fl Interpret:H/0/0 rc -107/0 Apr 22 08:13:05 lfs-oss-1-13 kernel: LustreError: 31876:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 36 previous similar messages Apr 22 08:13:29 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c27667800 x1398897318847808/t0 o105->@NET_0x500000aae0b7d_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:13:29 
lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.11.125@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0085_UUID lock: ffff8104c8ace600/0xcca1a6f6c637822f lrc: 3/0,0 mode: PW/PW res: 33122256/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x2ba17f9997d5e4ed expref: 13 pid: 31971 timeout 0 Apr 22 08:13:30 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c1476b800 x1398900890037862/t0 o3->0457de7b-7957-e906-444f-42be4e500454@NET_0x500000aae0b7d_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082627 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:13:30 lfs-oss-1-13 kernel: LustreError: 5544:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.11.125@o2ib arrived at 1335082410 with bad export cookie 14745250233372448808 Apr 22 08:13:34 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006cd28000 Apr 22 08:13:47 lfs-oss-1-13 kernel: Lustre: 31750:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from f5113ad1-6459-ed48-61ee-836b1e538fff@10.174.11.127@o2ib to 0xffff810c25044000; still busy with 1 active RPCs Apr 22 08:13:47 lfs-oss-1-13 kernel: Lustre: 31750:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 30 previous similar messages Apr 22 08:14:07 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81012dd39000 Apr 22 08:14:46 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102846ca000 Apr 22 08:14:54 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8109e0fa7c00 x1398900888392769/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082494 ref 1 fl 
Interpret:/2/0 rc 0/0 Apr 22 08:14:54 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 08:15:11 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033a726000 Apr 22 08:15:11 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810557fdc000 Apr 22 08:15:11 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b665a0000 Apr 22 08:15:11 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bba9a5000 Apr 22 08:15:15 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e9a92000 Apr 22 08:15:19 lfs-oss-1-13 kernel: Lustre: 31730:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008e: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 08:15:19 lfs-oss-1-13 kernel: Lustre: 31711:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008d: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 08:15:19 lfs-oss-1-13 kernel: Lustre: 31711:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 281 previous similar messages Apr 22 08:15:19 lfs-oss-1-13 kernel: Lustre: 31730:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 281 previous similar messages Apr 22 08:16:08 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf73ce800 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c18e5eb80 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81044787a680 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, 
status -5, desc ffff810988ff6000 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81044787a680 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810c18e5eb80 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(5073) req@ffff8105bb7f6800 x1398900888395242/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335082622 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:16:26 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:1073:ost_brw_write()) Skipped 6 previous similar messages Apr 22 08:16:52 lfs-oss-1-13 kernel: Lustre: scratch1-OST008b: haven't heard from client 1662f6a0-94ac-b558-ad6c-555bd1b705c9 (at 10.174.0.200@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 08:16:52 lfs-oss-1-13 kernel: Lustre: Skipped 6 previous similar messages Apr 22 08:17:33 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037aca6000 Apr 22 08:17:34 lfs-oss-1-13 kernel: LustreError: 31993:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105a9bd5c00 x1398901148792406/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082856 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:17:34 lfs-oss-1-13 kernel: LustreError: 31993:0:(ost_handler.c:829:ost_brw_read()) Skipped 28 previous similar messages Apr 22 08:18:00 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bbf934000 Apr 22 08:18:00 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810244617000 Apr 22 08:18:08 lfs-oss-1-13 kernel: LustreError: 32173:0:(ost_handler.c:825:ost_brw_read()) 
@@@ Eviction on bulk PUT req@ffff8105b7a19400 x1398900888133768/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082905 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:18:08 lfs-oss-1-13 kernel: LustreError: 32109:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105b8baf400 x1398900888133771/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335082905 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:18:46 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c98a6000 Apr 22 08:18:46 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b84b14000 Apr 22 08:18:46 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b84b14000 Apr 22 08:18:46 lfs-oss-1-13 kernel: Lustre: 32236:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 08:18:46 lfs-oss-1-13 kernel: Lustre: 32236:0:(ost_handler.c:1224:ost_brw_write()) Skipped 10 previous similar messages Apr 22 08:19:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 08:19:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 08:19:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (40) Apr 22 08:19:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 08:19:02 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810229df4000 Apr 22 
08:19:11 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810368052000 Apr 22 08:19:11 lfs-oss-1-13 kernel: Lustre: 32173:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008d: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 08:19:11 lfs-oss-1-13 kernel: Lustre: 32173:0:(ost_handler.c:887:ost_brw_read()) Skipped 31 previous similar messages Apr 22 08:19:11 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107132fa000 Apr 22 08:19:11 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037aca6000 Apr 22 08:19:11 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a843d4000 Apr 22 08:19:11 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048c198000 Apr 22 08:19:11 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.200@o2ib Apr 22 08:19:11 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 08:19:11 lfs-oss-1-13 kernel: LustreError: 31793:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 78 < 79 req@ffff810c34bc4850 x1398900888134977/t0 o400->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 192/0 e 0 to 0 dl 1335082799 ref 1 fl Interpret:H/0/0 rc 0/0 Apr 22 08:19:35 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069b063000 Apr 22 08:20:10 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107fd362000 Apr 22 08:20:43 lfs-oss-1-13 kernel: LustreError: 
21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f118a000 Apr 22 08:21:16 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c164d6b80 Apr 22 08:21:16 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104f5b1a000 Apr 22 08:21:17 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033a726000 Apr 22 08:21:17 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e0e26000 Apr 22 08:21:17 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810695099000 Apr 22 08:22:04 lfs-oss-1-13 kernel: LustreError: 32063:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8106230fa800 x1398900888398929/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083138 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:22:04 lfs-oss-1-13 kernel: LustreError: 32063:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 08:22:11 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81018095a000 Apr 22 08:22:19 lfs-oss-1-13 kernel: Lustre: 31751:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897318984437 sent from scratch1-OST0085 to NID 10.174.0.200@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 08:22:19 lfs-oss-1-13 kernel: req@ffff810c1a9e0400 x1398897318984437/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335082939 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 08:22:19 lfs-oss-1-13 kernel: Lustre: 31751:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Apr 22 08:22:19 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 08:22:19 lfs-oss-1-13 kernel: LustreError: Skipped 7 previous similar messages Apr 22 08:22:19 lfs-oss-1-13 kernel: LustreError: 31751:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8105d69cfe00 ns: filter-scratch1-OST0085_UUID lock: ffff8108a16a3600/0xcca1a6f6c6540d6b lrc: 3/0,0 mode: --/PW res: 33131050/0 rrc: 2 type: EXT [0->1048575] (req 0->1048575) flags: 0x0 remote: 0x6de46e971d9eb85f expref: 27 pid: 31751 timeout 0 Apr 22 08:22:20 lfs-oss-1-13 kernel: LustreError: 32017:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c35bfd050 x1398900888182002/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083157 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:22:20 lfs-oss-1-13 kernel: LustreError: 32017:0:(ost_handler.c:825:ost_brw_read()) Skipped 4 previous similar messages Apr 22 08:22:45 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cf5c8000 Apr 22 08:22:45 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d4561c80 Apr 22 08:23:03 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810424282000 Apr 22 08:23:05 lfs-oss-1-13 kernel: LustreError: 31767:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810bf2ec6800 x1398901148798819/t0 
o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335083085 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 08:23:05 lfs-oss-1-13 kernel: LustreError: 31767:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 35 previous similar messages Apr 22 08:23:41 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090f767000 Apr 22 08:23:56 lfs-oss-1-13 kernel: Lustre: 31878:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008d: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff810c21578e00; still busy with 3 active RPCs Apr 22 08:23:56 lfs-oss-1-13 kernel: Lustre: 31878:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 28 previous similar messages Apr 22 08:24:19 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810581bb2000 Apr 22 08:24:30 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103409fe000 Apr 22 08:25:03 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b1b195000 Apr 22 08:25:19 lfs-oss-1-13 kernel: Lustre: 31755:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: 1662f6a0-94ac-b558-ad6c-555bd1b705c9 reconnecting Apr 22 08:25:19 lfs-oss-1-13 kernel: Lustre: 31755:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 179 previous similar messages Apr 22 08:25:45 lfs-oss-1-13 kernel: LustreError: 32230:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8104759b6800 x1398900888401342/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083247 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:26:07 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81013b5d2000 Apr 22 08:26:08 lfs-oss-1-13 
kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108130f0800 Apr 22 08:26:49 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81004bcea000 Apr 22 08:27:41 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81044f8e8000 Apr 22 08:27:42 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c225bf000 x1398900888406159/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083311 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:27:42 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:829:ost_brw_read()) Skipped 22 previous similar messages Apr 22 08:27:49 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810277e4b000 Apr 22 08:28:01 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81099c432000 Apr 22 08:28:01 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff8106021e0ac0 Apr 22 08:28:43 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bf3438000 Apr 22 08:28:51 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109dc16a000 Apr 22 08:29:37 lfs-oss-1-13 kernel: LustreError: 32241:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105bd2ae400 x1398900888406972/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083377 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:29:37 lfs-oss-1-13 kernel: Lustre: 32241:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 
c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 08:29:37 lfs-oss-1-13 kernel: Lustre: 32241:0:(ost_handler.c:887:ost_brw_read()) Skipped 28 previous similar messages Apr 22 08:29:42 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c3279e000 Apr 22 08:29:42 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106aa88a000 Apr 22 08:29:55 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105f009f800 Apr 22 08:30:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 08:30:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 08:30:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (34) Apr 22 08:30:11 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 08:30:11 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109cc08a000 Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff810355f345c0 Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6812f000 Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 2, status -103, desc ffff810355f345c0 Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 32081:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(4473) req@ffff810b7b675400 x1398900888409363/t0 
o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083614 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 32081:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 08:30:57 lfs-oss-1-13 kernel: Lustre: 32081:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0089: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 08:30:57 lfs-oss-1-13 kernel: Lustre: 32081:0:(ost_handler.c:1224:ost_brw_write()) Skipped 2 previous similar messages Apr 22 08:30:57 lfs-oss-1-13 kernel: Lustre: 9207:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 08:30:57 lfs-oss-1-13 kernel: Lustre: 9207:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages Apr 22 08:30:57 lfs-oss-1-13 kernel: LustreError: 32213:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810355f345c0 Apr 22 08:30:58 lfs-oss-1-13 kernel: Lustre: scratch1-OST0088: haven't heard from client 1662f6a0-94ac-b558-ad6c-555bd1b705c9 (at 10.174.0.200@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 08:30:58 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 08:31:03 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d7abb800 Apr 22 08:31:27 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810be9922000 Apr 22 08:31:27 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b3964000 Apr 22 08:31:43 lfs-oss-1-13 kernel: LustreError: 32014:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c3053c850 x1398900888408967/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083503 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:31:59 lfs-oss-1-13 kernel: LustreError: 32213:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105a546a400 x1398900888409775/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083585 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:32:14 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bfe7ec000 Apr 22 08:32:32 lfs-oss-1-13 kernel: Lustre: Service thread pid 32001 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 08:32:32 lfs-oss-1-13 kernel: Pid: 32001, comm: ll_ost_io_11 Apr 22 08:32:32 lfs-oss-1-13 kernel: Apr 22 08:32:32 lfs-oss-1-13 kernel: Call Trace: Apr 22 08:32:32 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 08:32:32 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 08:32:32 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 08:32:32 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 08:32:32 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 08:32:32 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 08:32:32 lfs-oss-1-13 kernel: Apr 22 08:32:54 lfs-oss-1-13 kernel: Lustre: 32042:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897319143631 sent from scratch1-OST0084 to NID 10.174.0.200@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 08:32:54 lfs-oss-1-13 kernel: req@ffff8105b4381000 x1398897319143631/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335083574 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 08:32:54 lfs-oss-1-13 kernel: Lustre: 32042:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 100991 previous similar messages Apr 22 08:32:54 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0084: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 08:33:03 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81079f982000 Apr 22 08:33:29 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab9256000 Apr 22 08:33:29 lfs-oss-1-13 kernel: LustreError: 31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107) req@ffff810c1922a800 x1398900888295346/t0 o400->@:0/0 lens 192/0 e 0 to 0 dl 1335083649 ref 2 fl Interpret:H/0/0 rc -107/0 Apr 22 08:33:29 lfs-oss-1-13 kernel: LustreError: 31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 28 previous similar messages Apr 22 08:33:33 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107a1685000 Apr 22 08:33:33 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810931c9e000 Apr 22 08:33:47 lfs-oss-1-13 kernel: LustreError: 31992:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c352f5850 x1398900890643318/t0 o4->f205ab6a-55c2-7c53-a1d7-c8743cced2cf@NET_0x500000aae0b81_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083823 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:33:47 lfs-oss-1-13 kernel: LustreError: 31992:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 08:33:47 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:1078:ost_brw_write()) @@@ 
ptlrpc_bulk_get failed: rc -107 req@ffff8109daa47400 x1398900889069099/t0 o4->80a70f4f-ada3-29a8-d275-fc40c0fcd93c@NET_0x500000aae0b83_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083823 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:33:47 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:1078:ost_brw_write()) Skipped 10 previous similar messages Apr 22 08:33:49 lfs-oss-1-13 kernel: LustreError: 32051:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105d9547c00 x1398900888410999/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083629 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810296fb0000 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1b349ac0 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b19fea400 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b19fea400 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 32121:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(221504) req@ffff810c1c21d000 x1398900888295913/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083939 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:34:11 lfs-oss-1-13 kernel: Lustre: 31829:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff810c21e00200; still busy with 1 active RPCs Apr 22 08:34:11 lfs-oss-1-13 kernel: Lustre: 31829:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 35 previous similar messages Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 
32207:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c04768800 x1398900888295908/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083985 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:34:11 lfs-oss-1-13 kernel: LustreError: 32207:0:(ost_handler.c:1064:ost_brw_write()) Skipped 2 previous similar messages Apr 22 08:34:18 lfs-oss-1-13 kernel: LustreError: 32046:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c367be050 x1398900888294368/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083658 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:35:09 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff810c138e73c0 Apr 22 08:35:09 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810542d2d000 Apr 22 08:35:14 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0086_UUID lock: ffff810be38cf400/0xcca1a6f6c672defe lrc: 3/0,0 mode: PW/PW res: 33142792/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20 remote: 0x382dcc42b8809702 expref: 5 pid: 31879 timeout 5290199158 Apr 22 08:35:14 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 3 previous similar messages Apr 22 08:35:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e4690000 Apr 22 08:35:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103d2dee000 Apr 22 08:35:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103d2dee000 Apr 22 08:35:15 lfs-oss-1-13 
kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e6e1c000 Apr 22 08:35:15 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103e6e1c000 Apr 22 08:35:15 lfs-oss-1-13 kernel: LustreError: 32019:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810af8d30400 x1398900888297197/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335083982 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:35:19 lfs-oss-1-13 kernel: Lustre: 31837:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 1662f6a0-94ac-b558-ad6c-555bd1b705c9 reconnecting Apr 22 08:35:19 lfs-oss-1-13 kernel: Lustre: 31837:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 253 previous similar messages Apr 22 08:35:36 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810428de5000 Apr 22 08:35:55 lfs-oss-1-13 kernel: LustreError: 31992:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810606b1cc00 x1398900888413017/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083755 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:35:56 lfs-oss-1-13 kernel: LustreError: 32218:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c16bf9800 x1398900888413413/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335084026 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:36:06 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b74772000 Apr 22 08:36:06 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105df212b80 Apr 22 08:36:06 lfs-oss-1-13 kernel: LustreError: 
21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c177b4c00 Apr 22 08:36:06 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810c177b4c00 Apr 22 08:36:06 lfs-oss-1-13 kernel: LustreError: 32015:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105b1957c00 x1398900888298691/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335084078 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:37:04 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105cb448000 Apr 22 08:37:04 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102edb20000 Apr 22 08:37:16 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c4337000 Apr 22 08:37:47 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.0.200@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff810b52449c00/0xcca1a6f6c67dd9bd lrc: 3/0,0 mode: PR/PR res: 33140763/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x6de46e971dacbc64 expref: 10 pid: 31716 timeout 5290352159 Apr 22 08:37:47 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c28baf800 x1398897319306065/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:37:47 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008d_UUID lock: ffff81080e376c00/0xcca1a6f6c68bac67 lrc: 3/0,0 mode: PW/PW res: 33140763/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) 
flags: 0x0 remote: 0x6de46e971dad2aff expref: 7 pid: 31865 timeout 0 Apr 22 08:37:47 lfs-oss-1-13 kernel: LustreError: 32016:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c040ce800 x1398900888300903/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084150 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:37:58 lfs-oss-1-13 kernel: LustreError: 32246:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c177b4c00 x1398900888416273/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084377 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:37:58 lfs-oss-1-13 kernel: LustreError: 32246:0:(ost_handler.c:829:ost_brw_read()) Skipped 23 previous similar messages Apr 22 08:38:07 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810244b2e000 Apr 22 08:38:25 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810268999000 Apr 22 08:38:32 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810717396000 Apr 22 08:38:58 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810734d5e000 Apr 22 08:39:11 lfs-oss-1-13 kernel: LustreError: 32035:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810b6c690800 x1398900877501491/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335083951 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:39:28 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810afd361000 Apr 22 08:39:29 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c2809b400 x1398897319308433/t0 
o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:39:29 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff81054fc95600/0xcca1a6f6c6922163 lrc: 3/0,0 mode: PW/PW res: 33146965/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x6de46e971dc76a1f expref: 22 pid: 31873 timeout 0 Apr 22 08:39:30 lfs-oss-1-13 kernel: LustreError: 32002:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8107b648e400 x1398900888401814/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084062 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:39:40 lfs-oss-1-13 kernel: Lustre: 32120:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 08:39:40 lfs-oss-1-13 kernel: Lustre: 32120:0:(ost_handler.c:887:ost_brw_read()) Skipped 27 previous similar messages Apr 22 08:40:13 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104e9e8a800 Apr 22 08:40:13 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c01a2c5c0 Apr 22 08:40:23 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810938f4a000 Apr 22 08:40:43 lfs-oss-1-13 kernel: LustreError: 31781:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8105bb7e1e00 ns: filter-scratch1-OST008d_UUID lock: ffff81078c829800/0xcca1a6f6c6971726 lrc: 3/0,0 mode: --/PW res: 33150227/0 rrc: 3 type: EXT [0->1007615] (req 0->1007615) flags: 0x0 remote: 0x382dcc42b8844706 expref: 13 pid: 31781 timeout 0 Apr 22 08:40:44 
lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 7 previous similar messages Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (37) Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 7 previous similar messages Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082c497000 Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81031ce9bcc0 Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 5541:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335084044 with bad export cookie 14745250233805115672 Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 5541:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 32052:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810837fd9400 x1398900888418720/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335084720 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:40:44 lfs-oss-1-13 kernel: LustreError: 32052:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 08:41:05 lfs-oss-1-13 kernel: LustreError: 31908:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810342aa2c00 ns: filter-scratch1-OST0085_UUID lock: ffff810a43b94200/0xcca1a6f6c699f688 lrc: 3/0,0 mode: --/PW res: 33149898/0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 remote: 0x6de46e971ddd66f8 expref: 16 pid: 31908 timeout 0 Apr 22 08:41:06 lfs-oss-1-13 kernel: LustreError: 
32076:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105a8e86000 x1398900888505203/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084158 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:41:41 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a760d8000 Apr 22 08:41:41 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 08:41:41 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 20 previous similar messages Apr 22 08:41:54 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81009757a000 Apr 22 08:42:20 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810431856000 Apr 22 08:42:37 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.0.200@o2ib ns: filter-scratch1-OST008d_UUID lock: ffff810560dbc600/0xcca1a6f6c691b220 lrc: 3/0,0 mode: PW/PW res: 33131475/0 rrc: 2 type: EXT [0->18446744073709551615] (req 2097152->2273279) flags: 0x20 remote: 0x6de46e971dad35f6 expref: 7 pid: 31916 timeout 5290643415 Apr 22 08:42:44 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810127f1b000 Apr 22 08:43:04 lfs-oss-1-13 kernel: Lustre: scratch1-OST008e: haven't heard from client 1662f6a0-94ac-b558-ad6c-555bd1b705c9 (at 10.174.0.200@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 08:43:04 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 08:43:35 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b9cdf4000 Apr 22 08:44:20 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008a: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c21ab5200; still busy with 1 active RPCs Apr 22 08:44:20 lfs-oss-1-13 kernel: Lustre: 31765:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 23 previous similar messages Apr 22 08:44:20 lfs-oss-1-13 kernel: LustreError: 31765:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c183c5000 x1398900877512108/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335084360 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 08:44:20 lfs-oss-1-13 kernel: LustreError: 31765:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 42 previous similar messages Apr 22 08:44:25 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102fa199000 Apr 22 08:44:58 lfs-oss-1-13 kernel: Lustre: 31843:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897319455163 sent from scratch1-OST0089 to NID 10.174.0.200@o2ib 11s ago has timed out (11s prior to deadline). 
Apr 22 08:44:58 lfs-oss-1-13 kernel: req@ffff8108b05c9000 x1398897319455163/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335084298 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 08:44:58 lfs-oss-1-13 kernel: Lustre: 31843:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Apr 22 08:44:58 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 08:44:58 lfs-oss-1-13 kernel: LustreError: Skipped 18 previous similar messages Apr 22 08:44:58 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c28baf800 x1398897319455365/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:44:58 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0089_UUID lock: ffff8106f4fcd200/0xcca1a6f6c6aec4f5 lrc: 3/0,0 mode: PW/PW res: 33155294/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x6de46e971ee6477a expref: 9 pid: 31843 timeout 0 Apr 22 08:45:07 lfs-oss-1-13 kernel: LustreError: 32216:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105d25dd400 x1398900877515565/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084399 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:45:38 lfs-oss-1-13 kernel: Lustre: 31916:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008a: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 08:45:38 lfs-oss-1-13 kernel: Lustre: 31916:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 166 previous similar messages Apr 22 08:45:38 lfs-oss-1-13 kernel: LustreError: 5540:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335084338 
with bad export cookie 14745250233803312402 Apr 22 08:45:41 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a038a1000 Apr 22 08:45:59 lfs-oss-1-13 kernel: LustreError: 11774:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335084359 with bad export cookie 14745250233809538734 Apr 22 08:46:06 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8107cf831400 x1398897319455631/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:46:06 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff810937df1800/0xcca1a6f6c6aedd7c lrc: 3/0,0 mode: PW/PW res: 33155829/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x0 remote: 0x6de46e971ee7f97a expref: 7 pid: 31725 timeout 0 Apr 22 08:46:45 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810717d64000 Apr 22 08:47:15 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810244b2e000 Apr 22 08:47:18 lfs-oss-1-13 kernel: LustreError: 32046:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c35fcc450 x1398900877519117/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084438 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:47:18 lfs-oss-1-13 kernel: LustreError: 32046:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 08:47:28 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a6e96000 Apr 22 08:47:35 lfs-oss-1-13 kernel: Lustre: 
32212:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-503), not sending early reply Apr 22 08:47:35 lfs-oss-1-13 kernel: req@ffff8108b5805800 x1398901148804829/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 2 to 0 dl 1335084460 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 08:47:40 lfs-oss-1-13 kernel: Lustre: Service thread pid 32001 completed after 1108.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 08:48:37 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810204536000 Apr 22 08:48:40 lfs-oss-1-13 kernel: LustreError: 32065:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105a2179c00 x1398900888426377/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335084990 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:48:40 lfs-oss-1-13 kernel: LustreError: 32065:0:(ost_handler.c:829:ost_brw_read()) Skipped 13 previous similar messages Apr 22 08:49:08 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105200a4000 Apr 22 08:49:08 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105b1cb6000 Apr 22 08:49:11 lfs-oss-1-13 kernel: LustreError: 31969:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810b700d8000 ns: filter-scratch1-OST008b_UUID lock: ffff810a77703000/0xcca1a6f6c6bcfc93 lrc: 3/0,0 mode: --/PW res: 33159813/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x382dcc42b897bbd3 expref: 28 pid: 31969 timeout 0 Apr 22 08:49:11 lfs-oss-1-13 kernel: LustreError: 32217:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c352f3c50 x1398900877530753/t0 
o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335085262 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:49:11 lfs-oss-1-13 kernel: LustreError: 32042:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c358c1c50 x1398900877530745/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335085261 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:49:12 lfs-oss-1-13 kernel: LustreError: 32179:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c344b6c50 x1398900877530741/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335085261 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 08:49:12 lfs-oss-1-13 kernel: LustreError: 32179:0:(ost_handler.c:1060:ost_brw_write()) Skipped 2 previous similar messages Apr 22 08:49:15 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c1529b400 x1398897319532565/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 08:49:15 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff8103bbac8200/0xcca1a6f6c6bd0473 lrc: 3/0,0 mode: PW/PW res: 33157054/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b897ead4 expref: 9 pid: 31750 timeout 0 Apr 22 08:49:17 lfs-oss-1-13 kernel: Lustre: 32028:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008b: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 08:49:17 lfs-oss-1-13 kernel: Lustre: 32028:0:(ost_handler.c:1224:ost_brw_write()) Skipped 15 previous similar messages Apr 22 08:49:17 lfs-oss-1-13 kernel: LustreError: 784:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) 
ldlm_cancel from 10.174.14.43@o2ib arrived at 1335084557 with bad export cookie 14745250233802476462 Apr 22 08:49:53 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81023f0ee000 Apr 22 08:50:30 lfs-oss-1-13 kernel: Lustre: 32005:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 08:50:30 lfs-oss-1-13 kernel: Lustre: 32005:0:(ost_handler.c:887:ost_brw_read()) Skipped 23 previous similar messages Apr 22 08:50:48 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.0.200@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff810bb9fbd800/0xcca1a6f6c6b697dc lrc: 3/0,0 mode: PR/PR res: 33161097/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x6de46e971ef09ebc expref: 53 pid: 31914 timeout 5291134213 Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.200@o2ib (21) Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81034328c000 Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c8732000 Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 
2, status -5, desc ffff8109c8732000 Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 32184:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(992160) req@ffff8105dba6e400 x1398900888703682/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335085323 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 08:51:02 lfs-oss-1-13 kernel: LustreError: 32184:0:(ost_handler.c:1073:ost_brw_write()) Skipped 2 previous similar messages Apr 22 08:51:39 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bae00a000 Apr 22 08:51:40 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a7760000 Apr 22 08:52:42 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103f889c000 Apr 22 08:52:50 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104fa694000 Apr 22 08:52:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 08:52:50 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 14 previous similar messages Apr 22 08:52:58 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b37d2e000 Apr 22 08:52:58 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b1360000 Apr 22 08:52:58 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b37a4000 Apr 22 08:53:00 lfs-oss-1-13 kernel: LustreError: 32490:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335084780 with bad export cookie 14745250233811878421 Apr 22 08:53:00 lfs-oss-1-13 kernel: LustreError: 
32490:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 08:53:58 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103fd792000 Apr 22 08:54:44 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016d668000 Apr 22 08:54:44 lfs-oss-1-13 kernel: Lustre: 31890:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from c49d8140-06a7-779c-f541-694bd8aab9b4@10.174.0.204@o2ib to 0xffff8105eb2f0200; still busy with 1 active RPCs Apr 22 08:54:44 lfs-oss-1-13 kernel: Lustre: 31890:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 22 previous similar messages Apr 22 08:54:44 lfs-oss-1-13 kernel: LustreError: 31890:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810adf641400 x1398900888432796/t0 o8->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 368/264 e 0 to 0 dl 1335084984 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 08:54:44 lfs-oss-1-13 kernel: LustreError: 31890:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 43 previous similar messages Apr 22 08:55:01 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027a894000 Apr 22 08:55:27 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 198 seconds. I think it's dead, and I am evicting it. 
Apr 22 08:55:27 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 08:55:46 lfs-oss-1-13 kernel: Lustre: 31873:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008b: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting
Apr 22 08:55:46 lfs-oss-1-13 kernel: Lustre: 31922:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008e: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting
Apr 22 08:55:46 lfs-oss-1-13 kernel: Lustre: 31922:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 205 previous similar messages
Apr 22 08:55:46 lfs-oss-1-13 kernel: Lustre: 31873:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 205 previous similar messages
Apr 22 08:56:04 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101ad32a000
Apr 22 08:56:05 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810320108000
Apr 22 08:56:18 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aa62d8000
Apr 22 08:57:00 lfs-oss-1-13 kernel: Lustre: 31847:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897319726503 sent from scratch1-OST008e to NID 10.174.14.43@o2ib 11s ago has timed out (11s prior to deadline).
Apr 22 08:57:00 lfs-oss-1-13 kernel: req@ffff8105bd321400 x1398897319726503/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335085020 ref 2 fl Rpc:/0/0 rc 0/0
Apr 22 08:57:00 lfs-oss-1-13 kernel: Lustre: 31847:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
Apr 22 08:57:03 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b70c90000
Apr 22 08:57:21 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81009a65c000
Apr 22 08:57:57 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810253874000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103ad676000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100224a0000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81028ba8e000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810afd87a000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103bfedc000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109033d4000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae992c000
Apr 22 08:58:07 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810196d1c000
Apr 22 08:58:37 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81047bba9000
Apr 22 08:58:47 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810aff486000 x1398900888436005/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335085589 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 08:58:47 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:829:ost_brw_read()) Skipped 23 previous similar messages
Apr 22 08:58:47 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085ec84000
Apr 22 08:58:49 lfs-oss-1-13 kernel: LustreError: 32168:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c217ba400 x1398900877625979/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 464/400 e 0 to 0 dl 1335085129 ref 1 fl Interpret:/2/0 rc 0/0
Apr 22 08:58:49 lfs-oss-1-13 kernel: LustreError: 32168:0:(ost_handler.c:822:ost_brw_read()) Skipped 2 previous similar messages
Apr 22 08:59:51 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092cd78000
Apr 22 08:59:59 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102dd175000
Apr 22 09:00:29 lfs-oss-1-13 kernel: Lustre: Service thread pid 32070 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 22 09:00:29 lfs-oss-1-13 kernel: Pid: 32070, comm: ll_ost_io_78
Apr 22 09:00:29 lfs-oss-1-13 kernel:
Apr 22 09:00:29 lfs-oss-1-13 kernel: Call Trace:
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 09:00:29 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 09:00:29 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Lustre: Service thread pid 32016 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Pid: 32016, comm: ll_ost_io_25
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Call Trace:
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Pid: 32107, comm: ll_ost_io_114
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Call Trace:
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Pid: 32089, comm: ll_ost_io_96
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Call Trace:
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Pid: 32137, comm: ll_ost_io_144
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Call Trace:
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] lock_timer_base+0x1b/0x3c
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __mod_timer+0x100/0x10f
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 22 09:00:33 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11
Apr 22 09:00:33 lfs-oss-1-13 kernel:
Apr 22 09:00:33 lfs-oss-1-13 kernel: Lustre: Service thread pid 32138 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 22 09:00:34 lfs-oss-1-13 kernel: Lustre: 32025:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008c: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 09:00:34 lfs-oss-1-13 kernel: Lustre: 32025:0:(ost_handler.c:887:ost_brw_read()) Skipped 22 previous similar messages Apr 22 09:00:45 lfs-oss-1-13 kernel: Lustre: Service thread pid 32070 completed after 216.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 09:01:14 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810604ad6000 Apr 22 09:01:14 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d5d3e000 Apr 22 09:01:14 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101df2d0000 Apr 22 09:01:57 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104c82ec000 Apr 22 09:01:59 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 5 seconds Apr 22 09:01:59 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 09:01:59 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (36) Apr 22 09:01:59 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 09:01:59 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aea7ec000 Apr 22 09:02:22 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c182da980 Apr 22 
09:02:22 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075bf34000 Apr 22 09:02:22 lfs-oss-1-13 kernel: Lustre: Service thread pid 32107 completed after 309.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 09:02:22 lfs-oss-1-13 kernel: LustreError: 32100:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105d8889c00 x1398900888934407/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335086050 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:02:22 lfs-oss-1-13 kernel: Lustre: 32100:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008c: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 09:02:22 lfs-oss-1-13 kernel: Lustre: 32100:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 09:02:32 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810151b10000 Apr 22 09:03:13 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a7c48000 Apr 22 09:03:13 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 09:03:13 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 09:03:35 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 09:03:35 lfs-oss-1-13 kernel: LustreError: Skipped 8 previous similar messages Apr 22 09:03:40 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107fd67d000 Apr 22 
09:04:00 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108024fa000 Apr 22 09:04:00 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81028ba8e000 Apr 22 09:04:00 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81057d0fc000 Apr 22 09:04:00 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81035baee000 Apr 22 09:04:51 lfs-oss-1-13 kernel: Lustre: 31979:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0085: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff8105e473c200; still busy with 1 active RPCs Apr 22 09:04:51 lfs-oss-1-13 kernel: Lustre: 31979:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 32 previous similar messages Apr 22 09:04:51 lfs-oss-1-13 kernel: LustreError: 31979:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8107d52b9800 x1398900877634660/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335085591 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 09:04:51 lfs-oss-1-13 kernel: LustreError: 31979:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 33 previous similar messages Apr 22 09:04:51 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81080c342000 Apr 22 09:04:51 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106bb876000 Apr 22 09:04:56 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810565105000 Apr 22 09:05:19 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810918834000 Apr 22 09:05:42 lfs-oss-1-13 kernel: 
LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a2608000 Apr 22 09:05:47 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008d: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 09:05:47 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 226 previous similar messages Apr 22 09:06:31 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810967f4a000 Apr 22 09:06:44 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ad33bc000 Apr 22 09:06:44 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810afd87a000 Apr 22 09:06:44 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f0272000 Apr 22 09:07:11 lfs-oss-1-13 kernel: Lustre: scratch1-OST008c: haven't heard from client 1662f6a0-94ac-b558-ad6c-555bd1b705c9 (at 10.174.0.200@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 09:07:11 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 09:07:24 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027e0f4000 Apr 22 09:07:26 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a57f78000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108c93da000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810375a7a000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81076eb72000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a116ae000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b975e2000 Apr 22 09:08:50 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810397f04000 Apr 22 09:09:07 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810838fc0000 Apr 22 09:09:07 lfs-oss-1-13 kernel: LustreError: 32242:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8106db8f6800 x1398901148842874/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086410 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:09:07 lfs-oss-1-13 kernel: LustreError: 32242:0:(ost_handler.c:829:ost_brw_read()) Skipped 34 previous similar messages Apr 22 09:09:28 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81080b7ea000 Apr 22 09:09:28 lfs-oss-1-13 kernel: LustreError: 
21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9c66a000 Apr 22 09:09:28 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81078bf9e000 Apr 22 09:09:33 lfs-oss-1-13 kernel: LustreError: 32107:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8106d61ae400 x1398900877642455/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335085773 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:09:33 lfs-oss-1-13 kernel: LustreError: 32107:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 09:09:40 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81019e573000 Apr 22 09:10:23 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4d394000 Apr 22 09:10:45 lfs-oss-1-13 kernel: Lustre: 31929:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897320005171 sent from scratch1-OST0089 to NID 10.174.0.200@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 09:10:45 lfs-oss-1-13 kernel: req@ffff8105a9705400 x1398897320005171/t0 o106->@NET_0x500000aae00c8_UUID:15/16 lens 296/424 e 0 to 1 dl 1335085845 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 09:10:45 lfs-oss-1-13 kernel: Lustre: 31929:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 22 09:10:46 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810674688000 Apr 22 09:10:46 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a4d394000 Apr 22 09:11:04 lfs-oss-1-13 kernel: Lustre: 32116:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 09:11:04 lfs-oss-1-13 kernel: Lustre: 32116:0:(ost_handler.c:887:ost_brw_read()) Skipped 42 previous similar messages Apr 22 09:11:38 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a232e2000 Apr 22 09:11:59 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108d417bcc0 Apr 22 09:11:59 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f6007000 Apr 22 09:11:59 lfs-oss-1-13 kernel: LustreError: 32123:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c358c6850 x1398900888449715/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335085988 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:12:00 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106f06b4000 Apr 22 09:12:56 lfs-oss-1-13 kernel: LustreError: 32077:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c367b7850 
x1398900889011721/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086148 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:13:02 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c300a000 Apr 22 09:13:03 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ef3f6000 Apr 22 09:13:03 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810abe6de000 Apr 22 09:13:03 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a1a668000 Apr 22 09:13:03 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b70458000 Apr 22 09:13:49 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105484ca000 Apr 22 09:13:52 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.0.200@o2ib was evicted due to a lock completion callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 09:13:52 lfs-oss-1-13 kernel: LustreError: Skipped 2 previous similar messages Apr 22 09:14:47 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108b4fd8000 Apr 22 09:14:47 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 09:14:47 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 16 previous similar messages Apr 22 09:14:51 lfs-oss-1-13 kernel: LustreError: 32063:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-107) req@ffff810c28cb4800 x1398900890026107/t0 o4->@:0/0 lens 448/0 e 0 to 0 dl 1335086259 ref 1 fl Interpret:/0/0 rc -107/0 Apr 22 09:14:51 lfs-oss-1-13 kernel: LustreError: 
32063:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages Apr 22 09:14:51 lfs-oss-1-13 kernel: LustreError: 11766:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335086091 with bad export cookie 14745250233809720503 Apr 22 09:14:51 lfs-oss-1-13 kernel: LustreError: 11766:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 09:15:07 lfs-oss-1-13 kernel: Lustre: 31919:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c27154a00; still busy with 4 active RPCs Apr 22 09:15:07 lfs-oss-1-13 kernel: Lustre: 31919:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 20 previous similar messages Apr 22 09:15:16 lfs-oss-1-13 kernel: LustreError: 11782:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335086116 with bad export cookie 14745250233809574252 Apr 22 09:15:16 lfs-oss-1-13 kernel: LustreError: 11782:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 09:15:27 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 09:15:27 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 5 previous similar messages Apr 22 09:15:27 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (45) Apr 22 09:15:27 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 5 previous similar messages Apr 22 09:15:27 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810997df3000 Apr 22 09:15:55 lfs-oss-1-13 kernel: Lustre: 31844:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 
09:15:55 lfs-oss-1-13 kernel: Lustre: 31844:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 180 previous similar messages Apr 22 09:16:11 lfs-oss-1-13 kernel: LustreError: 32064:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c15080000 x1398900890332436/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086211 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:16:24 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e3e57000 Apr 22 09:16:41 lfs-oss-1-13 kernel: LustreError: 23516:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.0.200@o2ib arrived at 1335086201 with bad export cookie 14745250233821217072 Apr 22 09:16:41 lfs-oss-1-13 kernel: LustreError: 23516:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 09:17:15 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065b7ac000 Apr 22 09:17:15 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae5346000 Apr 22 09:17:15 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108fac38000 Apr 22 09:17:15 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab2112000 Apr 22 09:17:15 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a765d2000 Apr 22 09:17:27 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e3e57000 Apr 22 09:17:33 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104a487f0c0 Apr 22 09:17:33 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff81044d2be000 Apr 22 09:17:33 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8104a487f0c0 Apr 22 09:17:33 lfs-oss-1-13 kernel: LustreError: 32214:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(29984) req@ffff810605126000 x1398900890347549/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 1 to 0 dl 1335086317 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:17:33 lfs-oss-1-13 kernel: Lustre: 32214:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008b: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 09:17:33 lfs-oss-1-13 kernel: Lustre: 32214:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message Apr 22 09:17:53 lfs-oss-1-13 kernel: LustreError: 32244:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff81091d9c1c00 x1399132033012293/t0 o4->d7c08517-323f-702d-3c66-1e59ae4e0e78@NET_0x500000aae0040_UUID:0/0 lens 448/416 e 0 to 0 dl 1335086369 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:18:20 lfs-oss-1-13 kernel: LustreError: 31932:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff8105e473c200 ns: filter-scratch1-OST0085_UUID lock: ffff810693724e00/0xcca1a6f6c73163c4 lrc: 1/0,0 mode: --/PW res: 33196257/0 rrc: 1 type: EXT [0->4095] (req 0->4095) flags: 0x20000080 remote: 0x382dcc42b8afd7f7 expref: 9 pid: 31932 timeout 0 Apr 22 09:18:21 lfs-oss-1-13 kernel: LustreError: 32222:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c28f39400 x1398900877660148/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 464/400 e 0 to 0 dl 1335087048 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:18:22 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090c678000 Apr 
22 09:18:30 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81001f06b000 Apr 22 09:18:34 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810387a72000 Apr 22 09:19:26 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e78c4000 Apr 22 09:19:27 lfs-oss-1-13 kernel: LustreError: 32044:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c35a80450 x1398900890353254/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086409 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:19:27 lfs-oss-1-13 kernel: LustreError: 32044:0:(ost_handler.c:829:ost_brw_read()) Skipped 26 previous similar messages Apr 22 09:19:59 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029e8ea000 Apr 22 09:20:17 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client c49d8140-06a7-779c-f541-694bd8aab9b4 (at 10.174.0.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 09:20:17 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 09:20:37 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105a9c29000 x1398897320221460/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:20:37 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008b_UUID lock: ffff810a05b96800/0xcca1a6f6c7435aaf lrc: 3/0,0 mode: PW/PW res: 33204643/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x6de46e9720898c72 expref: 11 pid: 31875 timeout 0 Apr 22 09:20:37 lfs-oss-1-13 kernel: LustreError: 31990:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff8105c5665400 x1398900890358925/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335086522 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:20:38 lfs-oss-1-13 kernel: LustreError: 32148:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c17a9ac00 x1398900890358924/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086522 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:20:55 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd19ba000 Apr 22 09:21:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b3ecfa000 Apr 22 09:21:15 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810615a80000 Apr 22 09:21:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810840db6000 Apr 22 09:21:15 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) 
event type 4, status -5, desc ffff8104cd75e000 Apr 22 09:21:15 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae5346000 Apr 22 09:21:20 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069cbea000 Apr 22 09:21:20 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81038b51e880 Apr 22 09:21:20 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81038b51e880 Apr 22 09:21:20 lfs-oss-1-13 kernel: Lustre: 32148:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008b: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 09:21:20 lfs-oss-1-13 kernel: Lustre: 32148:0:(ost_handler.c:887:ost_brw_read()) Skipped 32 previous similar messages Apr 22 09:21:57 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b2f22000 Apr 22 09:22:00 lfs-oss-1-13 kernel: LustreError: 32039:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105e7da8400 x1398900877719867/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086520 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:22:00 lfs-oss-1-13 kernel: LustreError: 32039:0:(ost_handler.c:822:ost_brw_read()) Skipped 6 previous similar messages Apr 22 09:22:36 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b87bdd000 Apr 22 09:22:59 lfs-oss-1-13 kernel: Lustre: 31783:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897320236859 sent from scratch1-OST0088 to NID 10.174.0.204@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 09:22:59 lfs-oss-1-13 kernel: req@ffff810c18014400 x1398897320236859/t0 o106->@NET_0x500000aae00cc_UUID:15/16 lens 296/424 e 0 to 1 dl 1335086579 ref 2 fl Rpc:/0/0 rc 0/0 Apr 22 09:22:59 lfs-oss-1-13 kernel: Lustre: 31783:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 33 previous similar messages Apr 22 09:23:00 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b2f22000 Apr 22 09:23:39 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c6441000 Apr 22 09:24:03 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069cbea000 Apr 22 09:24:25 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810551b88000 Apr 22 09:24:25 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810418e1e000 Apr 22 09:24:25 lfs-oss-1-13 kernel: LustreError: 32243:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff81083898a000 Apr 22 09:24:25 lfs-oss-1-13 kernel: LustreError: 32230:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810a765d2000 Apr 22 09:24:25 lfs-oss-1-13 kernel: LustreError: 6742:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0089_UUID lock: ffff8106be4d1800/0xcca1a6f6c74e88cf lrc: 3/0,0 mode: PW/PW res: 33059144/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x6de46e9720899a8e expref: 5 pid: 31910 timeout 0 Apr 22 09:24:49 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103b8e8f000 Apr 22 09:24:49 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 
10.174.0.204@o2ib Apr 22 09:24:49 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 09:25:06 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810193ff0000 Apr 22 09:25:06 lfs-oss-1-13 kernel: LustreError: 31896:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c21115000 x1398901148859335/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335086806 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 09:25:06 lfs-oss-1-13 kernel: LustreError: 31896:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 34 previous similar messages Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810345594000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810387a72000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7d216000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff810ab2112000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 2, status -103, desc ffff810ab2112000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 32147:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff810c11875400 x1398900877774460/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335086728 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027fda2000 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 
21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b0cdea800 Apr 22 09:25:15 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b0cdea800 Apr 22 09:25:55 lfs-oss-1-13 kernel: Lustre: 31857:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008d: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 09:25:55 lfs-oss-1-13 kernel: Lustre: 31857:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 226 previous similar messages Apr 22 09:25:58 lfs-oss-1-13 kernel: Lustre: Service thread pid 32134 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 09:25:58 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 09:25:58 lfs-oss-1-13 kernel: Pid: 32134, comm: ll_ost_io_141 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:25:58 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:25:58 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:25:58 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:25:58 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 
09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:25:58 lfs-oss-1-13 kernel: Pid: 32209, comm: ll_ost_io_216 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:25:58 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:25:58 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:25:58 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:25:58 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:25:58 lfs-oss-1-13 kernel: Pid: 32081, comm: ll_ost_io_89 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:25:58 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:25:58 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:25:58 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:25:58 
lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:25:58 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:25:58 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:25:58 lfs-oss-1-13 kernel: Apr 22 09:26:09 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81014f520000 Apr 22 09:26:12 lfs-oss-1-13 kernel: Lustre: 31810:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff8105b6363e00; still busy with 5 active RPCs Apr 22 09:26:12 lfs-oss-1-13 kernel: Lustre: 31810:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 27 previous similar messages Apr 22 09:26:12 lfs-oss-1-13 kernel: Lustre: Service thread pid 32134 completed after 214.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 22 09:26:12 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 09:27:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32195 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 09:27:11 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 09:27:11 lfs-oss-1-13 kernel: Pid: 32195, comm: ll_ost_io_202 Apr 22 09:27:11 lfs-oss-1-13 kernel: Apr 22 09:27:11 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:27:11 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:27:11 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:27:11 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:27:11 lfs-oss-1-13 kernel: Apr 22 09:27:11 lfs-oss-1-13 kernel: Pid: 32069, comm: ll_ost_io_77 Apr 22 09:27:11 lfs-oss-1-13 kernel: Apr 22 09:27:11 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:27:11 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] lock_timer_base+0x1b/0x3c Apr 22 09:27:11 lfs-oss-1-13 kernel: [] __mod_timer+0x100/0x10f Apr 22 09:27:11 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:27:11 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] 
ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:27:11 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:27:11 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:27:11 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:27:11 lfs-oss-1-13 kernel: Apr 22 09:27:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32174 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 22 09:27:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32218 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 22 09:27:18 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008b: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 09:27:18 lfs-oss-1-13 kernel: LustreError: Skipped 9 previous similar messages Apr 22 09:27:18 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105a377a400 x1398897320376859/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:27:18 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008b_UUID lock: ffff81078d0a9200/0xcca1a6f6c7625570 lrc: 3/0,0 mode: PW/PW res: 33211225/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x6de46e9720e5aa37 expref: 31 pid: 31761 timeout 0 Apr 22 09:27:18 lfs-oss-1-13 kernel: LustreError: 32148:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105cc2ba800 x1398900890374916/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335086923 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:27:24 lfs-oss-1-13 kernel: Lustre: Service thread pid 32195 completed after 213.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 09:27:24 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 09:27:38 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 09:27:38 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 9 previous similar messages Apr 22 09:27:38 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (37) Apr 22 09:27:38 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 9 previous similar messages Apr 22 09:27:38 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81065b7ac000 Apr 22 09:27:52 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810046d4d000 Apr 22 09:27:55 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810804b7c000 Apr 22 09:27:55 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015cd0c000 Apr 22 09:27:55 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e19ec000 Apr 22 09:27:55 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810268ec2000 Apr 22 09:27:55 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81006d45c000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a697d0000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81069cbea000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 
21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106b0bf4000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100b2142000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109992e6000 Apr 22 09:27:59 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81087ae0a000 Apr 22 09:29:19 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107054a6000 Apr 22 09:29:25 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105e098f000 x1398897320388374/t0 o105->@NET_0x500000aae00c8_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:29:25 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.0.200@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0085_UUID lock: ffff810a0d9ad600/0xcca1a6f6c7654a50 lrc: 3/0,0 mode: PW/PW res: 33214769/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x0 remote: 0x6de46e97219f13c4 expref: 6 pid: 31795 timeout 0 Apr 22 09:30:06 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105e07f4400 x1398897320388452/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:30:11 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810487714000 Apr 22 09:30:11 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048f739bc0 Apr 22 09:30:11 lfs-oss-1-13 kernel: LustreError: 32126:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c18374000 
x1398900888467867/t0 o4->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/416 e 0 to 0 dl 1335087268 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:30:11 lfs-oss-1-13 kernel: LustreError: 32126:0:(ost_handler.c:1064:ost_brw_write()) Skipped 1 previous similar message Apr 22 09:30:11 lfs-oss-1-13 kernel: Lustre: 32126:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008c: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 09:30:11 lfs-oss-1-13 kernel: Lustre: 32126:0:(ost_handler.c:1224:ost_brw_write()) Skipped 5 previous similar messages Apr 22 09:30:23 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103730ca000 Apr 22 09:30:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810346fbc000 Apr 22 09:30:23 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100359a4000 Apr 22 09:30:43 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810149db0000 Apr 22 09:30:44 lfs-oss-1-13 kernel: LustreError: 32125:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105a8cc4c00 x1398901148863750/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335087701 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:30:44 lfs-oss-1-13 kernel: LustreError: 32125:0:(ost_handler.c:829:ost_brw_read()) Skipped 26 previous similar messages Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81083ab66000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103b442e000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 
21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033f1b6000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810143964000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a95ff0000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108955d4000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810143964000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8108955d4000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a95ff0000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 32055:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req@ffff8105d888d400 x1398900877782393/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 1 to 0 dl 1335087306 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 32055:0:(ost_handler.c:1073:ost_brw_write()) Skipped 1 previous similar message Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff81039b1a6000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 2, status -103, desc ffff81039b1a6000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 4, status -103, desc ffff81041fabe000 Apr 22 09:31:21 lfs-oss-1-13 kernel: LustreError: 21826:0:(events.c:381:server_bulk_callback()) event type 2, status -103, desc ffff81041fabe000 Apr 22 09:31:25 
lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 09:31:25 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 09:32:03 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c1624000 Apr 22 09:32:03 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 09:32:03 lfs-oss-1-13 kernel: Lustre: 32084:0:(ost_handler.c:887:ost_brw_read()) Skipped 38 previous similar messages Apr 22 09:33:01 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810c23384000 x1398900890378799/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335087255 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:33:31 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b80154000 Apr 22 09:35:26 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bdf0f0000 Apr 22 09:35:26 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810804b7c000 Apr 22 09:35:26 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101f7a82000 Apr 22 09:35:26 lfs-oss-1-13 kernel: LustreError: 31897:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105f5d62850 x1398900890389935/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335087426 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 09:35:26 lfs-oss-1-13 
kernel: LustreError: 31897:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 21 previous similar messages Apr 22 09:35:37 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810acb1c4000 Apr 22 09:35:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.6.174@o2ib Apr 22 09:35:37 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 22 09:35:57 lfs-oss-1-13 kernel: Lustre: 31877:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 09:35:57 lfs-oss-1-13 kernel: Lustre: 31877:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 146 previous similar messages Apr 22 09:36:31 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 09:36:31 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 18 previous similar messages Apr 22 09:36:50 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b02f8e000 Apr 22 09:36:50 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c16922000 Apr 22 09:36:50 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082c8c0000 Apr 22 09:36:50 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103730ca000 Apr 22 09:37:31 lfs-oss-1-13 kernel: Lustre: 31729:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897320542923 sent from scratch1-OST0085 to NID 10.174.0.200@o2ib 11s ago has timed out (11s prior to deadline). 
Apr 22 09:37:31 lfs-oss-1-13 kernel: req@ffff810bd8d11800 x1398897320542923/t0 o104->@NET_0x500000aae00c8_UUID:15/16 lens 296/384 e 0 to 1 dl 1335087451 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 09:37:31 lfs-oss-1-13 kernel: Lustre: 31729:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Apr 22 09:37:31 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.0.200@o2ib was evicted due to a lock blocking callback to 10.174.0.200@o2ib timed out: rc -107 Apr 22 09:37:31 lfs-oss-1-13 kernel: LustreError: Skipped 2 previous similar messages Apr 22 09:37:53 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101e2342000 Apr 22 09:37:53 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105a48e2000 Apr 22 09:37:53 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bce3cc000 Apr 22 09:37:53 lfs-oss-1-13 kernel: LustreError: 32113:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105c940cc00 x1398900890392981/t0 o4->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/416 e 0 to 0 dl 1335087916 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:37:56 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81068e104000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d184e000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101341f4000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81030f586000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 
4, status -5, desc ffff8109f561e000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810259844000 Apr 22 09:38:30 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810580b32000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106fca18000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f2e9c000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105c08d8000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab1cee000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81018c4c2000 Apr 22 09:39:33 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101f2e08000 Apr 22 09:39:39 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 3 seconds Apr 22 09:39:39 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 6 previous similar messages Apr 22 09:39:39 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (47) Apr 22 09:39:39 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 6 previous similar messages Apr 22 09:39:39 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810715a68000 Apr 22 09:39:59 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff8103b442e000 Apr 22 09:39:59 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109c1624000 Apr 22 09:39:59 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd76d0000 Apr 22 09:39:59 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108855b8000 Apr 22 09:39:59 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81088cbc4000 Apr 22 09:40:15 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810882018000 Apr 22 09:40:42 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026da0a000 Apr 22 09:40:46 lfs-oss-1-13 kernel: LustreError: 32231:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105f42af400 x1398900890394864/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335087646 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:40:46 lfs-oss-1-13 kernel: LustreError: 32231:0:(ost_handler.c:822:ost_brw_read()) Skipped 4 previous similar messages Apr 22 09:40:46 lfs-oss-1-13 kernel: LustreError: 32090:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810bc7ffac00 x1398900888478057/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335087687 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:40:46 lfs-oss-1-13 kernel: LustreError: 32090:0:(ost_handler.c:829:ost_brw_read()) Skipped 43 previous similar messages Apr 22 09:40:56 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008b: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 
09:40:56 lfs-oss-1-13 kernel: Lustre: 31999:0:(ost_handler.c:1224:ost_brw_write()) Skipped 7 previous similar messages Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108f2e9c000 Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102e9786000 Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102fe954000 Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810348598000 Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810270506000 Apr 22 09:41:15 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810387330000 Apr 22 09:41:52 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107203be000 Apr 22 09:42:34 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102ecc82000 Apr 22 09:42:34 lfs-oss-1-13 kernel: Lustre: 32066:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 09:42:34 lfs-oss-1-13 kernel: Lustre: 32066:0:(ost_handler.c:887:ost_brw_read()) Skipped 58 previous similar messages Apr 22 09:42:43 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b5834000 Apr 22 09:42:43 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810956348000 Apr 22 09:42:43 lfs-oss-1-13 kernel: LustreError: 
21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81033fba4000 Apr 22 09:42:43 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020f68e000 Apr 22 09:42:43 lfs-oss-1-13 kernel: LustreError: 31719:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 318 < 320 req@ffff8105a3406000 x1398900877809345/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 09:43:07 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810137666000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810437856000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810503a76000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d96ea000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81071f6f6000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106de208000 Apr 22 09:43:08 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810451216000 Apr 22 09:43:21 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81076d236000 Apr 22 09:43:21 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810491904000 Apr 22 09:43:21 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ab51b6000 Apr 22 09:43:21 lfs-oss-1-13 
kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103fd754000 Apr 22 09:43:21 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81018bb42000 Apr 22 09:43:46 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81009c9aa880 Apr 22 09:43:46 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108855b8000 Apr 22 09:43:46 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944c86000 Apr 22 09:44:02 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105e4c7a000 Apr 22 09:44:23 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810410013000 Apr 22 09:44:24 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a5ad6000 Apr 22 09:44:24 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100703ea000 Apr 22 09:44:24 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81008e002000 Apr 22 09:44:24 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016d3b8000 Apr 22 09:44:24 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810427f6e000 Apr 22 09:45:29 lfs-oss-1-13 kernel: LustreError: 31900:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105c5665400 x1398900890402182/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335088029 ref 1 fl Interpret:/0/0 rc -16/0 Apr 
22 09:45:29 lfs-oss-1-13 kernel: LustreError: 31900:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 39 previous similar messages Apr 22 09:45:40 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105b6ab7c00 x1398900877814153/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335088316 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:45:40 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:825:ost_brw_read()) Skipped 7 previous similar messages Apr 22 09:46:03 lfs-oss-1-13 kernel: Lustre: Service thread pid 32142 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 09:46:03 lfs-oss-1-13 kernel: Lustre: Skipped 1 previous similar message Apr 22 09:46:03 lfs-oss-1-13 kernel: Pid: 32142, comm: ll_ost_io_149 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:03 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:46:03 lfs-oss-1-13 kernel: [] deadline_queue_empty+0x0/0x23 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:46:03 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:46:03 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] 
child_rip+0xa/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:03 lfs-oss-1-13 kernel: Pid: 32175, comm: ll_ost_io_182 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:03 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:46:03 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:46:03 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:46:03 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:03 lfs-oss-1-13 kernel: Pid: 32232, comm: ll_ost_io_239 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:03 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:46:03 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:46:03 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] 
default_wake_function+0x0/0xe Apr 22 09:46:03 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:46:03 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:46:03 lfs-oss-1-13 kernel: Apr 22 09:46:08 lfs-oss-1-13 kernel: Lustre: 31914:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 09:46:08 lfs-oss-1-13 kernel: Lustre: 31914:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 351 previous similar messages Apr 22 09:46:18 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bd76d0000 Apr 22 09:46:18 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81043753e000 Apr 22 09:46:18 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 09:46:18 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 21 previous similar messages Apr 22 09:46:29 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102a4a8b000 Apr 22 09:46:30 lfs-oss-1-13 kernel: Lustre: scratch1-OST0089: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 
seconds. I think it's dead, and I am evicting it. Apr 22 09:46:30 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 09:46:34 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7ebf2000 Apr 22 09:46:34 lfs-oss-1-13 kernel: Lustre: 31823:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 09:46:34 lfs-oss-1-13 kernel: Lustre: 31823:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 37 previous similar messages Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e33dc000 Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c2a7a000 Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108ba742000 Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810143f3a000 Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81037e252000 Apr 22 09:47:25 lfs-oss-1-13 kernel: LustreError: 31765:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 166 < 167 req@ffff8105b1989400 x1398900890407399/t0 o400->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 192/0 e 0 to 0 dl 1335088127 ref 1 fl Interpret:H/0/0 rc 0/0 Apr 22 09:47:53 lfs-oss-1-13 kernel: Lustre: Service thread pid 32118 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 09:47:53 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 09:47:53 lfs-oss-1-13 kernel: Pid: 32118, comm: ll_ost_io_125 Apr 22 09:47:53 lfs-oss-1-13 kernel: Apr 22 09:47:53 lfs-oss-1-13 kernel: Call Trace: Apr 22 09:47:53 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 09:47:53 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 09:47:53 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 09:47:53 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 09:47:53 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 09:47:53 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 09:47:53 lfs-oss-1-13 kernel: Apr 22 09:48:02 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81039f1ee000 Apr 22 09:48:02 lfs-oss-1-13 kernel: Lustre: Service thread pid 32118 completed after 209.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 09:48:02 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 09:48:36 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810480ed9000 Apr 22 09:49:03 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109da82c000 Apr 22 09:49:03 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b46702000 Apr 22 09:49:03 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6e960000 Apr 22 09:49:11 lfs-oss-1-13 kernel: LustreError: 32232:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810ae6f3fc00 x1398900877809799/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 2 to 0 dl 1335088307 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:49:11 lfs-oss-1-13 kernel: LustreError: 32232:0:(ost_handler.c:825:ost_brw_read()) Skipped 3 previous similar messages Apr 22 09:49:11 lfs-oss-1-13 kernel: Lustre: Service thread pid 32232 completed after 388.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 09:49:57 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 4 seconds Apr 22 09:49:57 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 6 previous similar messages Apr 22 09:49:57 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (40) Apr 22 09:49:57 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 6 previous similar messages Apr 22 09:49:57 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104b9002000 Apr 22 09:50:09 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108809ac000 Apr 22 09:50:09 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81020f68e000 Apr 22 09:50:09 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81020f68e000 Apr 22 09:50:09 lfs-oss-1-13 kernel: LustreError: 32222:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1026392) req@ffff810692dba800 x1398900877831770/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335088760 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:50:09 lfs-oss-1-13 kernel: LustreError: 32222:0:(ost_handler.c:1073:ost_brw_write()) Skipped 4 previous similar messages Apr 22 09:50:30 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a6e960000 Apr 22 09:50:30 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b5834000 Apr 22 09:50:30 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81071f6f6000 Apr 22 
09:50:30 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a42dc2000 Apr 22 09:50:30 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b46702000 Apr 22 09:50:30 lfs-oss-1-13 kernel: Lustre: 31943:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST008c: exp ffff8105a66d7200 already connecting Apr 22 09:50:34 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810775e6c000 Apr 22 09:50:52 lfs-oss-1-13 kernel: Lustre: 32197:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897320850353 sent from scratch1-OST0085 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 09:50:52 lfs-oss-1-13 kernel: req@ffff8105bab60800 x1398897320850353/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335088252 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:50:52 lfs-oss-1-13 kernel: Lustre: 32197:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Apr 22 09:50:52 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0085: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 09:50:52 lfs-oss-1-13 kernel: LustreError: Skipped 3 previous similar messages Apr 22 09:51:45 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a3cabc000 Apr 22 09:52:02 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102dada2000 Apr 22 09:52:22 lfs-oss-1-13 kernel: LustreError: 32034:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c248cac00 x1398901148884543/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335088981 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:52:22 
lfs-oss-1-13 kernel: LustreError: 32034:0:(ost_handler.c:829:ost_brw_read()) Skipped 57 previous similar messages Apr 22 09:52:23 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104cb0a0000 Apr 22 09:52:23 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81093f07c000 Apr 22 09:52:23 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109efcda000 Apr 22 09:52:40 lfs-oss-1-13 kernel: Lustre: 32240:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 09:52:40 lfs-oss-1-13 kernel: Lustre: 32240:0:(ost_handler.c:887:ost_brw_read()) Skipped 61 previous similar messages Apr 22 09:53:14 lfs-oss-1-13 kernel: LustreError: 32101:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105bd143c00 x1399591474482203/t0 o4->5d2ee37f-016d-86e3-a037-771ecb8874f1@NET_0x500000aae0739_UUID:0/0 lens 448/416 e 0 to 0 dl 1335088984 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:53:14 lfs-oss-1-13 kernel: Lustre: 32101:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0089: ignoring bulk IO comm error with 5d2ee37f-016d-86e3-a037-771ecb8874f1@NET_0x500000aae0739_UUID id 12345-10.174.7.57@o2ib - client will retry Apr 22 09:53:14 lfs-oss-1-13 kernel: Lustre: 32101:0:(ost_handler.c:1224:ost_brw_write()) Skipped 1 previous similar message Apr 22 09:53:25 lfs-oss-1-13 kernel: LustreError: 32137:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105c21bf000 x1398900886623551/t0 o4->6655a9b8-0c55-162b-cad6-550284c60b93@NET_0x500000aae073b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335088995 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:53:25 lfs-oss-1-13 kernel: LustreError: 32137:0:(ost_handler.c:1064:ost_brw_write()) Skipped 3 
previous similar messages Apr 22 09:53:43 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81090d3e4000 Apr 22 09:53:51 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c22965000 x1398900877847097/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335088519 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 09:53:51 lfs-oss-1-13 kernel: LustreError: 32186:0:(ost_handler.c:825:ost_brw_read()) Skipped 6 previous similar messages Apr 22 09:54:04 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810192cc5000 Apr 22 09:54:30 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81016694e000 Apr 22 09:54:40 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.7.55@o2ib ns: filter-scratch1-OST0088_UUID lock: ffff810437dd1000/0xcca1a6f6c7c3a209 lrc: 3/0,0 mode: PR/PR res: 33244930/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x527a9ac90c0f649a expref: 18 pid: 31849 timeout 5294966047 Apr 22 09:54:40 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 4 previous similar messages Apr 22 09:54:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8105f047e800 x1398897320940617/t0 o105->@NET_0x500000aae0737_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:54:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.7.55@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0088_UUID lock: ffff8107b0649000/0xcca1a6f6c7c3fab6 lrc: 3/0,0 mode: PW/PW res: 33244930/0 rrc: 3 type: 
EXT [0->18446744073709551615] (req 0->315391) flags: 0x0 remote: 0x527a9ac90c0f64cb expref: 13 pid: 31808 timeout 0 Apr 22 09:54:40 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 09:55:07 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944c86000 Apr 22 09:56:09 lfs-oss-1-13 kernel: Lustre: 31776:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0088: e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1 reconnecting Apr 22 09:56:09 lfs-oss-1-13 kernel: Lustre: 31776:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 252 previous similar messages Apr 22 09:56:15 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81086ba12000 Apr 22 09:56:15 lfs-oss-1-13 kernel: LustreError: 31734:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810a0df41400 x1398901148889344/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335088675 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 09:56:15 lfs-oss-1-13 kernel: LustreError: 31734:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 40 previous similar messages Apr 22 09:56:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b26e60000 Apr 22 09:56:23 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 09:56:23 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 22 09:56:44 lfs-oss-1-13 kernel: LustreError: 31896:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff81012ce87e00/0xcca1a6f6c7c83594 lrc: 3/0,0 mode: PR/PR res: 33245155/0 rrc: 2 type: EXT 
[0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b8de92a3 expref: 12 pid: 31946 timeout 5295090975 Apr 22 09:56:44 lfs-oss-1-13 kernel: LustreError: 31896:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 1 previous similar message Apr 22 09:56:44 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c263b3800 x1398897321005324/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 09:56:44 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0084_UUID lock: ffff81092323ca00/0xcca1a6f6c7d37f9f lrc: 3/0,0 mode: PW/PW res: 33245155/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->45055) flags: 0x0 remote: 0x382dcc42b8de9352 expref: 8 pid: 31737 timeout 0 Apr 22 09:57:03 lfs-oss-1-13 kernel: Lustre: 31751:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from e2a5fb38-e5ba-b715-cd20-b10ad1baf6e1@10.174.0.68@o2ib to 0xffff8105cb225e00; still busy with 1 active RPCs Apr 22 09:57:03 lfs-oss-1-13 kernel: Lustre: 31751:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 34 previous similar messages Apr 22 09:57:42 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810474bf8000 Apr 22 09:58:42 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107af6f7000 Apr 22 09:58:55 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81013113a000 Apr 22 09:58:55 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810421dc8000 Apr 22 09:58:55 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 2, status -5, 
desc ffff810421dc8000 Apr 22 09:58:55 lfs-oss-1-13 kernel: LustreError: 32016:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1004208) req@ffff8106f0073000 x1398900877919436/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335089275 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 09:59:11 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810972ce0000 Apr 22 10:00:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 10:00:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 4 previous similar messages Apr 22 10:00:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (46) Apr 22 10:00:02 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 4 previous similar messages Apr 22 10:00:02 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103df418000 Apr 22 10:01:17 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100694da000 Apr 22 10:01:26 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810337717000 Apr 22 10:01:26 lfs-oss-1-13 kernel: Lustre: 31821:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897321007339 sent from scratch1-OST0086 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 10:01:26 lfs-oss-1-13 kernel: req@ffff810c1e40d400 x1398897321007339/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335088886 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 10:01:26 lfs-oss-1-13 kernel: Lustre: 31821:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Apr 22 10:01:26 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0086: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 10:01:26 lfs-oss-1-13 kernel: LustreError: Skipped 2 previous similar messages Apr 22 10:01:30 lfs-oss-1-13 kernel: Lustre: scratch1-OST0087: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 10:01:30 lfs-oss-1-13 kernel: Lustre: Skipped 3 previous similar messages Apr 22 10:02:07 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b10051c00 Apr 22 10:02:07 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810944c86000 Apr 22 10:02:07 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810b10051c00 Apr 22 10:02:07 lfs-oss-1-13 kernel: LustreError: 32136:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(118940) req@ffff810c23249400 x1398900877924200/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335089466 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:02:12 lfs-oss-1-13 kernel: Lustre: 32170:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-159), not sending early reply Apr 22 10:02:12 lfs-oss-1-13 kernel: req@ffff8105c7057000 x1398900890409944/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 1 to 0 dl 1335088937 ref 2 fl Interpret:/2/0 rc 0/0 Apr 
22 10:02:17 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 764+0s req@ffff8105c7057000 x1398900890409944/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 1 to 0 dl 1335088937 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:02:17 lfs-oss-1-13 kernel: LustreError: 32023:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 10:02:20 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bc31e2000 Apr 22 10:03:02 lfs-oss-1-13 kernel: LustreError: 31792:0:(ldlm_lockd.c:1184:ldlm_handle_enqueue()) ### lock on destroyed export ffff810a797eb400 ns: filter-scratch1-OST008c_UUID lock: ffff8103d5990600/0xcca1a6f6c7df4932 lrc: 1/0,0 mode: --/PW res: 33249305/0 rrc: 1 type: EXT [0->4095] (req 0->4095) flags: 0x20000080 remote: 0x382dcc42b8e5cf18 expref: 20 pid: 31792 timeout 0 Apr 22 10:03:02 lfs-oss-1-13 kernel: LustreError: 32100:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c1ca99800 x1398900877943669/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335089730 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:03:03 lfs-oss-1-13 kernel: LustreError: 32010:0:(ost_handler.c:1060:ost_brw_write()) @@@ Eviction on bulk GET req@ffff810c28dd7c00 x1398900877943611/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335089730 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:03:33 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810491ba6000 Apr 22 10:03:46 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1a12d8c0 Apr 22 10:03:46 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a16f6a000 Apr 22 10:03:46 
lfs-oss-1-13 kernel: Lustre: 32010:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008c: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:03:46 lfs-oss-1-13 kernel: Lustre: 32010:0:(ost_handler.c:1224:ost_brw_write()) Skipped 6 previous similar messages Apr 22 10:03:46 lfs-oss-1-13 kernel: Lustre: 32000:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008c: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:03:46 lfs-oss-1-13 kernel: Lustre: 32000:0:(ost_handler.c:887:ost_brw_read()) Skipped 20 previous similar messages Apr 22 10:04:34 lfs-oss-1-13 kernel: Lustre: 32101:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-301), not sending early reply Apr 22 10:04:34 lfs-oss-1-13 kernel: req@ffff810c2959ec00 x1398900890409946/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 1 to 0 dl 1335089079 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 10:04:34 lfs-oss-1-13 kernel: Lustre: 32101:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 22 10:04:39 lfs-oss-1-13 kernel: LustreError: 32035:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 906+0s req@ffff8105bad15400 x1398900890409949/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 1 to 0 dl 1335089079 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:04:39 lfs-oss-1-13 kernel: LustreError: 32035:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 10:06:15 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008d: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 10:06:15 lfs-oss-1-13 kernel: Lustre: 31722:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 146 previous similar messages Apr 22 10:06:30 
lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bb995a000 Apr 22 10:06:30 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.200@o2ib Apr 22 10:06:30 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages Apr 22 10:06:30 lfs-oss-1-13 kernel: LustreError: 31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105a9322450 x1398900890441767/t0 o8->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 368/264 e 0 to 0 dl 1335089290 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:06:30 lfs-oss-1-13 kernel: LustreError: 31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 20 previous similar messages Apr 22 10:06:30 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105ecb28800 x1398900890440942/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335089208 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:06:30 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:829:ost_brw_read()) Skipped 21 previous similar messages Apr 22 10:07:24 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b5834000 Apr 22 10:07:24 lfs-oss-1-13 kernel: LustreError: 21820:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81007e142000 Apr 22 10:07:24 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101680f4000 Apr 22 10:07:24 lfs-oss-1-13 kernel: Lustre: 31845:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 1662f6a0-94ac-b558-ad6c-555bd1b705c9@10.174.0.200@o2ib to 0xffff8105b476c400; still busy with 4 active RPCs Apr 22 10:07:24 lfs-oss-1-13 kernel: Lustre: 
31845:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 22 previous similar messages Apr 22 10:07:58 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106d5254000 Apr 22 10:08:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81072db22000 Apr 22 10:08:23 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101b5834000 Apr 22 10:08:23 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100ae36e000 Apr 22 10:08:23 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a41004000 Apr 22 10:10:17 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810412dee000 Apr 22 10:11:33 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81029f1fc000 Apr 22 10:11:33 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109d4976000 Apr 22 10:11:33 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101c2154000 Apr 22 10:11:46 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff8106e930f000/0xcca1a6f6c7e7589c lrc: 3/0,0 mode: PR/PR res: 33254748/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b8ef4a58 expref: 11 pid: 31735 timeout 5295992268 Apr 22 10:12:48 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ba032a000 Apr 22 10:12:48 lfs-oss-1-13 kernel: 
LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e5446000 Apr 22 10:12:48 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107c16d2000 Apr 22 10:12:48 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81079cdfe000 Apr 22 10:12:48 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105051a4000 Apr 22 10:12:48 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81085f7de000 Apr 22 10:14:31 lfs-oss-1-13 kernel: Lustre: 32154:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-111), not sending early reply Apr 22 10:14:31 lfs-oss-1-13 kernel: req@ffff810874316000 x1398901148895758/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335089676 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 10:14:31 lfs-oss-1-13 kernel: Lustre: 32154:0:(service.c:808:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Apr 22 10:14:36 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 716+0s req@ffff810874316000 x1398901148895758/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335089676 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:14:36 lfs-oss-1-13 kernel: LustreError: 32031:0:(ost_handler.c:822:ost_brw_read()) Skipped 3 previous similar messages Apr 22 10:14:36 lfs-oss-1-13 kernel: Lustre: 32031:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID id 12345-10.174.6.174@o2ib - client will retry Apr 22 10:14:36 lfs-oss-1-13 kernel: Lustre: 32031:0:(ost_handler.c:887:ost_brw_read()) Skipped 17 previous similar messages 
Apr 22 10:14:43 lfs-oss-1-13 kernel: Lustre: scratch1-OST008c: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 10:15:48 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 9 seconds Apr 22 10:15:48 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 10:15:48 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (47) Apr 22 10:15:48 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 10:15:48 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109434ae000 Apr 22 10:16:09 lfs-oss-1-13 kernel: Lustre: Service thread pid 32028 was inactive for 936.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 22 10:16:09 lfs-oss-1-13 kernel: Pid: 32028, comm: ll_ost_io_36 Apr 22 10:16:09 lfs-oss-1-13 kernel: Apr 22 10:16:09 lfs-oss-1-13 kernel: Call Trace: Apr 22 10:16:09 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 10:16:09 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 10:16:09 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 10:16:09 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 10:16:09 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 10:16:09 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 10:16:09 lfs-oss-1-13 kernel: Apr 22 10:16:35 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c2c17c400 x1398900877971998/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090270 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:16:35 lfs-oss-1-13 kernel: LustreError: 31997:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 10:16:48 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107ee1b13c0 Apr 22 10:16:48 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc 
ffff8107ee1b13c0 Apr 22 10:16:48 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108e1286000 Apr 22 10:16:48 lfs-oss-1-13 kernel: LustreError: 32210:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(11264) req@ffff8105b72de800 x1398900877977685/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335090504 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:16:48 lfs-oss-1-13 kernel: Lustre: 32210:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0087: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:16:48 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 10:16:48 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 10 previous similar messages Apr 22 10:16:48 lfs-oss-1-13 kernel: Lustre: 31779:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0087: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 10:16:48 lfs-oss-1-13 kernel: Lustre: 31779:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 155 previous similar messages Apr 22 10:16:51 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81084e8ee000 Apr 22 10:16:55 lfs-oss-1-13 kernel: LustreError: 31829:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105e6099400 x1398901148909428/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335089915 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:16:55 lfs-oss-1-13 kernel: LustreError: 31829:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 16 previous similar messages Apr 22 10:16:55 lfs-oss-1-13 kernel: LustreError: 32216:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT 
req@ffff8105dfb42000 x1398901148908614/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090473 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:16:55 lfs-oss-1-13 kernel: LustreError: 32216:0:(ost_handler.c:829:ost_brw_read()) Skipped 11 previous similar messages Apr 22 10:18:07 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810075672000 Apr 22 10:18:07 lfs-oss-1-13 kernel: Lustre: 31937:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@10.174.6.174@o2ib to 0xffff8105bbedb200; still busy with 1 active RPCs Apr 22 10:18:07 lfs-oss-1-13 kernel: Lustre: 31937:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 9 previous similar messages Apr 22 10:19:10 lfs-oss-1-13 kernel: Lustre: 32210:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-517), not sending early reply Apr 22 10:19:10 lfs-oss-1-13 kernel: req@ffff8105dfa88000 x1398900888499378/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 1 to 0 dl 1335089955 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 10:19:10 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102455aa000 Apr 22 10:19:15 lfs-oss-1-13 kernel: Lustre: Service thread pid 32028 completed after 1122.03s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 10:19:15 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 10:20:13 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e5446000 Apr 22 10:20:24 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0087_UUID lock: ffff810668b23e00/0xcca1a6f6c81b276c lrc: 3/0,0 mode: PR/PR res: 33271699/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b8f73d28 expref: 16 pid: 31736 timeout 5296510319 Apr 22 10:20:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c03c1ac00 x1398897321438549/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:20:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0087_UUID lock: ffff81048c605000/0xcca1a6f6c823fe79 lrc: 3/0,0 mode: PW/PW res: 33271699/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b8f7ad67 expref: 11 pid: 31941 timeout 0 Apr 22 10:21:16 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a50ce0000 Apr 22 10:21:41 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104ab040000 Apr 22 10:21:41 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810552636000 Apr 22 10:22:20 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027daaa000 Apr 22 10:22:31 lfs-oss-1-13 kernel: Lustre: 31853:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ 
Request x1398897321461459 sent from scratch1-OST008c to NID 10.174.14.43@o2ib 11s ago has timed out (11s prior to deadline). Apr 22 10:22:31 lfs-oss-1-13 kernel: req@ffff8105ee635c00 x1398897321461459/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335090151 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 10:22:31 lfs-oss-1-13 kernel: Lustre: 31853:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Apr 22 10:22:31 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 10:22:31 lfs-oss-1-13 kernel: LustreError: Skipped 10 previous similar messages Apr 22 10:22:31 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c33aa2800 x1398897321461499/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:22:31 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008c_UUID lock: ffff8101a8a71c00/0xcca1a6f6c838122b lrc: 3/0,0 mode: PW/PW res: 33276621/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x0 remote: 0x382dcc42b8f99473 expref: 10 pid: 31853 timeout 0 Apr 22 10:22:32 lfs-oss-1-13 kernel: LustreError: 32040:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810c21a94000 x1398900877988185/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090240 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:22:32 lfs-oss-1-13 kernel: LustreError: 32040:0:(ost_handler.c:825:ost_brw_read()) Skipped 5 previous similar messages Apr 22 10:23:10 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105ed624280 Apr 22 10:23:10 lfs-oss-1-13 kernel: LustreError: 
21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81027c2e6000 Apr 22 10:23:11 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108814cd000 Apr 22 10:23:12 lfs-oss-1-13 kernel: Lustre: 32244:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 10:23:12 lfs-oss-1-13 kernel: req@ffff81095382fc00 x1398900890445916/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090197 ref 2 fl Interpret:/0/0 rc 0/0 Apr 22 10:23:17 lfs-oss-1-13 kernel: LustreError: 32197:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 755+0s req@ffff81095382fc00 x1398900890445916/t0 o3->1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090197 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:23:17 lfs-oss-1-13 kernel: LustreError: 32197:0:(ost_handler.c:822:ost_brw_read()) Skipped 1 previous similar message Apr 22 10:23:22 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST0087_UUID lock: ffff81030c695800/0xcca1a6f6c835f97e lrc: 3/0,0 mode: PW/PW res: 33184707/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b8f80606 expref: 8 pid: 31815 timeout 5296688160 Apr 22 10:23:23 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105eaf8c000 Apr 22 10:24:14 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bc57ff000 Apr 22 10:24:24 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810934692000 Apr 22 10:25:01 lfs-oss-1-13 kernel: LustreError: 
21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100334c0000 Apr 22 10:25:01 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082d038000 Apr 22 10:25:01 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b7a7e6000 Apr 22 10:25:01 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81063aab2000 Apr 22 10:25:01 lfs-oss-1-13 kernel: Lustre: 32062:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008b: ignoring bulk IO comm error with 1662f6a0-94ac-b558-ad6c-555bd1b705c9@NET_0x500000aae00c8_UUID id 12345-10.174.0.200@o2ib - client will retry Apr 22 10:25:01 lfs-oss-1-13 kernel: Lustre: 32062:0:(ost_handler.c:887:ost_brw_read()) Skipped 26 previous similar messages Apr 22 10:25:17 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102b7513000 Apr 22 10:25:54 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: tx_queue, 8 seconds Apr 22 10:25:54 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 10 previous similar messages Apr 22 10:25:54 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (58) Apr 22 10:25:54 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 10 previous similar messages Apr 22 10:25:54 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810552636000 Apr 22 10:25:55 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810079d5c000 Apr 22 10:25:55 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc 
ffff8109caf36000 Apr 22 10:25:55 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082d038000 Apr 22 10:26:02 lfs-oss-1-13 kernel: Lustre: 31851:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897321531331 sent from scratch1-OST0089 to NID 10.174.14.43@o2ib 11s ago has timed out (11s prior to deadline). Apr 22 10:26:02 lfs-oss-1-13 kernel: req@ffff8107c054b000 x1398897321531331/t0 o106->@NET_0x500000aae0e2b_UUID:15/16 lens 296/424 e 0 to 1 dl 1335090362 ref 1 fl Rpc:/0/0 rc 0/0 Apr 22 10:26:02 lfs-oss-1-13 kernel: Lustre: 31851:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 103294 previous similar messages Apr 22 10:26:28 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106da804000 Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810232eb6000 Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8105c3319800 Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8105c3319800 Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 32102:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(218624) req@ffff810c2959ec00 x1398900878018347/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335091132 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 32102:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008e: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 10:26:57 lfs-oss-1-13 kernel: 
Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 22 previous similar messages Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 31936:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 31839:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008c: df2e0f6d-50a6-f345-0e51-0137be7a5fd1 reconnecting Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 31839:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 300 previous similar messages Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 31819:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810bd3933800 x1398900878019296/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335090517 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:26:57 lfs-oss-1-13 kernel: Lustre: 31936:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 303 previous similar messages Apr 22 10:26:57 lfs-oss-1-13 kernel: LustreError: 31819:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 29 previous similar messages Apr 22 10:26:58 lfs-oss-1-13 kernel: LustreError: 32018:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c34d07850 x1398900878015225/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091119 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:26:58 lfs-oss-1-13 kernel: LustreError: 32018:0:(ost_handler.c:829:ost_brw_read()) Skipped 23 previous similar messages Apr 22 10:26:58 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9d5e8000 Apr 22 10:26:58 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106caf7e000 Apr 22 10:26:58 lfs-oss-1-13 kernel: LustreError: 21823:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81082d038000 Apr 
22 10:27:23 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a13ba4000 Apr 22 10:27:24 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008b_UUID lock: ffff81039becf600/0xcca1a6f6c83812cc lrc: 3/0,0 mode: PR/PR res: 33280583/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x382dcc42b8f9a3ae expref: 9 pid: 31724 timeout 5296930056 Apr 22 10:27:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810bd175c000 x1398897321596904/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:27:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008b_UUID lock: ffff8101416bfc00/0xcca1a6f6c843c5b4 lrc: 3/0,0 mode: PW/PW res: 33280583/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b90076d6 expref: 7 pid: 31713 timeout 0 Apr 22 10:27:57 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c3c488000 Apr 22 10:28:06 lfs-oss-1-13 kernel: Lustre: scratch1-OST0086: haven't heard from client df2e0f6d-50a6-f345-0e51-0137be7a5fd1 (at 10.174.14.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 10:28:06 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 10:28:14 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81075d0dc000 Apr 22 10:28:14 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from c49d8140-06a7-779c-f541-694bd8aab9b4@10.174.0.204@o2ib to 0xffff8105eb2f0200; still busy with 1 active RPCs Apr 22 10:28:14 lfs-oss-1-13 kernel: Lustre: 31791:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 31 previous similar messages Apr 22 10:28:46 lfs-oss-1-13 kernel: LustreError: 32056:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105dd329000 x1398900878024316/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091247 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:28:46 lfs-oss-1-13 kernel: LustreError: 32056:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 10:29:03 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103e1a44000 Apr 22 10:29:25 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b6cc72000 Apr 22 10:30:06 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81026f820000 Apr 22 10:30:07 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8101fbbe0000 Apr 22 10:30:46 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810200e37000 Apr 22 10:31:10 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810416268000 Apr 22 10:31:35 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status 
-5, desc ffff810743c68000 Apr 22 10:31:56 lfs-oss-1-13 kernel: LustreError: 21802:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106e31a9000 Apr 22 10:32:35 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810723268000 Apr 22 10:32:46 lfs-oss-1-13 kernel: Lustre: 31742:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897321717530 sent from scratch1-OST0089 to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 10:32:46 lfs-oss-1-13 kernel: req@ffff8105b2826c00 x1398897321717530/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335090766 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 10:32:46 lfs-oss-1-13 kernel: Lustre: 31742:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 135319 previous similar messages Apr 22 10:32:46 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 10:32:46 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810bbd5dc800 x1398897321724333/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:32:46 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0089_UUID lock: ffff810176f95e00/0xcca1a6f6c86a1e93 lrc: 3/0,0 mode: PW/PW res: 33285416/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b9085fc0 expref: 13 pid: 31742 timeout 0 Apr 22 10:32:47 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff810a6617e400 x1398900878031482/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335090858 ref 1 fl Interpret:/2/0 rc 0/0 
Apr 22 10:33:03 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810166942000 Apr 22 10:33:05 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810829b5e000 Apr 22 10:33:28 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b5c378000 Apr 22 10:33:50 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c283fc800 x1398897321761705/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:33:50 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST0088_UUID lock: ffff810259727400/0xcca1a6f6c8726d4a lrc: 3/0,0 mode: PW/PW res: 33298210/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 remote: 0x382dcc42b909f5dc expref: 8 pid: 31929 timeout 0 Apr 22 10:33:54 lfs-oss-1-13 kernel: LustreError: 782:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335090834 with bad export cookie 14745250233833218572 Apr 22 10:34:32 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c1f56000 Apr 22 10:34:44 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81048e156000 Apr 22 10:34:44 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106caf7e000 Apr 22 10:34:44 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810acc2ca000 Apr 22 10:34:44 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108c0ef8000 
Apr 22 10:34:44 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cb0d0000 Apr 22 10:34:44 lfs-oss-1-13 kernel: Lustre: 31801:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-OST0085: exp ffff81059dd87400 already connecting Apr 22 10:35:35 lfs-oss-1-13 kernel: Lustre: Service thread pid 32179 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 22 10:35:35 lfs-oss-1-13 kernel: Pid: 32179, comm: ll_ost_io_186 Apr 22 10:35:35 lfs-oss-1-13 kernel: Apr 22 10:35:35 lfs-oss-1-13 kernel: Call Trace: Apr 22 10:35:35 lfs-oss-1-13 kernel: [] LNetMDBind+0x301/0x450 [lnet] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] schedule_timeout+0x8a/0xad Apr 22 10:35:35 lfs-oss-1-13 kernel: [] process_timeout+0x0/0x5 Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ost_brw_read+0x127c/0x1a70 [ost] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] default_wake_function+0x0/0xe Apr 22 10:35:35 lfs-oss-1-13 kernel: [] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ost_handle+0x2e73/0x55b0 [ost] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] __next_cpu+0x19/0x28 Apr 22 10:35:35 lfs-oss-1-13 kernel: [] smp_send_reschedule+0x4e/0x53 Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] __wake_up_common+0x3e/0x68 Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ptlrpc_main+0xf66/0x1120 [ptlrpc] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] child_rip+0xa/0x11 Apr 22 10:35:35 lfs-oss-1-13 kernel: [] ptlrpc_main+0x0/0x1120 [ptlrpc] Apr 22 10:35:35 lfs-oss-1-13 kernel: [] child_rip+0x0/0x11 Apr 22 10:35:35 lfs-oss-1-13 kernel: Apr 22 10:35:36 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81083527c000 Apr 22 10:35:36 
lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810c1e1b3dc0 Apr 22 10:35:36 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810164e5e000 Apr 22 10:35:36 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81051763c000 Apr 22 10:35:36 lfs-oss-1-13 kernel: LustreError: 31716:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 361 < 362 req@ffff8105a9c52c50 x1398900878037951/t0 o400->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 10:35:36 lfs-oss-1-13 kernel: LustreError: 31716:0:(service.c:653:ptlrpc_check_req()) Skipped 1 previous similar message Apr 22 10:35:37 lfs-oss-1-13 kernel: Lustre: 32039:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:35:37 lfs-oss-1-13 kernel: Lustre: 32039:0:(ost_handler.c:887:ost_brw_read()) Skipped 35 previous similar messages Apr 22 10:36:51 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810734178000 Apr 22 10:36:59 lfs-oss-1-13 kernel: Lustre: 31910:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 reconnecting Apr 22 10:36:59 lfs-oss-1-13 kernel: Lustre: 31910:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 205 previous similar messages Apr 22 10:37:00 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100aab72000 Apr 22 10:37:00 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810aab7a6000 Apr 22 10:37:00 lfs-oss-1-13 kernel: LustreError: 
21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81001b5a0000 Apr 22 10:37:00 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff81001b5a0000 Apr 22 10:37:00 lfs-oss-1-13 kernel: LustreError: 32149:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1026392) req@ffff810c33521850 x1398900878041474/t0 o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335091062 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:37:00 lfs-oss-1-13 kernel: Lustre: 32149:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008a: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:37:00 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.14.43@o2ib Apr 22 10:37:00 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 9 previous similar messages Apr 22 10:37:47 lfs-oss-1-13 kernel: LustreError: 32050:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff8105c7d7c400 x1398900878041191/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091067 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:37:54 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81083527c000 Apr 22 10:37:54 lfs-oss-1-13 kernel: LustreError: 31952:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810902826400 x1398901148930227/t0 o8->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 368/264 e 0 to 0 dl 1335091174 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:37:54 lfs-oss-1-13 kernel: LustreError: 31952:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 27 previous similar messages Apr 22 10:37:54 lfs-oss-1-13 kernel: 
LustreError: 32192:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105f63f2000 x1398901148929430/t0 o3->960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54@NET_0x500000aae06ae_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091735 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:37:54 lfs-oss-1-13 kernel: LustreError: 32192:0:(ost_handler.c:829:ost_brw_read()) Skipped 29 previous similar messages Apr 22 10:38:19 lfs-oss-1-13 kernel: Lustre: scratch1-OST008c: haven't heard from client 960fc3ce-09cd-0bdb-fb3e-652cc7fdcc54 (at 10.174.6.174@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 10:38:19 lfs-oss-1-13 kernel: Lustre: Skipped 5 previous similar messages Apr 22 10:38:29 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810ae6136000 Apr 22 10:38:29 lfs-oss-1-13 kernel: Lustre: 31924:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from df2e0f6d-50a6-f345-0e51-0137be7a5fd1@10.174.14.43@o2ib to 0xffff810c220c2a00; still busy with 1 active RPCs Apr 22 10:38:29 lfs-oss-1-13 kernel: Lustre: 31924:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 21 previous similar messages Apr 22 10:38:57 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8100932e4000 Apr 22 10:39:32 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810b65bda000 Apr 22 10:40:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 10:40:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 13 previous similar messages Apr 22 10:40:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.6.174@o2ib (24) Apr 22 10:40:25 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 
13 previous similar messages Apr 22 10:40:25 lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81017c352000 Apr 22 10:41:40 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81096940a000 Apr 22 10:41:40 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81017f706000 Apr 22 10:41:40 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81025a452000 Apr 22 10:41:41 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810abe9b8000 Apr 22 10:43:08 lfs-oss-1-13 kernel: Lustre: 32127:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897321924987 sent from scratch1-OST008a to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 10:43:08 lfs-oss-1-13 kernel: req@ffff8105de28a800 x1398897321924987/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335091388 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 10:43:08 lfs-oss-1-13 kernel: Lustre: 32127:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 22 10:43:08 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008a: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 10:43:08 lfs-oss-1-13 kernel: LustreError: Skipped 2 previous similar messages Apr 22 10:43:13 lfs-oss-1-13 kernel: LustreError: 32088:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8108ef2e9c00 x1398900878071552/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091451 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:43:13 lfs-oss-1-13 kernel: LustreError: 32088:0:(ost_handler.c:825:ost_brw_read()) Skipped 1 previous similar message Apr 22 10:43:18 
lfs-oss-1-13 kernel: LustreError: 21810:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a219c6000 Apr 22 10:43:18 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81091b236000 Apr 22 10:43:18 lfs-oss-1-13 kernel: LustreError: 21816:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8108a5fe0000 Apr 22 10:43:18 lfs-oss-1-13 kernel: LustreError: 23521:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335091398 with bad export cookie 14745250233842820766 Apr 22 10:43:18 lfs-oss-1-13 kernel: LustreError: 23521:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 22 10:44:45 lfs-oss-1-13 kernel: Lustre: 32048:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Apr 22 10:44:45 lfs-oss-1-13 kernel: req@ffff810803385800 x1398900888529997/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091490 ref 2 fl Interpret:/2/0 rc 0/0 Apr 22 10:44:50 lfs-oss-1-13 kernel: Lustre: Service thread pid 32179 completed after 755.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 22 10:45:55 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102d12d9000 Apr 22 10:45:56 lfs-oss-1-13 kernel: Lustre: 32113:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 10:45:56 lfs-oss-1-13 kernel: Lustre: 32113:0:(ost_handler.c:887:ost_brw_read()) Skipped 21 previous similar messages Apr 22 10:46:23 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81032ae57000 Apr 22 10:47:43 lfs-oss-1-13 kernel: LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81059c741000 Apr 22 10:47:43 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 10:47:43 lfs-oss-1-13 kernel: Lustre: 9219:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 5 previous similar messages Apr 22 10:47:43 lfs-oss-1-13 kernel: Lustre: 31734:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 10:47:43 lfs-oss-1-13 kernel: Lustre: 31734:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 130 previous similar messages Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810185fb0000 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810954732000 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81043f87a000 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81092c32e000 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 
31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c183c5000 x1398900878114459/t0 o8->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335091802 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 31810:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 15 previous similar messages Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 32178:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810812d84000 x1398900878111900/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335091704 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:48:22 lfs-oss-1-13 kernel: LustreError: 32178:0:(ost_handler.c:829:ost_brw_read()) Skipped 12 previous similar messages Apr 22 10:48:51 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81041bf44000 Apr 22 10:48:51 lfs-oss-1-13 kernel: Lustre: 31825:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from c49d8140-06a7-779c-f541-694bd8aab9b4@10.174.0.204@o2ib to 0xffff8105eb2f0200; still busy with 1 active RPCs Apr 22 10:48:51 lfs-oss-1-13 kernel: Lustre: 31825:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 14 previous similar messages Apr 22 10:49:49 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102267bb000 Apr 22 10:50:16 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.14.43@o2ib ns: filter-scratch1-OST008a_UUID lock: ffff810467c3c800/0xcca1a6f6c8aded90 lrc: 3/0,0 mode: PW/PW res: 33316851/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->69631) flags: 0x20 remote: 0x382dcc42b923ac8a expref: 17 pid: 31910 timeout 5298302561 Apr 22 10:50:16 lfs-oss-1-13 kernel: LustreError: 
0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 1 previous similar message Apr 22 10:50:17 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8105a5ad2c00 x1398900878115304/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335092491 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:50:17 lfs-oss-1-13 kernel: LustreError: 32164:0:(ost_handler.c:825:ost_brw_read()) Skipped 2 previous similar messages Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 2 seconds Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.14.43@o2ib (18) Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81015f56c000 Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102f97bb000 Apr 22 10:50:31 lfs-oss-1-13 kernel: LustreError: 31979:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 323 < 324 req@ffff810c225d1400 x1398900878117748/t0 o400->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 192/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 10:50:32 lfs-oss-1-13 kernel: LustreError: 32146:0:(events.c:381:server_bulk_callback()) event type 4, status -113, desc ffff810c219ef400 Apr 22 10:50:44 lfs-oss-1-13 kernel: LustreError: 32146:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105f08a0400 x1398900878118023/t0 
o4->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/416 e 0 to 0 dl 1335092587 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:50:44 lfs-oss-1-13 kernel: Lustre: 32146:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0084: ignoring bulk IO comm error with df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID id 12345-10.174.14.43@o2ib - client will retry Apr 22 10:50:45 lfs-oss-1-13 kernel: LustreError: 32041:0:(service.c:653:ptlrpc_check_req()) @@@ DROPPING req from old connection 116 < 117 req@ffff810c35537450 x1398900876922158/t0 o4->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/0 e 0 to 0 dl 0 ref 2 fl New:/0/0 rc 0/0 Apr 22 10:50:53 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107e4ded000 Apr 22 10:50:55 lfs-oss-1-13 kernel: LustreError: 32119:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff8105b8a6c000 x1398900878649146/t0 o4->fc9defa2-847d-ec6c-210a-37d1e2ea24a1@NET_0x500000aae0e2f_UUID:0/0 lens 448/416 e 0 to 0 dl 1335092610 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:50:55 lfs-oss-1-13 kernel: LustreError: 32188:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff8105f4550c00 x1398900878649104/t0 o4->fc9defa2-847d-ec6c-210a-37d1e2ea24a1@NET_0x500000aae0e2f_UUID:0/0 lens 448/416 e 0 to 0 dl 1335092610 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 10:51:56 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810230c7d000 Apr 22 10:52:36 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.174.14.47@o2ib ns: filter-scratch1-OST008a_UUID lock: ffff81057740e800/0xcca1a6f6c8b1b66f lrc: 3/0,0 mode: PR/PR res: 33317876/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x8bbdec56af40f642 expref: 
22 pid: 31968 timeout 5298442087 Apr 22 10:52:36 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c28220000 x1398897322183753/t0 o105->@NET_0x500000aae0e2f_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 10:52:36 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 10:52:36 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.47@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008a_UUID lock: ffff8103c387d400/0xcca1a6f6c8b81ca0 lrc: 3/0,0 mode: PW/PW res: 33317876/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->12287) flags: 0x0 remote: 0x8bbdec56af410059 expref: 13 pid: 31824 timeout 0 Apr 22 10:52:36 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 10:53:11 lfs-oss-1-13 kernel: LustreError: 21818:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103ec5d5000 Apr 22 10:53:40 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104335c0000 Apr 22 10:53:40 lfs-oss-1-13 kernel: LustreError: 21819:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8104d77a2000 Apr 22 10:53:40 lfs-oss-1-13 kernel: LustreError: 21815:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81067edf4000 Apr 22 10:54:52 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81044b068000 Apr 22 10:55:44 lfs-oss-1-13 kernel: LustreError: 21809:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810797daa000 Apr 22 10:55:44 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109e3c62000 Apr 22 10:55:44 lfs-oss-1-13 kernel: 
LustreError: 21808:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103cd2c4000 Apr 22 10:56:01 lfs-oss-1-13 kernel: LustreError: 21805:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81056d394000 Apr 22 10:56:02 lfs-oss-1-13 kernel: Lustre: 32138:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 10:56:02 lfs-oss-1-13 kernel: Lustre: 32138:0:(ost_handler.c:887:ost_brw_read()) Skipped 19 previous similar messages Apr 22 10:56:32 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@ffff810c21ee9000 x1398900878123652/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 0 to 0 dl 1335092192 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:56:32 lfs-oss-1-13 kernel: LustreError: 32053:0:(ost_handler.c:822:ost_brw_read()) Skipped 2 previous similar messages Apr 22 10:56:58 lfs-oss-1-13 kernel: Lustre: scratch1-OST008d: haven't heard from client c49d8140-06a7-779c-f541-694bd8aab9b4 (at 10.174.0.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. 
Apr 22 10:56:58 lfs-oss-1-13 kernel: Lustre: Skipped 4 previous similar messages Apr 22 10:57:11 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050e1b8000 Apr 22 10:58:07 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81003f131000 Apr 22 10:58:07 lfs-oss-1-13 kernel: Lustre: 31757:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0085: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 10:58:07 lfs-oss-1-13 kernel: Lustre: 31841:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0088: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 10:58:07 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0086: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 10:58:07 lfs-oss-1-13 kernel: Lustre: 31841:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 179 previous similar messages Apr 22 10:58:07 lfs-oss-1-13 kernel: Lustre: 31721:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 179 previous similar messages Apr 22 10:58:41 lfs-oss-1-13 kernel: LustreError: 31806:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105adc6bc00 x1398900888555606/t0 o8->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 368/264 e 0 to 0 dl 1335092421 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 10:58:41 lfs-oss-1-13 kernel: LustreError: 31806:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 26 previous similar messages Apr 22 10:58:42 lfs-oss-1-13 kernel: LustreError: 32130:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c24b63800 x1398900888554405/t0 o3->c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID:0/0 lens 448/400 e 0 to 0 dl 1335092990 ref 1 fl Interpret:/2/0 rc 0/0 Apr 22 10:58:42 lfs-oss-1-13 kernel: LustreError: 32130:0:(ost_handler.c:829:ost_brw_read()) Skipped 20 previous similar messages 
Apr 22 11:00:08 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810332b96000 Apr 22 11:00:08 lfs-oss-1-13 kernel: Lustre: 9201:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.0.204@o2ib Apr 22 11:00:08 lfs-oss-1-13 kernel: Lustre: 9201:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 7 previous similar messages Apr 22 11:00:56 lfs-oss-1-13 kernel: Lustre: 31770:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897322233685 sent from scratch1-OST008c to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 11:00:56 lfs-oss-1-13 kernel: req@ffff810c29cc8000 x1398897322233685/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335092456 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 11:00:56 lfs-oss-1-13 kernel: Lustre: 31770:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Apr 22 11:00:56 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 11:00:56 lfs-oss-1-13 kernel: LustreError: Skipped 18 previous similar messages Apr 22 11:01:10 lfs-oss-1-13 kernel: Lustre: 31970:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0084: refuse reconnection from c49d8140-06a7-779c-f541-694bd8aab9b4@10.174.0.204@o2ib to 0xffff8105eb2f0200; still busy with 1 active RPCs Apr 22 11:01:10 lfs-oss-1-13 kernel: Lustre: 31970:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 22 previous similar messages Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 21811:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81046d4e6000 Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 21814:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106a4378000 Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 21817:0:(events.c:381:server_bulk_callback()) event type 4, status -5, 
desc ffff81017e8ec000 Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 21813:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810304bb6000 Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 782:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335092548 with bad export cookie 14745250233845462146 Apr 22 11:02:28 lfs-oss-1-13 kernel: LustreError: 782:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) Skipped 6 previous similar messages Apr 22 11:02:32 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 1 seconds Apr 22 11:02:32 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 3 previous similar messages Apr 22 11:02:32 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.0.204@o2ib (21) Apr 22 11:02:32 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 3 previous similar messages Apr 22 11:02:32 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810662b6c000 Apr 22 11:03:55 lfs-oss-1-13 kernel: LustreError: 21824:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8102fd3ab000 Apr 22 11:03:55 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 11:03:59 lfs-oss-1-13 kernel: LustreError: 32150:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT req@ffff8108b5805800 x1398900878131823/t0 o3->df2e0f6d-50a6-f345-0e51-0137be7a5fd1@NET_0x500000aae0e2b_UUID:0/0 lens 448/400 e 2 to 0 dl 1335092773 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:05:54 lfs-oss-1-13 kernel: LustreError: 21825:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810bcc8ef000 Apr 22 11:06:12 lfs-oss-1-13 kernel: 
Lustre: 32195:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0084: ignoring bulk IO comm error with c49d8140-06a7-779c-f541-694bd8aab9b4@NET_0x500000aae00cc_UUID id 12345-10.174.0.204@o2ib - client will retry Apr 22 11:06:12 lfs-oss-1-13 kernel: Lustre: 32195:0:(ost_handler.c:887:ost_brw_read()) Skipped 14 previous similar messages Apr 22 11:07:00 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0089: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 11:07:00 lfs-oss-1-13 kernel: LustreError: Skipped 12 previous similar messages Apr 22 11:07:05 lfs-oss-1-13 kernel: LustreError: 32496:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335092825 with bad export cookie 14745250233844956662 Apr 22 11:07:17 lfs-oss-1-13 kernel: LustreError: 21812:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109436d7000 Apr 22 11:08:20 lfs-oss-1-13 kernel: LustreError: 21803:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8106689da000 Apr 22 11:08:23 lfs-oss-1-13 kernel: Lustre: 31906:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST008b: c49d8140-06a7-779c-f541-694bd8aab9b4 reconnecting Apr 22 11:08:23 lfs-oss-1-13 kernel: Lustre: 31906:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 59 previous similar messages Apr 22 11:18:35 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 9c5b2e9f-7628-0434-e306-fe0f7f302694 reconnecting Apr 22 11:18:35 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 10 previous similar messages Apr 22 11:18:35 lfs-oss-1-13 kernel: Lustre: 31772:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST0089: refuse reconnection from 9c5b2e9f-7628-0434-e306-fe0f7f302694@10.174.14.49@o2ib to 0xffff810c24797a00; still busy with 1 active RPCs Apr 22 11:18:35 lfs-oss-1-13 kernel: Lustre: 
31772:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 5 previous similar messages Apr 22 11:18:35 lfs-oss-1-13 kernel: LustreError: 31772:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c20b3f000 x1398900877161561/t0 o8->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 368/264 e 0 to 0 dl 1335093615 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 11:18:35 lfs-oss-1-13 kernel: LustreError: 31772:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 7 previous similar messages Apr 22 11:18:35 lfs-oss-1-13 kernel: LustreError: 32157:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff8105a9320800 x1398900877161498/t0 o3->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/400 e 1 to 0 dl 1335093541 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:18:35 lfs-oss-1-13 kernel: LustreError: 32157:0:(ost_handler.c:829:ost_brw_read()) Skipped 6 previous similar messages Apr 22 11:18:43 lfs-oss-1-13 kernel: Lustre: 32157:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST0089: ignoring bulk IO comm error with 9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID id 12345-10.174.14.49@o2ib - client will retry Apr 22 11:18:43 lfs-oss-1-13 kernel: Lustre: 32157:0:(ost_handler.c:887:ost_brw_read()) Skipped 2 previous similar messages Apr 22 11:19:00 lfs-oss-1-13 kernel: LustreError: 32226:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff8105b477d000 x1398900878884482/t0 o4->fc9defa2-847d-ec6c-210a-37d1e2ea24a1@NET_0x500000aae0e2f_UUID:0/0 lens 448/416 e 0 to 0 dl 1335093547 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:19:00 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0089: ignoring bulk IO comm error with fc9defa2-847d-ec6c-210a-37d1e2ea24a1@NET_0x500000aae0e2f_UUID id 12345-10.174.14.47@o2ib - client will retry Apr 22 11:19:00 lfs-oss-1-13 kernel: Lustre: 32226:0:(ost_handler.c:1224:ost_brw_write()) Skipped 4 previous 
similar messages Apr 22 11:19:17 lfs-oss-1-13 kernel: LustreError: 32032:0:(ost_handler.c:1078:ost_brw_write()) @@@ ptlrpc_bulk_get failed: rc -107 req@ffff810c1558dc00 x1398900877162013/t0 o4->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/416 e 0 to 0 dl 1335093564 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:19:17 lfs-oss-1-13 kernel: LustreError: 32045:0:(ost_handler.c:1064:ost_brw_write()) @@@ Reconnect on bulk GET req@ffff810874316000 x1398900877161984/t0 o4->9c5b2e9f-7628-0434-e306-fe0f7f302694@NET_0x500000aae0e31_UUID:0/0 lens 448/416 e 0 to 0 dl 1335093564 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:19:17 lfs-oss-1-13 kernel: LustreError: 32045:0:(ost_handler.c:1064:ost_brw_write()) Skipped 2 previous similar messages Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 31941:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 4dafe478-ef2f-f548-4940-cfce714d2f7d reconnecting Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 31941:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 36 previous similar messages Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 31901:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 4dafe478-ef2f-f548-4940-cfce714d2f7d@10.174.1.88@o2ib to 0xffff8105bb5c4a00; still busy with 1 active RPCs Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 31901:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 8 previous similar messages Apr 22 11:32:35 lfs-oss-1-13 kernel: LustreError: 31901:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810c1a93bc00 x1399131993284574/t0 o8->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 368/264 e 0 to 0 dl 1335094455 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 11:32:35 lfs-oss-1-13 kernel: LustreError: 31901:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 8 previous similar messages Apr 22 11:32:35 lfs-oss-1-13 kernel: LustreError: 32073:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT 
req@ffff8105a2e4a400 x1399131993284162/t0 o3->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 448/400 e 1 to 0 dl 1335094413 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:32:35 lfs-oss-1-13 kernel: LustreError: 32073:0:(ost_handler.c:829:ost_brw_read()) Skipped 3 previous similar messages Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 32073:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008b: ignoring bulk IO comm error with 4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID id 12345-10.174.1.88@o2ib - client will retry Apr 22 11:32:35 lfs-oss-1-13 kernel: Lustre: 32073:0:(ost_handler.c:887:ost_brw_read()) Skipped 3 previous similar messages Apr 22 11:33:24 lfs-oss-1-13 kernel: Lustre: 31811:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897322841392 sent from scratch1-OST008d to NID 10.174.14.43@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 11:33:24 lfs-oss-1-13 kernel: req@ffff810807b0f400 x1398897322841392/t0 o104->@NET_0x500000aae0e2b_UUID:15/16 lens 296/384 e 0 to 1 dl 1335094404 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 11:33:24 lfs-oss-1-13 kernel: Lustre: 31811:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008d: A client on nid 10.174.14.43@o2ib was evicted due to a lock blocking callback to 10.174.14.43@o2ib timed out: rc -107 Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810c18bc2000 x1398897322851193/t0 o105->@NET_0x500000aae0e2b_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.14.43@o2ib) returned 0 from completion AST ns: filter-scratch1-OST008d_UUID lock: 
ffff810b859f6600/0xcca1a6f6c94773b8 lrc: 3/0,0 mode: PW/PW res: 33362617/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0x382dcc42b9ee0666 expref: 7 pid: 31811 timeout 0 Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) Skipped 1 previous similar message Apr 22 11:33:24 lfs-oss-1-13 kernel: LustreError: 785:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.14.43@o2ib arrived at 1335094404 with bad export cookie 14745250233850069959 Apr 22 11:34:47 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.1.88@o2ib ns: filter-scratch1-OST0084_UUID lock: ffff810479ba9c00/0xcca1a6f6c94655fa lrc: 3/0,0 mode: PW/PW res: 33361777/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->671743) flags: 0x20 remote: 0xc4ce34cc31e4b18d expref: 8 pid: 31826 timeout 5300973672 Apr 22 11:34:47 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) Skipped 3 previous similar messages Apr 22 11:37:06 lfs-oss-1-13 kernel: Lustre: 31922:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897322980278 sent from scratch1-OST008e to NID 10.174.1.88@o2ib 7s ago has timed out (7s prior to deadline). 
Apr 22 11:37:06 lfs-oss-1-13 kernel: req@ffff8105e9187400 x1398897322980278/t0 o104->@NET_0x500000aae0158_UUID:15/16 lens 296/384 e 0 to 1 dl 1335094626 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 11:37:06 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008e: A client on nid 10.174.1.88@o2ib was evicted due to a lock blocking callback to 10.174.1.88@o2ib timed out: rc -107 Apr 22 11:37:55 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8109d1d9a000 Apr 22 11:37:55 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8103c0e5d000 Apr 22 11:37:55 lfs-oss-1-13 kernel: LustreError: 21822:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103c0e5d000 Apr 22 11:37:55 lfs-oss-1-13 kernel: LustreError: 32177:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(593280) req@ffff8105a9554400 x1399131993290103/t0 o4->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 448/416 e 0 to 0 dl 1335094705 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:37:55 lfs-oss-1-13 kernel: Lustre: 32177:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST008b: ignoring bulk IO comm error with 4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID id 12345-10.174.1.88@o2ib - client will retry Apr 22 11:37:55 lfs-oss-1-13 kernel: Lustre: 32177:0:(ost_handler.c:1224:ost_brw_write()) Skipped 2 previous similar messages Apr 22 11:37:55 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Conn race 10.174.1.88@o2ib Apr 22 11:37:55 lfs-oss-1-13 kernel: Lustre: 9213:0:(o2iblnd_cb.c:2257:kiblnd_passive_connect()) Skipped 5 previous similar messages Apr 22 11:40:27 lfs-oss-1-13 kernel: LustreError: 21804:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81050dac0000 Apr 22 11:40:52 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: haven't heard from client 
4dafe478-ef2f-f548-4940-cfce714d2f7d (at 10.174.1.88@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 11:40:52 lfs-oss-1-13 kernel: Lustre: Skipped 2 previous similar messages Apr 22 11:41:59 lfs-oss-1-13 kernel: Lustre: 5540:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398897323039598 sent from scratch1-OST008c to NID 10.174.1.88@o2ib 7s ago has timed out (7s prior to deadline). Apr 22 11:41:59 lfs-oss-1-13 kernel: req@ffff810ad965e800 x1398897323039598/t0 o105->@NET_0x500000aae0158_UUID:15/16 lens 344/384 e 0 to 1 dl 1335094919 ref 2 fl Rpc:N/0/0 rc 0/0 Apr 22 11:41:59 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST008c: A client on nid 10.174.1.88@o2ib was evicted due to a lock completion callback to 10.174.1.88@o2ib timed out: rc -107 Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Timed out tx: active_txs, 0 seconds Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2912:kiblnd_check_txs()) Skipped 1 previous similar message Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Timed out RDMA with 10.174.1.88@o2ib (31) Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21826:0:(o2iblnd_cb.c:2975:kiblnd_check_conns()) Skipped 1 previous similar message Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21821:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff81013ee98000 Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff810a9eeea000 Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 21806:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff810a9eeea000 Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 32085:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(915472) req@ffff8105e4945c00 x1399131993384619/t0 
o4->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 448/416 e 0 to 0 dl 1335095083 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 32085:0:(ost_handler.c:1224:ost_brw_write()) scratch1-OST0085: ignoring bulk IO comm error with 4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID id 12345-10.174.1.88@o2ib - client will retry Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31713:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 4dafe478-ef2f-f548-4940-cfce714d2f7d reconnecting Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31909:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0089: 4dafe478-ef2f-f548-4940-cfce714d2f7d reconnecting Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31909:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 45 previous similar messages Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31713:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 45 previous similar messages Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31740:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-OST008b: refuse reconnection from 4dafe478-ef2f-f548-4940-cfce714d2f7d@10.174.1.88@o2ib to 0xffff8105bb5c4a00; still busy with 1 active RPCs Apr 22 11:43:26 lfs-oss-1-13 kernel: Lustre: 31740:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 4 previous similar messages Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 31740:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff810b07dbe800 x1399131993385551/t0 o8->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 368/264 e 0 to 0 dl 1335095106 ref 1 fl Interpret:/0/0 rc -16/0 Apr 22 11:43:26 lfs-oss-1-13 kernel: LustreError: 31740:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 6 previous similar messages Apr 22 11:43:27 lfs-oss-1-13 kernel: LustreError: 32138:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT req@ffff810c16bf9800 x1399131993384604/t0 
o3->4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID:0/0 lens 448/400 e 0 to 0 dl 1335095092 ref 1 fl Interpret:/0/0 rc 0/0 Apr 22 11:43:27 lfs-oss-1-13 kernel: LustreError: 32138:0:(ost_handler.c:829:ost_brw_read()) Skipped 3 previous similar messages Apr 22 11:43:27 lfs-oss-1-13 kernel: Lustre: 32138:0:(ost_handler.c:887:ost_brw_read()) scratch1-OST008b: ignoring bulk IO comm error with 4dafe478-ef2f-f548-4940-cfce714d2f7d@NET_0x500000aae0158_UUID id 12345-10.174.1.88@o2ib - client will retry Apr 22 11:43:27 lfs-oss-1-13 kernel: Lustre: 32138:0:(ost_handler.c:887:ost_brw_read()) Skipped 3 previous similar messages Apr 22 11:44:30 lfs-oss-1-13 kernel: LustreError: 138-a: scratch1-OST0088: A client on nid 10.174.1.88@o2ib was evicted due to a lock blocking callback to 10.174.1.88@o2ib timed out: rc -107 Apr 22 11:44:40 lfs-oss-1-13 kernel: LustreError: 11768:0:(ldlm_lockd.c:1883:ldlm_cancel_handler()) ldlm_cancel from 10.174.1.88@o2ib arrived at 1335095080 with bad export cookie 14745250231713888331 Apr 22 11:45:42 lfs-oss-1-13 kernel: LustreError: 21807:0:(events.c:381:server_bulk_callback()) event type 4, status -5, desc ffff8107d8a50000 Apr 22 11:47:22 lfs-oss-1-13 kernel: LustreError: 0:0:(ldlm_lockd.c:313:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.174.1.88@o2ib ns: filter-scratch1-OST0085_UUID lock: ffff810a27401400/0xcca1a6f6c980844f lrc: 3/0,0 mode: PR/PR res: 33379783/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xc4ce34cc31ed4897 expref: 11 pid: 31724 timeout 5301728825 Apr 22 11:47:22 lfs-oss-1-13 kernel: LustreError: 31710:0:(client.c:841:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff81087c3b9000 x1398897323186918/t0 o105->@NET_0x500000aae0158_UUID:15/16 lens 344/384 e 0 to 1 dl 0 ref 1 fl Rpc:N/0/0 rc 0/0 Apr 22 11:47:22 lfs-oss-1-13 kernel: LustreError: 31710:0:(ldlm_lockd.c:612:ldlm_handle_ast_error()) ### client (nid 10.174.1.88@o2ib) 
returned 0 from completion AST ns: filter-scratch1-OST008e_UUID lock: ffff8107f5c2b000/0xcca1a6f6c980ed79 lrc: 3/0,0 mode: PW/PW res: 33382404/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->159743) flags: 0x0 remote: 0xc4ce34cc31ed4977 expref: 6 pid: 31956 timeout 0 Apr 22 15:55:34 lfs-oss-1-13 kernel: Lustre: scratch1-OST0084: haven't heard from client 47b07d89-c3f8-a00a-de0e-9357a7d50b60 (at 10.174.12.155@o2ib) in 227 seconds. I think it's dead, and I am evicting it. Apr 22 15:59:12 lfs-oss-1-13 kernel: Lustre: 31911:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: efeb141a-c225-44d7-e68f-877751c3c514 reconnecting Apr 22 15:59:12 lfs-oss-1-13 kernel: Lustre: 31911:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 17 previous similar messages Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: 137-5: UUID 'scratch1-OST008f_UUID' is not available for connect (no target) Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: Skipped 43 previous similar messages Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: 31895:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-19) req@ffff810c2af6e400 x1399964527757835/t0 o8->@:0/0 lens 368/0 e 0 to 0 dl 1335110452 ref 1 fl Interpret:/0/0 rc -19/0 Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: 31895:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 4 previous similar messages Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: 137-5: UUID 'scratch1-OST0093_UUID' is not available for connect (no target) Apr 22 15:59:12 lfs-oss-1-13 kernel: LustreError: Skipped 14 previous similar messages Apr 22 16:34:55 lfs-oss-1-13 kernel: Lustre: 31969:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-OST0084: 79d187a2-7e57-51d2-202c-257598a636a0 reconnecting Apr 22 16:34:55 lfs-oss-1-13 kernel: Lustre: 31969:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 55 previous similar messages Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: 137-5: UUID 'scratch1-OST0090_UUID' is not available for connect 
(no target) Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: Skipped 28 previous similar messages Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: 31761:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-19) req@ffff8105f5c85800 x1399966798973449/t0 o8->@:0/0 lens 368/0 e 0 to 0 dl 1335112595 ref 1 fl Interpret:/0/0 rc -19/0 Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: 31761:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 43 previous similar messages Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: 137-5: UUID 'scratch1-OST008f_UUID' is not available for connect (no target) Apr 22 16:34:55 lfs-oss-1-13 kernel: LustreError: Skipped 43 previous similar messages