Apr 22 00:53:30 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899924906237 sent from scratch1-OST0086-osc to NID 10.174.31.213@o2ib 7s ago has timed out (7s prior to deadline).
Apr 22 00:53:30 lfs-mds-1-1 kernel: req@ffff810906890c00 x1398899924906237/t0 o13->scratch1-OST0086_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335056010 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 00:53:30 lfs-mds-1-1 kernel: Lustre: scratch1-OST0086-osc: Connection to service scratch1-OST0086 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 00:53:32 lfs-mds-1-1 kernel: Lustre: scratch1-OST0086-osc: Connection restored to service scratch1-OST0086 using nid 10.174.31.213@o2ib.
Apr 22 00:53:32 lfs-mds-1-1 kernel: Lustre: Skipped 30 previous similar messages
Apr 22 00:53:32 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST008a_UUID now active, resetting orphans
Apr 22 00:53:32 lfs-mds-1-1 kernel: Lustre: Skipped 173 previous similar messages
Apr 22 00:53:32 lfs-mds-1-1 kernel: LustreError: 23645:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 11944 rc:-11)
Apr 22 01:59:18 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937028377 sent from scratch1-OST0089-osc to NID 10.174.31.213@o2ib 7s ago has timed out (7s prior to deadline).
Apr 22 01:59:18 lfs-mds-1-1 kernel: req@ffff8108c8f4c000 x1398899937028377/t0 o13->scratch1-OST0089_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335059958 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 01:59:18 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
Apr 22 01:59:18 lfs-mds-1-1 kernel: Lustre: scratch1-OST0089-osc: Connection to service scratch1-OST0089 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 01:59:18 lfs-mds-1-1 kernel: Lustre: Skipped 7 previous similar messages
Apr 22 01:59:19 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937029949 sent from scratch1-OST008e-osc to NID 10.174.31.213@o2ib 7s ago has timed out (7s prior to deadline).
Apr 22 01:59:19 lfs-mds-1-1 kernel: req@ffff81095fb29c00 x1398899937029949/t0 o5->scratch1-OST008e_UUID@10.174.31.213@o2ib:7/4 lens 400/592 e 0 to 1 dl 1335059959 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 01:59:19 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Apr 22 01:59:19 lfs-mds-1-1 kernel: Lustre: scratch1-OST0086-osc: Connection to service scratch1-OST0086 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 01:59:19 lfs-mds-1-1 kernel: Lustre: Skipped 3 previous similar messages
Apr 22 01:59:24 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937034790 sent from scratch1-OST0089-osc to NID 10.174.31.213@o2ib 6s ago has timed out (6s prior to deadline).
Apr 22 01:59:24 lfs-mds-1-1 kernel: req@ffff8108c52e0800 x1398899937034790/t0 o8->scratch1-OST0089_UUID@10.174.31.213@o2ib:28/4 lens 368/584 e 0 to 1 dl 1335059964 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 01:59:24 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Apr 22 01:59:26 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937034791 sent from scratch1-OST008b-osc to NID 10.174.31.213@o2ib 8s ago has timed out (8s prior to deadline).
Apr 22 01:59:26 lfs-mds-1-1 kernel: req@ffff81092328d000 x1398899937034791/t0 o8->scratch1-OST008b_UUID@10.174.31.213@o2ib:28/4 lens 368/584 e 0 to 1 dl 1335059966 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 01:59:26 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Apr 22 01:59:31 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937036105 sent from scratch1-OST0084-osc to NID 10.174.31.213@o2ib 9s ago has timed out (9s prior to deadline).
Apr 22 01:59:31 lfs-mds-1-1 kernel: req@ffff81122d988c00 x1398899937036105/t0 o13->scratch1-OST0084_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335059971 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 01:59:31 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Apr 22 01:59:31 lfs-mds-1-1 kernel: Lustre: scratch1-OST0084-osc: Connection to service scratch1-OST0084 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 01:59:31 lfs-mds-1-1 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 01:59:32 lfs-mds-1-1 kernel: Lustre: 25375:0:(import.c:517:import_select_connection()) scratch1-OST0089-osc: tried all connections, increasing latency to 2s
Apr 22 01:59:32 lfs-mds-1-1 kernel: Lustre: 25375:0:(import.c:517:import_select_connection()) Skipped 94 previous similar messages
Apr 22 01:59:37 lfs-mds-1-1 kernel: Lustre: scratch1-OST0084-osc: Connection restored to service scratch1-OST0084 using nid 10.174.31.213@o2ib.
Apr 22 01:59:37 lfs-mds-1-1 kernel: Lustre: Skipped 7 previous similar messages
Apr 22 01:59:37 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST0085_UUID now active, resetting orphans
Apr 22 01:59:37 lfs-mds-1-1 kernel: Lustre: Skipped 7 previous similar messages
Apr 22 01:59:37 lfs-mds-1-1 kernel: LustreError: 5051:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 11944 rc:-11)
Apr 22 01:59:37 lfs-mds-1-1 kernel: LustreError: 5051:0:(quota_master.c:1698:qmaster_recovery_main()) Skipped 1 previous similar message
Apr 22 01:59:37 lfs-mds-1-1 kernel: LustreError: 5055:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 19751 rc:-11)
Apr 22 01:59:37 lfs-mds-1-1 kernel: LustreError: 5055:0:(quota_master.c:1698:qmaster_recovery_main()) Skipped 1 previous similar message
Apr 22 02:00:10 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937073664 sent from scratch1-OST008b-osc to NID 10.174.31.213@o2ib 10s ago has timed out (10s prior to deadline).
Apr 22 02:00:10 lfs-mds-1-1 kernel: req@ffff8108d094c400 x1398899937073664/t0 o13->scratch1-OST008b_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335060010 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 02:00:10 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Apr 22 02:00:10 lfs-mds-1-1 kernel: Lustre: scratch1-OST008b-osc: Connection to service scratch1-OST008b via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 02:00:10 lfs-mds-1-1 kernel: Lustre: Skipped 3 previous similar messages
Apr 22 02:00:19 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899937074050 sent from scratch1-OST008b-osc to NID 10.174.31.213@o2ib 9s ago has timed out (9s prior to deadline).
Apr 22 02:00:19 lfs-mds-1-1 kernel: req@ffff8108b21f4c00 x1398899937074050/t0 o8->scratch1-OST008b_UUID@10.174.31.213@o2ib:28/4 lens 368/584 e 0 to 1 dl 1335060019 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 02:00:19 lfs-mds-1-1 kernel: Lustre: 25374:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: 25375:0:(import.c:517:import_select_connection()) scratch1-OST008b-osc: tried all connections, increasing latency to 24s
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: 25375:0:(import.c:517:import_select_connection()) Skipped 6 previous similar messages
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: scratch1-OST008b-osc: Connection restored to service scratch1-OST008b using nid 10.174.31.213@o2ib.
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: Skipped 10 previous similar messages
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST008b_UUID now active, resetting orphans
Apr 22 02:00:30 lfs-mds-1-1 kernel: Lustre: Skipped 10 previous similar messages
Apr 22 02:00:30 lfs-mds-1-1 kernel: LustreError: 5325:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 11944 rc:-11)
Apr 22 02:00:30 lfs-mds-1-1 kernel: LustreError: 5325:0:(quota_master.c:1698:qmaster_recovery_main()) Skipped 4 previous similar messages
Apr 22 02:00:39 lfs-mds-1-1 kernel: Lustre: scratch1-OST008d-osc: Connection restored to service scratch1-OST008d using nid 10.174.31.213@o2ib.
Apr 22 02:00:39 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST008e_UUID now active, resetting orphans
Apr 22 02:16:33 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899940636846 sent from scratch1-OST0085-osc to NID 10.174.31.213@o2ib 11s ago has timed out (11s prior to deadline).
Apr 22 02:16:33 lfs-mds-1-1 kernel: req@ffff8108af233400 x1398899940636846/t0 o13->scratch1-OST0085_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335060993 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 02:16:33 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Apr 22 02:16:33 lfs-mds-1-1 kernel: Lustre: scratch1-OST0085-osc: Connection to service scratch1-OST0085 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 02:16:33 lfs-mds-1-1 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 02:16:38 lfs-mds-1-1 kernel: Lustre: scratch1-OST0088-osc: Connection restored to service scratch1-OST0088 using nid 10.174.31.213@o2ib.
Apr 22 02:16:38 lfs-mds-1-1 kernel: Lustre: Skipped 1 previous similar message
Apr 22 02:16:38 lfs-mds-1-1 kernel: LustreError: 8854:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 11944 rc:-11)
Apr 22 02:16:38 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST0088_UUID now active, resetting orphans
Apr 22 02:16:38 lfs-mds-1-1 kernel: Lustre: Skipped 1 previous similar message
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1398899947523012 sent from scratch1-OST0084-osc to NID 10.174.31.213@o2ib 10s ago has timed out (10s prior to deadline).
Apr 22 02:59:53 lfs-mds-1-1 kernel: req@ffff81123a5a9000 x1398899947523012/t0 o13->scratch1-OST0084_UUID@10.174.31.213@o2ib:7/4 lens 192/528 e 0 to 1 dl 1335063593 ref 1 fl Rpc:N/0/0 rc 0/0
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: 25373:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: scratch1-OST0084-osc: Connection to service scratch1-OST0084 via nid 10.174.31.213@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: scratch1-OST0084-osc: Connection restored to service scratch1-OST0084 using nid 10.174.31.213@o2ib.
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST0084_UUID now active, resetting orphans
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: Skipped 2 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: MDS scratch1-MDT0000: scratch1-OST0087_UUID now active, resetting orphans
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: Skipped 3 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: scratch1-OST0088-osc: Connection restored to service scratch1-OST0088 using nid 10.174.31.213@o2ib.
Apr 22 02:59:53 lfs-mds-1-1 kernel: Lustre: Skipped 4 previous similar messages
Apr 22 02:59:53 lfs-mds-1-1 kernel: LustreError: 17668:0:(quota_master.c:1698:qmaster_recovery_main()) scratch1-MDT0000: qmaster recovery failed for gid 19751 rc:-11)
Apr 22 04:34:56 lfs-mds-1-1 kernel: Lustre: 3152:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 0823569e-ecae-599d-cfee-314af2a952df reconnecting
Apr 22 04:34:56 lfs-mds-1-1 kernel: Lustre: 3152:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 1 previous similar message
Apr 22 04:35:01 lfs-mds-1-1 kernel: Lustre: 25936:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-MDT0000: 62cc34d7-c801-d114-13d9-55692d721c93 reconnecting
Apr 22 04:35:01 lfs-mds-1-1 kernel: Lustre: 25936:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 8 previous similar messages
Apr 22 04:35:03 lfs-mds-1-1 kernel: Lustre: 3169:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 0823569e-ecae-599d-cfee-314af2a952df reconnecting
Apr 22 09:01:37 lfs-mds-1-1 kernel: LustreError: 28854:0:(llog_server.c:466:llog_origin_handle_cancel()) Cancel 1 of 122 llog-records failed: -2
Apr 22 09:01:37 lfs-mds-1-1 kernel: LustreError: 28854:0:(llog_server.c:466:llog_origin_handle_cancel()) Skipped 34 previous similar messages
Apr 22 10:03:13 lfs-mds-1-1 kernel: LDISKFS-fs (dm-22): mounted filesystem with ordered data mode
Apr 22 15:55:34 lfs-mds-1-1 kernel: Lustre: scratch1-MDT0000: haven't heard from client 47b07d89-c3f8-a00a-de0e-9357a7d50b60 (at 10.174.12.155@o2ib) in 227 seconds. I think it's dead, and I am evicting it.
Apr 22 15:55:34 lfs-mds-1-1 kernel: Lustre: Skipped 1 previous similar message
Apr 22 15:59:12 lfs-mds-1-1 kernel: Lustre: 3180:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 3758607a-d83e-b71f-1c4f-072513430b90 reconnecting
Apr 22 15:59:12 lfs-mds-1-1 kernel: Lustre: 3171:0:(ldlm_lib.c:874:target_handle_connect()) MGS: refuse reconnection from 3758607a-d83e-b71f-1c4f-072513430b90@10.174.12.155@o2ib to 0xffff8108b3dbe800; still busy with 1 active RPCs
Apr 22 15:59:12 lfs-mds-1-1 kernel: LustreError: 3171:0:(mgs_handler.c:673:mgs_handle()) MGS handle cmd=250 rc=-16
Apr 22 15:59:12 lfs-mds-1-1 kernel: LustreError: 25946:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81091d877050 x1399964527758091/t0 o38->efeb141a-c225-44d7-e68f-877751c3c514@NET_0x500000aae0c9b_UUID:0/0 lens 368/264 e 0 to 0 dl 1335110452 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 15:59:12 lfs-mds-1-1 kernel: LustreError: 25946:0:(ldlm_lib.c:1919:target_send_reply_msg()) Skipped 1 previous similar message
Apr 22 15:59:17 lfs-mds-1-1 kernel: Lustre: 25943:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-MDT0000: efeb141a-c225-44d7-e68f-877751c3c514 reconnecting
Apr 22 15:59:17 lfs-mds-1-1 kernel: Lustre: 25943:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 8 previous similar messages
Apr 22 15:59:19 lfs-mds-1-1 kernel: Lustre: 3179:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 3758607a-d83e-b71f-1c4f-072513430b90 reconnecting
Apr 22 16:34:55 lfs-mds-1-1 kernel: Lustre: 3162:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 31747e0f-4af8-ffbc-6316-13eda99c6921 reconnecting
Apr 22 16:34:55 lfs-mds-1-1 kernel: Lustre: 12170:0:(ldlm_lib.c:874:target_handle_connect()) scratch1-MDT0000: refuse reconnection from 79d187a2-7e57-51d2-202c-257598a636a0@10.174.8.64@o2ib to 0xffff8108b9f12e00; still busy with 1 active RPCs
Apr 22 16:34:55 lfs-mds-1-1 kernel: Lustre: 12170:0:(ldlm_lib.c:874:target_handle_connect()) Skipped 1 previous similar message
Apr 22 16:34:55 lfs-mds-1-1 kernel: LustreError: 12170:0:(ldlm_lib.c:1919:target_send_reply_msg()) @@@ processing error (-16) req@ffff81122a52bc00 x1399966798973125/t0 o38->79d187a2-7e57-51d2-202c-257598a636a0@NET_0x500000aae0840_UUID:0/0 lens 368/264 e 0 to 0 dl 1335112595 ref 1 fl Interpret:/0/0 rc -16/0
Apr 22 16:34:55 lfs-mds-1-1 kernel: Lustre: 12185:0:(ldlm_lib.c:803:target_handle_connect()) scratch1-MDT0000: exp ffff8108b9f12e00 already connecting
Apr 22 16:35:00 lfs-mds-1-1 kernel: Lustre: 12163:0:(ldlm_lib.c:574:target_handle_reconnect()) scratch1-MDT0000: 79d187a2-7e57-51d2-202c-257598a636a0 reconnecting
Apr 22 16:35:00 lfs-mds-1-1 kernel: Lustre: 12163:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 16 previous similar messages
Apr 22 16:35:02 lfs-mds-1-1 kernel: Lustre: 3180:0:(ldlm_lib.c:574:target_handle_reconnect()) MGS: 31747e0f-4af8-ffbc-6316-13eda99c6921 reconnecting
Apr 22 16:35:02 lfs-mds-1-1 kernel: Lustre: 3180:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 1 previous similar message