Apr 26 15:35:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 15:35:18 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Apr 26 15:35:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 15:35:18 fir-md1-s1 kernel: Lustre: Skipped 213 previous similar messages Apr 26 15:41:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a0fe0889-0c25-9b23-e21f-0fe6633e4388 (at 10.9.101.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7322d1f800, cur 1556318490 expire 1556318340 last 1556318263 Apr 26 15:41:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 15:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 15:48:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9c79909a-77e2-a34a-b7c3-31dc625f9836 (at 10.9.103.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b722026a400, cur 1556318919 expire 1556318769 last 1556318692 Apr 26 15:48:39 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Apr 26 15:54:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 38282ee7-5b2e-0d0f-e08b-07dbfb6d401e (at 10.9.103.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72e2a79000, cur 1556319242 expire 1556319092 last 1556319015 Apr 26 15:54:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 16:02:10 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.68@o2ib4, removing former export from same NID Apr 26 16:02:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:02:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 16:02:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:02:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 16:02:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:03:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3171edda-eb26-2a43-9ed5-8fcfe1212665 (at 10.9.108.13@o2ib4) Apr 26 16:08:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bbe50cfc-346c-ee8a-5e16-7b911dabb1f0 (at 10.8.14.5@o2ib6) Apr 26 16:08:26 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Apr 26 16:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 89a99511-90b9-ed05-4958-be4f82c827b0 (at 10.8.11.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b7276316c00, cur 1556321081 expire 1556320931 last 1556320854 Apr 26 16:24:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 16:29:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.68@o2ib4, removing former export from same NID Apr 26 16:29:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:29:18 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Apr 26 16:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:47:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:47:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:47:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:53:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.68@o2ib4, removing former export from same NID Apr 26 16:53:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:56:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:56:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 16:56:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e50f172c-4219-0e86-02a4-fb13914da056 (at 10.8.11.3@o2ib6) Apr 26 16:56:30 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.68@o2ib4, removing former export from same NID Apr 26 17:05:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 
17:05:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 17:05:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Apr 26 17:29:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 430702ad-34cc-0222-5c7a-4c9c811bccc0 (at 10.9.107.68@o2ib4) Apr 26 17:29:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72667f0c00, cur 1556324991 expire 1556324841 last 1556324764 Apr 26 17:29:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 17:52:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2307de04-312a-a4f0-e69d-51c5345bffd9 (at 10.9.103.40@o2ib4) Apr 26 17:52:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 17:54:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.43@o2ib4) Apr 26 17:54:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 18:56:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6d67e932-1276-a13a-e23d-9293929aac12 (at 10.8.20.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72d2a33400, cur 1556330174 expire 1556330024 last 1556329947 Apr 26 18:56:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 18:56:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5b99a272-040b-ad74-ab3b-5a9c00c921dd (at 10.8.20.11@o2ib6) Apr 26 18:56:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 18:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6d67e932-1276-a13a-e23d-9293929aac12 (at 10.8.20.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72f5d4c000, cur 1556330181 expire 1556330031 last 1556329954 Apr 26 18:56:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 19:32:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 74350ddd-f6df-a3ff-9d15-8ae80bfac2b5 (at 10.8.9.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72e5b32400, cur 1556332366 expire 1556332216 last 1556332139 Apr 26 22:08:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b1c719e0-e359-3672-f68c-50568515d76a (at 10.8.20.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7123218c00, cur 1556341705 expire 1556341555 last 1556341478 Apr 26 22:08:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 22:08:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7ae1ec28-f301-af37-519b-3393ab28cf65 (at 10.8.20.13@o2ib6) Apr 26 22:08:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 22:08:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b1c719e0-e359-3672-f68c-50568515d76a (at 10.8.20.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7225e2b000, cur 1556341722 expire 1556341572 last 1556341495 Apr 26 22:08:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:12:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9c6f341-1402-7ab1-11c3-68af210aa388 (at 10.8.30.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b730e35f000, cur 1556341948 expire 1556341798 last 1556341721 Apr 26 22:12:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b9c6f341-1402-7ab1-11c3-68af210aa388 (at 10.8.30.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b7302838000, cur 1556341964 expire 1556341814 last 1556341737 Apr 26 22:12:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:12:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0abc0af6-aa0b-cce7-6141-f4d4bb50137a (at 10.8.30.7@o2ib6) Apr 26 22:12:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 01:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a19416cf-bdf5-3e6e-7a8a-06ebf953ccde (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7241bb6400, cur 1556352133 expire 1556351983 last 1556351906 Apr 27 01:02:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a19416cf-bdf5-3e6e-7a8a-06ebf953ccde (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7322d1e800, cur 1556352135 expire 1556351985 last 1556351908 Apr 27 01:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72dca6ac00, cur 1556353846 expire 1556353696 last 1556353619 Apr 27 01:30:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 27 01:30:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to faf112e6-add7-bad5-c6b8-4f1cf748e849 (at 10.8.21.21@o2ib6) Apr 27 01:30:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 01:30:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b7271b1f000, cur 1556353858 expire 1556353708 last 1556353631 Apr 27 01:30:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 27 02:17:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to faf112e6-add7-bad5-c6b8-4f1cf748e849 (at 10.8.21.21@o2ib6) Apr 27 02:17:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 02:17:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42023289-bd73-d552-9154-8822cfffd412 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b461b6c3c00, cur 1556356667 expire 1556356517 last 1556356440 Apr 27 03:40:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0e15507c-acb2-e83c-13f2-69ce5b69d859 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b48422da000, cur 1556361636 expire 1556361486 last 1556361409 Apr 27 03:40:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 03:41:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to faf112e6-add7-bad5-c6b8-4f1cf748e849 (at 10.8.21.21@o2ib6) Apr 27 03:41:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 04:07:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 94710490-ff88-a015-6011-4fd8162eb622 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b398769d400, cur 1556363279 expire 1556363129 last 1556363052 Apr 27 04:07:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 04:08:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to faf112e6-add7-bad5-c6b8-4f1cf748e849 (at 10.8.21.21@o2ib6) Apr 27 04:08:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 04:23:18 fir-md1-s1 kernel: Lustre: 105306:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b516a46d700 x1631534729122624/t0(0) o101->5735cd86-3a30-362c-bc05-c634d3fa1859@10.9.107.11@o2ib4:23/0 lens 584/3264 e 1 to 0 dl 1556364203 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 04:23:18 fir-md1-s1 kernel: Lustre: 105306:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Apr 27 04:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f5468e72-fdf8-2c00-55b2-35b2a8b48641 (at 10.9.107.9@o2ib4) reconnecting Apr 27 04:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 826fbeb7-54e9-5127-860e-c32891bc78a7 (at 10.9.107.9@o2ib4) Apr 27 04:23:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Apr 27 04:23:28 fir-md1-s1 kernel: Lustre: 105080:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b730bab8600 x1631558214763136/t0(0) o101->8937ba9c-f8a3-cff7-01bf-a35c8d69bba5@10.9.107.10@o2ib4:3/0 lens 584/3264 e 0 to 0 dl 1556364213 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 04:23:28 fir-md1-s1 kernel: Lustre: 105080:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 27 04:23:33 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.107.9@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b4ce731da00/0x378007f753d294a3 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 
0x60200400000020 nid: 10.9.107.9@o2ib4 remote: 0xcaaa09a2b232a96c expref: 171 pid: 105281 timeout: 364047 lvb_type: 0 Apr 27 04:23:33 fir-md1-s1 kernel: LustreError: 105281:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72fa1f3000 ns: mdt-fir-MDT0000_UUID lock: ffff8b4ce731a400/0x378007f753d29639 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x20/0x0 rrc: 22 type: IBT flags: 0x50200000000000 nid: 10.9.107.9@o2ib4 remote: 0xcaaa09a2b232a981 expref: 90 pid: 105281 timeout: 0 lvb_type: 0 Apr 27 04:23:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 826fbeb7-54e9-5127-860e-c32891bc78a7 (at 10.9.107.9@o2ib4) Apr 27 04:23:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 04:23:58 fir-md1-s1 kernel: Lustre: 105235:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5f6671bc00 x1631534729127472/t0(0) o101->5735cd86-3a30-362c-bc05-c634d3fa1859@10.9.107.11@o2ib4:3/0 lens 584/3264 e 0 to 0 dl 1556364243 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 04:23:58 fir-md1-s1 kernel: Lustre: 105235:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Apr 27 04:24:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.107.15@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b53215b0240/0x378007f754016854 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 84 type: IBT flags: 0x60200400000020 nid: 10.9.107.15@o2ib4 remote: 0xcf1976d3dcbcccc8 expref: 163 pid: 105005 timeout: 364077 lvb_type: 0 Apr 27 04:24:03 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72d2a31400 ns: mdt-fir-MDT0000_UUID lock: ffff8b51242e5a00/0x378007f754016d55 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 81 type: IBT flags: 
0x50200400000020 nid: 10.9.107.15@o2ib4 remote: 0xcf1976d3dcbccce4 expref: 6 pid: 105126 timeout: 0 lvb_type: 0 Apr 27 04:24:03 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 27 04:24:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c76170eb-ba7a-cab7-b276-a54bf27a0834 (at 10.9.107.15@o2ib4) Apr 27 04:24:33 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.107.14@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b4ccd74b180/0x378007f7543196ee lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 82 type: IBT flags: 0x60200400000020 nid: 10.9.107.14@o2ib4 remote: 0xe0df9cfb42b1ec05 expref: 164 pid: 105419 timeout: 364107 lvb_type: 0 Apr 27 04:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8a37f7b1-3efc-30e9-f8d1-739df6680357 (at 10.9.104.19@o2ib4) reconnecting Apr 27 04:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:24:34 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Apr 27 04:24:58 fir-md1-s1 kernel: Lustre: 104335:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b627029b600 x1631586645120352/t0(0) o101->f5468e72-fdf8-2c00-55b2-35b2a8b48641@10.9.107.9@o2ib4:3/0 lens 480/568 e 0 to 0 dl 1556364303 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 04:24:58 fir-md1-s1 kernel: Lustre: 104335:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 66 previous similar messages Apr 27 04:25:03 fir-md1-s1 kernel: LustreError: 105127:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4c677df000 ns: mdt-fir-MDT0000_UUID lock: ffff8b62cbbc0900/0x378007f754319f99 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 74 type: IBT flags: 0x50200400000020 nid: 
10.9.104.17@o2ib4 remote: 0x948905ad7c4ca804 expref: 8 pid: 105127 timeout: 0 lvb_type: 0 Apr 27 04:25:03 fir-md1-s1 kernel: LustreError: 105127:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 27 04:25:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:25:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Apr 27 04:25:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f5468e72-fdf8-2c00-55b2-35b2a8b48641 (at 10.9.107.9@o2ib4) reconnecting Apr 27 04:25:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Apr 27 04:25:33 fir-md1-s1 kernel: LustreError: 114915:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556364243, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4bc4f98240/0x378007f75431a10c lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 75 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114915 timeout: 0 lvb_type: 0 Apr 27 04:25:33 fir-md1-s1 kernel: LustreError: 114915:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 32 previous similar messages Apr 27 04:25:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:25:34 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Apr 27 04:26:03 fir-md1-s1 kernel: LustreError: 114851:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556364273, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b5f1da5f500/0x378007f75467bc45 lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 75 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114851 timeout: 0 lvb_type: 0 Apr 27 04:26:03 fir-md1-s1 
kernel: LustreError: 114851:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 27 04:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:26:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Apr 27 04:26:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:26:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Apr 27 04:26:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 826fbeb7-54e9-5127-860e-c32891bc78a7 (at 10.9.107.9@o2ib4) Apr 27 04:26:37 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Apr 27 04:27:24 fir-md1-s1 kernel: LNet: Service thread pid 114938 was inactive for 200.58s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 27 04:27:24 fir-md1-s1 kernel: Pid: 114938, comm: mdt00_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 04:27:24 fir-md1-s1 kernel: Call Trace: Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] 
tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 04:27:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 04:27:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 04:27:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556364444.114938 Apr 27 04:27:24 fir-md1-s1 kernel: Pid: 105123, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 04:27:24 fir-md1-s1 kernel: Call Trace: Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 04:27:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 04:27:24 fir-md1-s1 kernel: [] 
0xffffffffffffffff Apr 27 04:27:24 fir-md1-s1 kernel: Pid: 105058, comm: mdt01_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 04:27:24 fir-md1-s1 kernel: Call Trace: Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 04:27:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 04:27:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 04:27:24 fir-md1-s1 kernel: Pid: 105234, comm: mdt02_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 04:27:24 fir-md1-s1 kernel: Call Trace: Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 04:27:24 fir-md1-s1 
kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 04:27:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 04:27:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 04:27:24 fir-md1-s1 kernel: LNet: Service thread pid 114792 was inactive for 201.08s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 27 04:27:24 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 27 04:27:24 fir-md1-s1 kernel: Pid: 114792, comm: mdt03_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 04:27:24 fir-md1-s1 kernel: Call Trace: Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 
[ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 04:27:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 04:27:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 04:27:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 04:27:24 fir-md1-s1 kernel: LNet: Service thread pid 105093 was inactive for 201.24s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 04:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:27:38 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Apr 27 04:27:53 fir-md1-s1 kernel: LNet: Service thread pid 114851 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 04:27:53 fir-md1-s1 kernel: LNet: Skipped 23 previous similar messages Apr 27 04:27:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556364473.114851 Apr 27 04:27:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556364474.105239 Apr 27 04:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:28:09 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Apr 27 04:28:23 fir-md1-s1 kernel: LNet: Service thread pid 105265 was inactive for 200.29s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 27 04:28:23 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 27 04:28:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556364503.105265 Apr 27 04:30:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:30:13 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Apr 27 04:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:30:44 fir-md1-s1 kernel: Lustre: Skipped 74 previous similar messages Apr 27 04:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:35:23 fir-md1-s1 kernel: Lustre: Skipped 149 previous similar messages Apr 27 04:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:35:23 fir-md1-s1 kernel: Lustre: Skipped 134 previous similar messages Apr 27 04:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:44:10 fir-md1-s1 kernel: Lustre: Skipped 254 previous similar messages Apr 27 04:45:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:45:44 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 04:54:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 04:54:31 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 04:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 04:56:04 fir-md1-s1 kernel: Lustre: Skipped 298 previous similar messages Apr 27 04:56:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from 
client efd9b01d-7646-af53-52bb-21baf8260d64 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72153b7800, cur 1556366184 expire 1556366034 last 1556365957 Apr 27 04:56:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 05:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 27b59868-2412-e12b-495b-ef469cbee2d7 (at 10.8.22.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72153b5c00, cur 1556366465 expire 1556366315 last 1556366238 Apr 27 05:01:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 05:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 05:04:51 fir-md1-s1 kernel: Lustre: Skipped 305 previous similar messages Apr 27 05:06:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 05:06:24 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 05:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 47f502d3-bcb5-fabb-7382-b112636cf782 (at 10.9.104.17@o2ib4) Apr 27 05:15:11 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 05:16:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 5c33d5d2-2621-e59d-0e36-98a7cb2caa9e (at 10.9.104.17@o2ib4) reconnecting Apr 27 05:16:44 fir-md1-s1 kernel: Lustre: Skipped 300 previous similar messages Apr 27 05:19:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5b36977e-2123-3c92-aece-1cc3fbfc3aea (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b487a645c00, cur 1556367566 expire 1556367416 last 1556367339 Apr 27 05:19:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 05:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 6d93770c-d547-b1b4-ca86-fb148b5522f3 (at 10.9.107.16@o2ib4) Apr 27 05:25:31 fir-md1-s1 kernel: Lustre: Skipped 300 previous similar messages Apr 27 05:27:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d4ebf645-312a-a45c-d2fb-9a1e61693fc2 (at 10.9.107.16@o2ib4) reconnecting Apr 27 05:27:04 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 05:28:33 fir-md1-s1 kernel: LNet: Service thread pid 114915 completed after 3869.96s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 05:28:33 fir-md1-s1 kernel: Lustre: 105380:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:3809s); client may timeout. req@ffff8b714d220900 x1631546038456768/t0(0) o55->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:3/0 lens 472/192 e 0 to 0 dl 1556364304 ref 1 fl Complete:/0/0 rc -22/-22 Apr 27 05:28:33 fir-md1-s1 kernel: Lustre: 105380:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 27 05:28:33 fir-md1-s1 kernel: LustreError: 114938:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4c677df000 ns: mdt-fir-MDT0000_UUID lock: ffff8b3a85c39f80/0x378007f75431a271 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 67 type: IBT flags: 0x50200000000000 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c4ca82e expref: 4 pid: 114938 timeout: 0 lvb_type: 0 Apr 27 05:28:33 fir-md1-s1 kernel: LNet: Skipped 32 previous similar messages Apr 27 05:28:58 fir-md1-s1 kernel: Lustre: 104335:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b62a22be600 x1631546039524944/t0(0) 
o101->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:3/0 lens 568/0 e 0 to 0 dl 1556368143 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 27 05:28:58 fir-md1-s1 kernel: Lustre: 104335:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Apr 27 05:30:03 fir-md1-s1 kernel: LustreError: 105265:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556368113, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b3dc23a1680/0x378007f7782c6628 lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 87 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105265 timeout: 0 lvb_type: 0 Apr 27 05:30:03 fir-md1-s1 kernel: LustreError: 105265:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 39 previous similar messages Apr 27 05:31:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.104.17@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b3eede13840/0x378007f7782c65b1 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 87 type: IBT flags: 0x60200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c50eea4 expref: 23 pid: 105021 timeout: 368097 lvb_type: 0 Apr 27 05:31:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 27 05:31:03 fir-md1-s1 kernel: LustreError: 105421:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b3fb536c400 ns: mdt-fir-MDT0000_UUID lock: ffff8b3aa8a81b00/0x378007f7782c6732 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 82 type: IBT flags: 0x50200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c50eeb2 expref: 5 pid: 105421 timeout: 0 lvb_type: 0 Apr 27 05:31:03 fir-md1-s1 kernel: LustreError: 105421:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) 
Skipped 4 previous similar messages Apr 27 05:31:28 fir-md1-s1 kernel: Lustre: 105429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5f63f2aa00 x1631709884175792/t0(0) o101->79273348-f676-2905-42da-85a87e1ba2d5@10.9.107.14@o2ib4:3/0 lens 480/568 e 0 to 0 dl 1556368293 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 05:31:28 fir-md1-s1 kernel: Lustre: 105429:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages Apr 27 05:31:33 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.18@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b72f618d100/0x378007f7782c6897 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 83 type: IBT flags: 0x60200400000020 nid: 10.9.104.18@o2ib4 remote: 0x24bf5fd606f48d45 expref: 20 pid: 105085 timeout: 368127 lvb_type: 0 Apr 27 05:31:53 fir-md1-s1 kernel: LNet: Service thread pid 114861 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 27 05:31:53 fir-md1-s1 kernel: Pid: 114861, comm: mdt02_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 05:31:53 fir-md1-s1 kernel: Call Trace: Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 05:31:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 05:31:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 05:31:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556368313.114861 Apr 27 05:31:53 fir-md1-s1 kernel: Pid: 105251, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 05:31:53 fir-md1-s1 kernel: Call Trace: Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] 
mdt_object_lock+0x20/0x30 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 05:31:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 05:31:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 05:31:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 05:31:54 fir-md1-s1 kernel: Pid: 114938, comm: mdt00_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 05:31:54 fir-md1-s1 kernel: Call Trace: Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 
[ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 05:31:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 05:31:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 05:31:54 fir-md1-s1 kernel: Pid: 114809, comm: mdt01_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 05:31:54 fir-md1-s1 kernel: Call Trace: Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 05:31:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 05:31:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 05:31:54 fir-md1-s1 kernel: LNet: Service thread pid 104728 was inactive for 200.75s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 27 05:31:54 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 27 05:31:54 fir-md1-s1 kernel: Pid: 104728, comm: mdt02_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 05:31:54 fir-md1-s1 kernel: Call Trace: Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 05:31:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 05:31:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 05:31:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 05:31:54 fir-md1-s1 kernel: LNet: Service thread pid 105282 was inactive for 200.90s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 27 05:31:54 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 27 05:31:58 fir-md1-s1 kernel: Lustre: 105429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5d7c2a1800 x1631585657660368/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:3/0 lens 584/3264 e 0 to 0 dl 1556368323 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 05:31:58 fir-md1-s1 kernel: Lustre: 105429:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 27 05:32:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.107.13@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b5fa3beb840/0x378007f7782c6c4f lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 76 type: IBT flags: 0x60200400000020 nid: 10.9.107.13@o2ib4 remote: 0xf9e65bb32762354b expref: 19 pid: 114991 timeout: 368157 lvb_type: 0 Apr 27 05:32:03 fir-md1-s1 kernel: LustreError: 104944:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72fbbe7400 ns: mdt-fir-MDT0000_UUID lock: ffff8b5fc6090fc0/0x378007f7782c7165 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 74 type: IBT flags: 0x50200400000020 nid: 10.9.107.13@o2ib4 remote: 0xf9e65bb327623567 expref: 17 pid: 104944 timeout: 0 lvb_type: 0 Apr 27 05:32:03 fir-md1-s1 kernel: LNet: Service thread pid 104944 completed after 209.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 27 05:32:03 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 27 05:32:33 fir-md1-s1 kernel: LustreError: 104966:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556368263, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b627a5a8480/0x378007f779dd9f50 lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 69 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104966 timeout: 0 lvb_type: 0 Apr 27 05:32:33 fir-md1-s1 kernel: LustreError: 104966:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 27 05:33:03 fir-md1-s1 kernel: LustreError: 105025:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556368293, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4b0ce2d340/0x378007f77a355b01 lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x2/0x0 rrc: 69 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105025 timeout: 0 lvb_type: 0 Apr 27 05:33:03 fir-md1-s1 kernel: LustreError: 105025:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 27 05:33:33 fir-md1-s1 kernel: LustreError: 114826:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556368323, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4b1e2a45c0/0x378007f77a8b8bda lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x13/0x8 rrc: 69 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114826 timeout: 0 lvb_type: 0 Apr 27 05:33:33 fir-md1-s1 kernel: LustreError: 114826:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 27 05:34:23 fir-md1-s1 kernel: LNet: Service thread pid 
104966 was inactive for 200.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 05:34:23 fir-md1-s1 kernel: LNet: Skipped 21 previous similar messages Apr 27 05:34:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556368463.104966 Apr 27 05:34:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556368464.105021 Apr 27 05:34:54 fir-md1-s1 kernel: LNet: Service thread pid 105237 was inactive for 200.60s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 05:34:54 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 27 05:34:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556368494.105237 Apr 27 05:35:23 fir-md1-s1 kernel: LNet: Service thread pid 114826 was inactive for 200.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 05:35:23 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 27 05:35:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556368523.114826 Apr 27 05:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 05:35:40 fir-md1-s1 kernel: Lustre: Skipped 282 previous similar messages Apr 27 05:37:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 05:37:13 fir-md1-s1 kernel: Lustre: Skipped 279 previous similar messages Apr 27 05:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 05:46:00 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 05:47:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 05:47:33 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 05:53:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
haven't heard from client 964455a3-fd69-54d9-fb4c-40ccde325a87 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72d7e18c00, cur 1556369581 expire 1556369431 last 1556369354 Apr 27 05:53:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 05:56:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 05:56:21 fir-md1-s1 kernel: Lustre: Skipped 302 previous similar messages Apr 27 05:57:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 05:57:54 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:06:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 06:06:41 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 06:08:14 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 06:17:01 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:18:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 06:18:34 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:27:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to c63decb4-6bcf-4a94-0462-15dbebe921ce (at 10.9.107.13@o2ib4) Apr 27 06:27:21 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f7a14e1b-b524-07f0-440e-e264a01c9b69 (at 10.9.107.13@o2ib4) reconnecting Apr 27 06:28:54 fir-md1-s1 kernel: 
Lustre: Skipped 299 previous similar messages Apr 27 06:33:36 fir-md1-s1 kernel: LNet: Service thread pid 105406 completed after 3902.99s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:33:36 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 27 06:34:04 fir-md1-s1 kernel: Lustre: 105399:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-8), not sending early reply req@ffff8b714775a400 x1631585913193296/t0(0) o55->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:9/0 lens 472/224 e 0 to 0 dl 1556372049 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:34:04 fir-md1-s1 kernel: Lustre: 105399:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 27 06:34:06 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.22@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b731c23a640/0x378007f7782c739c lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 55 type: IBT flags: 0x60200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897db30380a1 expref: 13 pid: 105075 timeout: 371880 lvb_type: 0 Apr 27 06:34:06 fir-md1-s1 kernel: LNet: Service thread pid 105080 completed after 3932.88s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 27 06:34:06 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b59bb779400 ns: mdt-fir-MDT0000_UUID lock: ffff8b36d5f36540/0x378007f7782c7547 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x50200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897db30380b6 expref: 5 pid: 104355 timeout: 0 lvb_type: 0 Apr 27 06:34:06 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 27 06:34:06 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 27 06:34:21 fir-md1-s1 kernel: Lustre: 104332:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b5101af8000 x1631558381478144/t0(0) o101->759f8f0b-2ad4-e681-fa5a-f9895c0b0d9b@10.9.104.24@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1556372066 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:34:31 fir-md1-s1 kernel: Lustre: 104985:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5257ce1200 x1631586050480016/t0(0) o101->0fafc81c-d2f9-5fcc-1c5e-9d205df82025@10.9.104.20@o2ib4:6/0 lens 480/568 e 0 to 0 dl 1556372076 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:34:31 fir-md1-s1 kernel: Lustre: 104985:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 27 06:34:36 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.21@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b47326cb3c0/0x378007f7782c7e1c lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 60 type: IBT flags: 0x60200400000020 nid: 10.9.104.21@o2ib4 remote: 0xba59162a298a8865 expref: 20 pid: 114828 timeout: 371910 lvb_type: 0 Apr 27 06:34:36 fir-md1-s1 kernel: LNet: Service thread pid 105058 completed after 3962.87s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:34:36 fir-md1-s1 kernel: LustreError: 105117:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b47052f8c00 ns: mdt-fir-MDT0000_UUID lock: ffff8b388e2e2ac0/0x378007f7782c8045 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 56 type: IBT flags: 0x50200400000020 nid: 10.9.104.18@o2ib4 remote: 0x24bf5fd606f48d68 expref: 6 pid: 105117 timeout: 0 lvb_type: 0 Apr 27 06:34:36 fir-md1-s1 kernel: LustreError: 105117:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 27 06:34:36 fir-md1-s1 kernel: Lustre: 105117:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:3778s); client may timeout. req@ffff8b41d0242d00 x1631585657642800/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:3/0 lens 480/536 e 0 to 0 dl 1556368298 ref 1 fl Complete:/0/0 rc -107/-107 Apr 27 06:34:36 fir-md1-s1 kernel: Lustre: 105117:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Apr 27 06:34:36 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Apr 27 06:35:01 fir-md1-s1 kernel: Lustre: 115018:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b529392cb00 x1631534729965744/t0(0) o101->5735cd86-3a30-362c-bc05-c634d3fa1859@10.9.107.11@o2ib4:6/0 lens 568/0 e 0 to 0 dl 1556372106 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 27 06:35:01 fir-md1-s1 kernel: Lustre: 115018:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 27 06:35:06 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.19@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b3abdfd3600/0x378007f7782c8419 lrc: 3/0,0 mode: PW/PW res: 
[0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 55 type: IBT flags: 0x60200400000020 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758b1f4c8d expref: 26 pid: 104728 timeout: 371940 lvb_type: 0 Apr 27 06:35:06 fir-md1-s1 kernel: LustreError: 114861:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b3fb536c400 ns: mdt-fir-MDT0000_UUID lock: ffff8b5ed6293cc0/0x378007f7782c867a lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x20/0x0 rrc: 51 type: IBT flags: 0x50200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c50eeb9 expref: 2 pid: 114861 timeout: 0 lvb_type: 0 Apr 27 06:35:06 fir-md1-s1 kernel: LNet: Service thread pid 105251 completed after 3992.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:35:06 fir-md1-s1 kernel: LustreError: 114861:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 27 06:35:06 fir-md1-s1 kernel: Lustre: 114861:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:3839s); client may timeout. 
req@ffff8b62a22be600 x1631546039524944/t0(0) o101->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:3/0 lens 568/1672 e 0 to 0 dl 1556368267 ref 1 fl Complete:/0/0 rc -107/-107 Apr 27 06:35:06 fir-md1-s1 kernel: Lustre: 114861:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Apr 27 06:35:10 fir-md1-s1 kernel: LustreError: 23910:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.104.19@o2ib4 arrived at 1556372110 with bad export cookie 3999205222624917463 Apr 27 06:35:36 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.24@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b46db241d40/0x378007f7a6021542 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c583327b3 expref: 24 pid: 114838 timeout: 371970 lvb_type: 0 Apr 27 06:35:36 fir-md1-s1 kernel: LustreError: 114882:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4db5a44000 ns: mdt-fir-MDT0000_UUID lock: ffff8b4923738b40/0x378007f7a6021e02 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 45 type: IBT flags: 0x50200400000020 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c58332807 expref: 7 pid: 114882 timeout: 0 lvb_type: 0 Apr 27 06:35:36 fir-md1-s1 kernel: LustreError: 114828:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372046, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b5053612640/0x378007f7a6021d37 lrc: 3/1,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 47 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114828 timeout: 0 lvb_type: 0 Apr 27 06:35:36 fir-md1-s1 kernel: LustreError: 114882:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar 
messages Apr 27 06:35:36 fir-md1-s1 kernel: Lustre: 104329:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (66:24s); client may timeout. req@ffff8b3d06c10600 x1631585913196336/t0(0) o101->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:0/0 lens 568/1672 e 1 to 0 dl 1556372112 ref 1 fl Complete:/0/0 rc -107/-107 Apr 27 06:35:36 fir-md1-s1 kernel: Lustre: 104329:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 27 06:36:01 fir-md1-s1 kernel: Lustre: 114985:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5f75d4e300 x1631587186376256/t0(0) o101->d4ebf645-312a-a45c-d2fb-9a1e61693fc2@10.9.107.16@o2ib4:6/0 lens 584/3264 e 0 to 0 dl 1556372166 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:36:01 fir-md1-s1 kernel: Lustre: 114985:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Apr 27 06:37:06 fir-md1-s1 kernel: LustreError: 105074:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372136, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b527fb4de80/0x378007f7a722ac53 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x13/0x8 rrc: 51 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105074 timeout: 0 lvb_type: 0 Apr 27 06:37:06 fir-md1-s1 kernel: LustreError: 105074:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 16 previous similar messages Apr 27 06:37:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 621a3cc3-5f17-0cd8-6a17-10cb0180c8b1 (at 10.9.104.23@o2ib4) Apr 27 06:37:40 fir-md1-s1 kernel: Lustre: Skipped 280 previous similar messages Apr 27 06:38:06 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.104.17@o2ib4 ns: 
mdt-fir-MDT0000_UUID lock: ffff8b722feba400/0x378007f7a722aaee lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x60200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c559f7f expref: 22 pid: 104336 timeout: 372120 lvb_type: 0 Apr 27 06:38:31 fir-md1-s1 kernel: Lustre: 114938:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3ac9b8bc00 x1631585913232784/t0(0) o101->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:6/0 lens 584/3264 e 0 to 0 dl 1556372316 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:38:31 fir-md1-s1 kernel: Lustre: 114938:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 27 previous similar messages Apr 27 06:38:57 fir-md1-s1 kernel: LNet: Service thread pid 104333 was inactive for 200.21s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 27 06:38:57 fir-md1-s1 kernel: Pid: 104333, comm: mdt02_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:38:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 
06:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:38:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372337.104333 Apr 27 06:38:57 fir-md1-s1 kernel: Pid: 104812, comm: mdt02_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:38:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:38:57 fir-md1-s1 kernel: Pid: 
114882, comm: mdt01_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:38:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:38:57 fir-md1-s1 kernel: Pid: 114910, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:38:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: 
[] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:38:57 fir-md1-s1 kernel: LNet: Service thread pid 104728 was inactive for 201.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 27 06:38:57 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 27 06:38:57 fir-md1-s1 kernel: Pid: 104728, comm: mdt02_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:38:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 
[ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:38:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:38:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:38:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:38:57 fir-md1-s1 kernel: LNet: Service thread pid 104966 was inactive for 201.24s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 06:39:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 16749711-2a27-479b-83fc-14b2199ba6af (at 10.9.104.18@o2ib4) reconnecting Apr 27 06:39:08 fir-md1-s1 kernel: Lustre: Skipped 267 previous similar messages Apr 27 06:39:36 fir-md1-s1 kernel: LustreError: 105046:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372286, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b72e3af7080/0x378007f7a953f951 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x13/0x8 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105046 timeout: 0 lvb_type: 0 Apr 27 06:39:36 fir-md1-s1 kernel: LustreError: 104691:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372286, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b62792b69c0/0x378007f7a953f912 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x20/0x0 rrc: 60 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104691 timeout: 0 lvb_type: 0 Apr 27 06:39:36 fir-md1-s1 kernel: LustreError: 104691:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 27 06:39:36 fir-md1-s1 kernel: LustreError: 
105046:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Apr 27 06:40:36 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 47s: evicting client at 10.9.106.17@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b35c132d340/0x378007f7aaaf0050 lrc: 3/0,0 mode: PW/PW res: [0x2c001a78d:0x3fc:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.106.17@o2ib4 remote: 0x6533bfca4610fa11 expref: 2279 pid: 105269 timeout: 372253 lvb_type: 0 Apr 27 06:40:36 fir-md1-s1 kernel: LNet: Service thread pid 114882 completed after 299.90s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:40:36 fir-md1-s1 kernel: LustreError: 114809:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b385cf18800 ns: mdt-fir-MDT0000_UUID lock: ffff8b70fd2c9d40/0x378007f7a722b012 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 56 type: IBT flags: 0x50200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c559f94 expref: 5 pid: 114809 timeout: 0 lvb_type: 0 Apr 27 06:40:36 fir-md1-s1 kernel: LustreError: 114809:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Apr 27 06:40:36 fir-md1-s1 kernel: Lustre: 104335:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. 
req@ffff8b6290231b00 x1631546040680272/t236552209170(0) o36->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:6/0 lens 488/424 e 0 to 0 dl 1556372290 ref 1 fl Complete:/0/0 rc 0/0 Apr 27 06:40:36 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 27 06:41:01 fir-md1-s1 kernel: Lustre: 105252:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b520f68d100 x1631558215623152/t0(0) o101->8937ba9c-f8a3-cff7-01bf-a35c8d69bba5@10.9.107.10@o2ib4:6/0 lens 480/568 e 0 to 0 dl 1556372466 ref 2 fl Interpret:/0/0 rc 0/0 Apr 27 06:41:01 fir-md1-s1 kernel: Lustre: 105252:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Apr 27 06:41:06 fir-md1-s1 kernel: LNet: Service thread pid 105117 completed after 329.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:41:06 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 27 06:41:26 fir-md1-s1 kernel: LNet: Service thread pid 105239 was inactive for 200.18s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 06:41:26 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Apr 27 06:41:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372486.105239 Apr 27 06:41:31 fir-md1-s1 kernel: LNet: Service thread pid 114991 was inactive for 200.14s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 27 06:41:31 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 27 06:41:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372491.114991 Apr 27 06:41:36 fir-md1-s1 kernel: LustreError: 114910:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b60bdb4cc00 ns: mdt-fir-MDT0000_UUID lock: ffff8b52e2971d40/0x378007f7a722b601 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 56 type: IBT flags: 0x50200400000020 nid: 10.9.104.23@o2ib4 remote: 0xc74f0df9ebcda5c8 expref: 28 pid: 114910 timeout: 0 lvb_type: 0 Apr 27 06:41:36 fir-md1-s1 kernel: LustreError: 114910:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Apr 27 06:41:36 fir-md1-s1 kernel: LNet: Service thread pid 104812 completed after 359.90s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 06:41:36 fir-md1-s1 kernel: Lustre: 104728:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:206s); client may timeout. 
req@ffff8b6290232100 x1631546040680320/t0(0) o101->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:6/0 lens 568/1672 e 0 to 0 dl 1556372290 ref 1 fl Complete:/0/0 rc -107/-107 Apr 27 06:41:36 fir-md1-s1 kernel: Lustre: 104728:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 27 06:41:36 fir-md1-s1 kernel: LNet: Skipped 18 previous similar messages Apr 27 06:43:06 fir-md1-s1 kernel: LustreError: 114828:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372496, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4a01b19200/0x378007f7ac0a62fe lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114828 timeout: 0 lvb_type: 0 Apr 27 06:43:06 fir-md1-s1 kernel: LustreError: 114828:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages Apr 27 06:44:06 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.104.24@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b3e3c7286c0/0x378007f7ac0a607a lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x60200400000020 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c58332c21 expref: 78 pid: 104766 timeout: 372480 lvb_type: 0 Apr 27 06:44:06 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Apr 27 06:44:06 fir-md1-s1 kernel: LustreError: 105399:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b5f75d7a000 ns: mdt-fir-MDT0000_UUID lock: ffff8b3e68f93180/0x378007f7ac0a686f lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x50200400000020 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c58332c28 expref: 8 pid: 
105399 timeout: 0 lvb_type: 0 Apr 27 06:44:06 fir-md1-s1 kernel: LustreError: 105399:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 6 previous similar messages Apr 27 06:44:56 fir-md1-s1 kernel: LNet: Service thread pid 104728 was inactive for 200.07s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 27 06:44:56 fir-md1-s1 kernel: Pid: 104728, comm: mdt02_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:44:56 fir-md1-s1 kernel: Call Trace: Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:44:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:44:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372696.104728 Apr 27 06:44:56 fir-md1-s1 kernel: Pid: 105128, comm: mdt02_026 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:44:56 fir-md1-s1 kernel: Call Trace: Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:44:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:44:56 fir-md1-s1 kernel: Pid: 105011, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:44:56 fir-md1-s1 kernel: Call Trace: Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] 
mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:44:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:44:56 fir-md1-s1 kernel: Pid: 105046, comm: mdt02_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:44:56 fir-md1-s1 kernel: Call Trace: Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:44:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:44:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:44:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:44:57 fir-md1-s1 kernel: Pid: 105034, comm: mdt00_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 06:44:57 fir-md1-s1 kernel: Call Trace: Apr 27 06:44:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 06:44:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 06:44:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 06:44:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 06:44:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 06:44:57 fir-md1-s1 kernel: LNet: Service thread pid 104957 was inactive for 200.71s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 27 06:44:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372697.114851 Apr 27 06:45:36 fir-md1-s1 kernel: LustreError: 104766:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556372646, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b72f8132880/0x378007f7ae1154e7 lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104766 timeout: 0 lvb_type: 0 Apr 27 06:45:36 fir-md1-s1 kernel: LustreError: 104766:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 27 06:47:27 fir-md1-s1 kernel: LNet: Service thread pid 104766 was inactive for 200.62s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 27 06:47:27 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 27 06:47:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556372847.104766 Apr 27 06:47:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.107.14@o2ib4) Apr 27 06:47:43 fir-md1-s1 kernel: Lustre: Skipped 270 previous similar messages Apr 27 06:49:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 79273348-f676-2905-42da-85a87e1ba2d5 (at 10.9.107.14@o2ib4) reconnecting Apr 27 06:49:16 fir-md1-s1 kernel: Lustre: Skipped 273 previous similar messages Apr 27 06:58:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.107.14@o2ib4) Apr 27 06:58:03 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 06:59:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 79273348-f676-2905-42da-85a87e1ba2d5 (at 10.9.107.14@o2ib4) reconnecting Apr 27 06:59:36 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:08:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.107.14@o2ib4) Apr 27 07:08:23 fir-md1-s1 kernel: 
Lustre: Skipped 299 previous similar messages Apr 27 07:09:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 79273348-f676-2905-42da-85a87e1ba2d5 (at 10.9.107.14@o2ib4) reconnecting Apr 27 07:09:56 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.107.14@o2ib4) Apr 27 07:18:43 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:20:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 79273348-f676-2905-42da-85a87e1ba2d5 (at 10.9.107.14@o2ib4) reconnecting Apr 27 07:20:16 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:29:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 4a715ce4-44e5-9eb5-fd19-43a9ef3ba9b7 (at 10.9.107.10@o2ib4) Apr 27 07:29:03 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:30:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8937ba9c-f8a3-cff7-01bf-a35c8d69bba5 (at 10.9.107.10@o2ib4) reconnecting Apr 27 07:30:36 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:39:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 4a715ce4-44e5-9eb5-fd19-43a9ef3ba9b7 (at 10.9.107.10@o2ib4) Apr 27 07:39:23 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:40:37 fir-md1-s1 kernel: LNet: Service thread pid 114792 completed after 3541.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 27 07:40:37 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 27 07:40:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 8937ba9c-f8a3-cff7-01bf-a35c8d69bba5 (at 10.9.107.10@o2ib4) reconnecting Apr 27 07:40:56 fir-md1-s1 kernel: Lustre: Skipped 299 previous similar messages Apr 27 07:41:02 fir-md1-s1 kernel: Lustre: 105058:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5d7a9ad100 x1631534730383072/t0(0) o101->5735cd86-3a30-362c-bc05-c634d3fa1859@10.9.107.11@o2ib4:7/0 lens 632/0 e 0 to 0 dl 1556376067 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 27 07:41:02 fir-md1-s1 kernel: Lustre: 105058:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 30 previous similar messages Apr 27 07:42:07 fir-md1-s1 kernel: LustreError: 104389:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556376037, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4d410f0240/0x378007f7dcffbc77 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x20/0x0 rrc: 53 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104389 timeout: 0 lvb_type: 0 Apr 27 07:42:07 fir-md1-s1 kernel: LustreError: 104389:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 27 07:43:07 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.104.19@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b60e5ace780/0x378007f7ac0a6892 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758b24f8b2 expref: 17 pid: 114991 timeout: 376021 lvb_type: 0 Apr 27 07:43:07 fir-md1-s1 kernel: LustreError: 104728:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b5101ad2000 ns: 
mdt-fir-MDT0000_UUID lock: ffff8b5ededbe300/0x378007f7ac0a6bc5 lrc: 5/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x50200400000020 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758b24f8b9 expref: 12 pid: 104728 timeout: 0 lvb_type: 0 Apr 27 07:43:07 fir-md1-s1 kernel: LNet: Service thread pid 104728 completed after 3691.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 07:43:07 fir-md1-s1 kernel: LNet: Service thread pid 105239 completed after 3691.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 27 07:43:32 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b6231114500 x1631546041738256/t0(0) o101->5c33d5d2-2621-e59d-0e36-98a7cb2caa9e@10.9.104.17@o2ib4:7/0 lens 568/0 e 0 to 0 dl 1556376217 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 27 07:43:32 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 27 07:43:37 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.17@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b57efe82880/0x378007f7ac0a6d3f lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x311:0x0].0x0 bits 0x40/0x0 rrc: 48 type: IBT flags: 0x60200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7c55a2ce expref: 1256 pid: 105239 timeout: 376051 lvb_type: 0 Apr 27 07:43:37 fir-md1-s1 kernel: LNet: Service thread pid 104337 completed after 3720.97s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 27 07:43:37 fir-md1-s1 kernel: LustreError: 104691:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b5101ad2000 ns: mdt-fir-MDT0000_UUID lock: ffff8b567d303a80/0x378007f7ac0a6e88 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x1b/0x0 rrc: 44 type: IBT flags: 0x50200400000020 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758b24f8ce expref: 4 pid: 104691 timeout: 0 lvb_type: 0 Apr 27 07:43:37 fir-md1-s1 kernel: Lustre: 104691:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (3719:2s); client may timeout. req@ffff8b5e67f11200 x1631585913250576/t0(0) o101->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:6/0 lens 584/1168 e 0 to 0 dl 1556376215 ref 1 fl Complete:/0/0 rc -107/-107 Apr 27 07:43:37 fir-md1-s1 kernel: Lustre: 104691:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 27 07:43:37 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 27 07:43:43 fir-md1-s1 kernel: LustreError: 82844:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.104.17@o2ib4 arrived at 1556376223 with bad export cookie 3999205229385308541 Apr 27 07:43:58 fir-md1-s1 kernel: LNet: Service thread pid 114991 was inactive for 200.64s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 27 07:43:58 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 27 07:43:58 fir-md1-s1 kernel: Pid: 114991, comm: mdt02_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 07:43:58 fir-md1-s1 kernel: Call Trace: Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 07:43:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 07:43:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 07:43:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556376238.114991 Apr 27 07:43:58 fir-md1-s1 kernel: Pid: 104389, comm: mdt01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 27 07:43:58 fir-md1-s1 kernel: Call Trace: Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 27 07:43:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 27 07:43:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 27 07:43:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 27 07:44:07 fir-md1-s1 kernel: Lustre: 105128:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (3750:1s); client may timeout. 
req@ffff8b5e67f12700 x1631586050523824/t0(0) o101->0fafc81c-d2f9-5fcc-1c5e-9d205df82025@10.9.104.20@o2ib4:6/0 lens 480/536 e 0 to 0 dl 1556376246 ref 1 fl Complete:/0/0 rc 0/0 Apr 27 07:44:07 fir-md1-s1 kernel: LustreError: 104957:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b5f75d7a000 ns: mdt-fir-MDT0000_UUID lock: ffff8b52d16cf980/0x378007f7ac0a79a9 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x311:0x0].0x0 bits 0x20/0x0 rrc: 33 type: IBT flags: 0x50200000000000 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c58332c4b expref: 2 pid: 104957 timeout: 0 lvb_type: 0 Apr 27 07:44:07 fir-md1-s1 kernel: LustreError: 104957:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 27 07:44:07 fir-md1-s1 kernel: Lustre: 105128:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Apr 27 08:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f8efffa2-3213-b7c8-ccdc-9a5668e213bd (at 10.8.25.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72fa1f6c00, cur 1556378720 expire 1556378570 last 1556378493 Apr 27 08:25:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 08:25:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8bfed59e-0ae2-8175-d0ee-2a61e691b9d0 (at 10.8.25.13@o2ib6) Apr 27 08:25:36 fir-md1-s1 kernel: Lustre: Skipped 141 previous similar messages Apr 27 08:25:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8efffa2-3213-b7c8-ccdc-9a5668e213bd (at 10.8.25.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72ff227400, cur 1556378742 expire 1556378592 last 1556378515 Apr 27 08:25:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 27 12:01:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 042451f7-afee-3ccc-06ab-cb159c555b53 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b522b67ec00, cur 1556391680 expire 1556391530 last 1556391453 Apr 27 12:01:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to faf112e6-add7-bad5-c6b8-4f1cf748e849 (at 10.8.21.21@o2ib6) Apr 27 12:01:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 13:02:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 804ab419-a813-7f25-6cb1-35e14a3ff3a4 (at 10.8.1.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b70d965c000, cur 1556395365 expire 1556395215 last 1556395138 Apr 27 13:02:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 13:04:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6aa1d90c-025f-c433-1a83-2952cf309910 (at 10.8.1.11@o2ib6) Apr 27 13:04:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 13:45:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0f9dff68-8375-0be2-d3b1-88e8149ac753 (at 10.8.10.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72963f7400, cur 1556397903 expire 1556397753 last 1556397676 Apr 27 13:45:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 13:45:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0f9dff68-8375-0be2-d3b1-88e8149ac753 (at 10.8.10.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72ff220000, cur 1556397906 expire 1556397756 last 1556397679 Apr 27 13:45:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 27 13:47:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c7052566-4d84-d96a-56f4-b6d8f3f29ad2 (at 10.8.10.5@o2ib6) Apr 27 13:47:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 15:34:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d3da365a-e12b-dd95-2805-14a34f48c77c (at 10.9.102.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b7247672c00, cur 1556404441 expire 1556404291 last 1556404214 Apr 27 23:04:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7bcdde3c-236d-6bbd-afc2-cf3c9fa536ce (at 10.8.17.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72943b6800, cur 1556431447 expire 1556431297 last 1556431220 Apr 27 23:04:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 27 23:52:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2b43d6d0-617d-0cf0-75e3-8727983baa85 (at 10.8.14.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7123219800, cur 1556434364 expire 1556434214 last 1556434137 Apr 27 23:52:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 00:12:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 990001b2-7e7d-402a-1dc8-85b3e7069f05 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72ec721800, cur 1556435523 expire 1556435373 last 1556435296 Apr 28 00:12:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 00:14:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c8601412-6922-99d5-2d95-dec111840ceb (at 10.8.10.29@o2ib6) Apr 28 00:14:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 01:52:40 fir-md1-s1 kernel: Lustre: 105127:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b56ebe8ef00 x1631535234539200/t0(0) o101->6a18ff26-2f90-35f3-8dc0-c084882f2a83@10.8.18.15@o2ib6:15/0 lens 576/3264 e 1 to 0 dl 1556441565 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 01:52:40 fir-md1-s1 kernel: Lustre: 105127:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 28 01:52:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0556c970-f843-3f2b-1d1c-0b06884878cc (at 10.8.13.22@o2ib6) reconnecting Apr 28 01:52:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored 
to 5b99a272-040b-ad74-ab3b-5a9c00c921dd (at 10.8.20.11@o2ib6) Apr 28 01:52:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 01:52:46 fir-md1-s1 kernel: Lustre: Skipped 94 previous similar messages Apr 28 01:52:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 625f273d-37e1-2d1a-cd9b-3920076bd8e4 (at 10.8.13.4@o2ib6) Apr 28 01:52:47 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Apr 28 01:52:51 fir-md1-s1 kernel: Lustre: 114866:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b45f2f07b00 x1631545961432528/t0(0) o101->e508f6f3-acda-49f1-6911-42786d06f3ec@10.8.17.21@o2ib6:26/0 lens 584/3264 e 0 to 0 dl 1556441576 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 01:52:51 fir-md1-s1 kernel: Lustre: 114866:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 42 previous similar messages Apr 28 01:52:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ec3a6b48-c57a-17c3-f292-1109fbbb4e4d (at 10.9.107.32@o2ib4) Apr 28 01:52:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Apr 28 01:52:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.22.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b42f583b3c0/0x378007faaddb51f8 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 138 type: IBT flags: 0x60200400000020 nid: 10.8.22.23@o2ib6 remote: 0x26b2b40a92329f75 expref: 3486 pid: 105046 timeout: 441409 lvb_type: 0 Apr 28 01:52:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 01:52:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.13.17@o2ib6) Apr 28 01:53:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ccbaa6d4-c0cc-5bf8-7c12-a382eab3833d (at 10.8.30.13@o2ib6) Apr 28 01:53:07 fir-md1-s1 kernel: Lustre: 
Skipped 11 previous similar messages Apr 28 01:53:07 fir-md1-s1 kernel: LustreError: 114838:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72e0f1c400 ns: mdt-fir-MDT0002_UUID lock: ffff8b3646f27500/0x378007faaddcc418 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 140 type: IBT flags: 0x50200000000000 nid: 10.8.22.23@o2ib6 remote: 0x26b2b40a9232a0d3 expref: 4 pid: 114838 timeout: 0 lvb_type: 0 Apr 28 01:53:07 fir-md1-s1 kernel: LustreError: 114838:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 28 01:53:07 fir-md1-s1 kernel: Lustre: 114838:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:1s); client may timeout. req@ffff8b45ebb11e00 x1631549059665984/t0(0) o101->6137bba0-34c0-9107-d068-27095ef10964@10.8.22.23@o2ib6:15/0 lens 568/2296 e 1 to 0 dl 1556441586 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 01:53:08 fir-md1-s1 kernel: Lustre: 114838:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 01:53:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 688bf468-b12b-a6a5-9168-2e97ac5757d2 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b6fef339400, cur 1556441601 expire 1556441451 last 1556441374 Apr 28 01:53:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 01:53:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 61551ba3-276b-add5-766b-362f4d060385 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72f6d45800, cur 1556441610 expire 1556441460 last 1556441383 Apr 28 01:53:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 01:53:34 fir-md1-s1 kernel: Lustre: 105247:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4b4e308000 x1631543347308640/t0(0) o101->78ab2c22-394d-bdd4-0b8e-3553d6a47e28@10.8.17.2@o2ib6:9/0 lens 576/3264 e 0 to 0 dl 1556441619 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 01:53:34 fir-md1-s1 kernel: Lustre: 105247:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages Apr 28 01:53:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.22.30@o2ib6) Apr 28 01:53:40 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Apr 28 01:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 315cf750-5ce7-61a0-093d-91bfc52b74be (at 10.8.17.10@o2ib6) reconnecting Apr 28 01:54:04 fir-md1-s1 kernel: Lustre: Skipped 102 previous similar messages Apr 28 01:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7678bba2-0d4a-2c45-0477-8fdb0f15ad72 (at 10.8.17.10@o2ib6) Apr 28 01:54:04 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Apr 28 01:54:39 fir-md1-s1 kernel: LustreError: 104335:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441589, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b627caf4140/0x378007faaed9f60b lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 257 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104335 timeout: 0 lvb_type: 0 Apr 28 01:54:39 fir-md1-s1 kernel: LustreError: 104335:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 18 previous similar messages Apr 28 01:54:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c36c49f8-c777-2823-2238-ac147660c716 (at 10.8.25.19@o2ib6) Apr 28 
01:54:42 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages Apr 28 01:55:03 fir-md1-s1 kernel: LustreError: 105259:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441613, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3d4eaa4140/0x378007faaf5d8e48 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 257 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105259 timeout: 0 lvb_type: 0 Apr 28 01:55:03 fir-md1-s1 kernel: LustreError: 105259:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 78 previous similar messages Apr 28 01:55:15 fir-md1-s1 kernel: LustreError: 114832:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441625, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b72c9e46e40/0x378007faaf9f19fe lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 257 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114832 timeout: 0 lvb_type: 0 Apr 28 01:55:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.20.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4650e09200/0x378007faaed9d7db lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 280 type: IBT flags: 0x60200400000020 nid: 10.8.20.11@o2ib6 remote: 0xb2c5110cd1b0807d expref: 3854 pid: 114838 timeout: 441573 lvb_type: 0 Apr 28 01:55:45 fir-md1-s1 kernel: Lustre: 104990:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b38c9b5d100 x1631535799648384/t0(0) o101->06788f2c-1d86-3c8c-59cf-db93bda4f27e@10.8.8.30@o2ib6:20/0 lens 584/3264 e 0 to 0 dl 1556441750 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 01:55:45 fir-md1-s1 kernel: 
Lustre: 104990:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 77 previous similar messages Apr 28 01:55:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f24fa8cf-aa2f-9e6d-8276-05979faf0cd3 (at 10.8.8.30@o2ib6) Apr 28 01:55:51 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Apr 28 01:56:29 fir-md1-s1 kernel: LNet: Service thread pid 105308 was inactive for 200.41s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 01:56:29 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 01:56:29 fir-md1-s1 kernel: Pid: 105308, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 01:56:29 fir-md1-s1 kernel: Call Trace: Apr 28 01:56:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 01:56:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 01:56:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 01:56:29 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 01:56:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 01:56:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441790.105308 Apr 28 01:56:30 fir-md1-s1 kernel: Pid: 105113, comm: mdt00_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 01:56:30 fir-md1-s1 kernel: Call Trace: Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 01:56:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 01:56:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 01:56:30 fir-md1-s1 kernel: Pid: 114938, comm: mdt00_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 01:56:30 fir-md1-s1 kernel: Call Trace: Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 
kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 01:56:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 01:56:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 01:56:30 fir-md1-s1 kernel: Pid: 105034, comm: mdt00_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 01:56:30 fir-md1-s1 kernel: Call Trace: Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 01:56:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 01:56:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 01:56:30 fir-md1-s1 kernel: LNet: Service thread pid 104940 was inactive for 200.90s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 01:56:30 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 28 01:56:30 fir-md1-s1 kernel: Pid: 104940, comm: mdt01_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 01:56:30 fir-md1-s1 kernel: Call Trace: Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 01:56:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 01:56:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 01:56:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 01:56:30 fir-md1-s1 kernel: LNet: Service thread pid 105239 was inactive for 201.12s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 01:56:30 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 28 01:56:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 315cf750-5ce7-61a0-093d-91bfc52b74be (at 10.8.17.10@o2ib6) reconnecting Apr 28 01:56:39 fir-md1-s1 kernel: Lustre: Skipped 252 previous similar messages Apr 28 01:56:50 fir-md1-s1 kernel: LustreError: 104965:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441720, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3fddef0d80/0x378007fab178b55b lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 279 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104965 timeout: 0 lvb_type: 0 Apr 28 01:56:53 fir-md1-s1 kernel: LNet: Service thread pid 105259 was inactive for 200.41s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 01:56:53 fir-md1-s1 kernel: LNet: Skipped 83 previous similar messages Apr 28 01:56:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441813.105259 Apr 28 01:57:06 fir-md1-s1 kernel: LNet: Service thread pid 114832 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 01:57:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441826.114832 Apr 28 01:57:14 fir-md1-s1 kernel: LustreError: 114909:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441744, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5d76c90000/0x378007fab2007bc6 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x21/0x0 rrc: 279 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114909 timeout: 0 lvb_type: 0 Apr 28 01:57:14 fir-md1-s1 kernel: LustreError: 114909:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 25 previous similar messages Apr 28 01:57:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 130603c9-a1b3-c8a3-280b-e15c83ebc000 (at 10.8.12.16@o2ib6) Apr 28 01:57:59 fir-md1-s1 kernel: Lustre: Skipped 262 previous similar messages Apr 28 01:58:14 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.30.13@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b522f973180/0x378007faaeda1323 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 279 type: IBT flags: 0x60200400000020 nid: 10.8.30.13@o2ib6 remote: 0x2034a6ecf11f61e4 expref: 3485 pid: 114799 timeout: 441728 lvb_type: 0 Apr 28 01:58:14 fir-md1-s1 kernel: LNet: Service thread pid 104696 completed after 305.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 01:58:14 fir-md1-s1 kernel: LNet: Skipped 16 previous similar messages Apr 28 01:58:19 fir-md1-s1 kernel: LNet: Service thread pid 105310 completed after 310.41s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 01:58:19 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Apr 28 01:58:40 fir-md1-s1 kernel: LNet: Service thread pid 104965 was inactive for 200.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 01:58:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441920.104965 Apr 28 01:58:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441922.114839 Apr 28 01:58:44 fir-md1-s1 kernel: Lustre: 105303:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4614bd3900 x1631834511291616/t0(0) o55->ed6bfb82-5106-e0fc-d3c8-7326144df664@10.8.25.19@o2ib6:19/0 lens 472/224 e 0 to 0 dl 1556441929 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 01:58:44 fir-md1-s1 kernel: Lustre: 105303:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Apr 28 01:58:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441925.104720 Apr 28 01:58:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441926.104982 Apr 28 01:58:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441927.114829 Apr 28 01:58:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441928.105025 Apr 28 01:58:49 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.13.5@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b5282235100/0x378007faaeda695a lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 281 type: IBT flags: 0x60200400000020 nid: 10.8.13.5@o2ib6 remote: 0x410a36eab3471410 expref: 3493 pid: 104973 timeout: 441763 lvb_type: 0 Apr 28 01:58:49 fir-md1-s1 kernel: LustreError: 105113:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b728b257400 ns: mdt-fir-MDT0002_UUID lock: ffff8b50fef05c40/0x378007faaeda6a41 lrc: 3/0,0 mode: PW/PW res: 
[0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 278 type: IBT flags: 0x50200400000020 nid: 10.8.13.5@o2ib6 remote: 0x410a36eab347141e expref: 3033 pid: 105113 timeout: 0 lvb_type: 0 Apr 28 01:58:49 fir-md1-s1 kernel: LustreError: 105113:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 01:58:49 fir-md1-s1 kernel: LNet: Service thread pid 105113 completed after 340.17s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 01:58:49 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 28 01:59:04 fir-md1-s1 kernel: LNet: Service thread pid 114866 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 01:59:04 fir-md1-s1 kernel: LNet: Skipped 22 previous similar messages Apr 28 01:59:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556441944.114866 Apr 28 01:59:20 fir-md1-s1 kernel: LNet: Service thread pid 104910 completed after 371.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 01:59:20 fir-md1-s1 kernel: LNet: Skipped 24 previous similar messages Apr 28 01:59:21 fir-md1-s1 kernel: Lustre: 114841:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (371:1s); client may timeout. 
req@ffff8b4728f68900 x1631342936586896/t0(0) o101->115a2222-e1b7-dfee-ffb1-27088b50273f@10.8.13.4@o2ib6:9/0 lens 480/536 e 0 to 0 dl 1556441960 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 01:59:21 fir-md1-s1 kernel: Lustre: 114841:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 01:59:21 fir-md1-s1 kernel: LustreError: 114804:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b3e53b2d800 ns: mdt-fir-MDT0002_UUID lock: ffff8b52cbda3180/0x378007faaeda8d6b lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 269 type: IBT flags: 0x50200400000020 nid: 10.8.20.11@o2ib6 remote: 0xb2c5110cd1b085a1 expref: 8 pid: 114804 timeout: 0 lvb_type: 0 Apr 28 01:59:21 fir-md1-s1 kernel: LustreError: 114804:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 28 01:59:44 fir-md1-s1 kernel: LustreError: 104909:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441894, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4e8a7b86c0/0x378007fab51f25a9 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 263 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104909 timeout: 0 lvb_type: 0 Apr 28 01:59:44 fir-md1-s1 kernel: LustreError: 104909:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 28 01:59:51 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.13.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4ef3fa5e80/0x378007faaedaaa21 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 263 type: IBT flags: 0x60200400000020 nid: 10.8.13.22@o2ib6 remote: 0x504b1cac6f600a96 expref: 3638 pid: 114991 timeout: 441825 lvb_type: 0 Apr 28 01:59:51 fir-md1-s1 kernel: LustreError: 
20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 01:59:51 fir-md1-s1 kernel: LustreError: 105076:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4d2b5ada00 x1631591707888080/t0(0) o104->fir-MDT0002@10.8.13.22@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 01:59:51 fir-md1-s1 kernel: LustreError: 23536:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.13.22@o2ib6 arrived at 1556441991 with bad export cookie 3999205221519726894 Apr 28 01:59:51 fir-md1-s1 kernel: LustreError: 105068:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b724ba25400 ns: mdt-fir-MDT0002_UUID lock: ffff8b5254383a80/0x378007faaedbd5fb lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 256 type: IBT flags: 0x50200400000020 nid: 10.8.7.30@o2ib6 remote: 0x4262b8f4ee803789 expref: 6 pid: 105068 timeout: 0 lvb_type: 0 Apr 28 01:59:51 fir-md1-s1 kernel: Lustre: 105068:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (371:31s); client may timeout. 
req@ffff8b457736a400 x1631535358573104/t0(0) o101->38fc721f-2581-5cc7-2331-7b71af28244a@10.8.7.30@o2ib6:9/0 lens 576/1792 e 0 to 0 dl 1556441960 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 01:59:51 fir-md1-s1 kernel: Lustre: 105068:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Apr 28 02:01:21 fir-md1-s1 kernel: LustreError: 114865:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556441991, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b49cd7fad00/0x378007fab710bf86 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 258 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114865 timeout: 0 lvb_type: 0 Apr 28 02:01:21 fir-md1-s1 kernel: LustreError: 114865:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 46 previous similar messages Apr 28 02:01:35 fir-md1-s1 kernel: LNet: Service thread pid 114815 was inactive for 200.61s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:01:35 fir-md1-s1 kernel: Pid: 114815, comm: mdt01_080 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:01:35 fir-md1-s1 kernel: Call Trace: Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:01:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:01:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:01:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442095.114815 Apr 28 02:01:35 fir-md1-s1 kernel: Pid: 104909, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:01:35 fir-md1-s1 kernel: Call Trace: Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] 
mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:01:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:01:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:01:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:01:40 fir-md1-s1 kernel: LNet: Service thread pid 114828 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:01:40 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 02:01:40 fir-md1-s1 kernel: Pid: 114828, comm: mdt01_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:01:40 fir-md1-s1 kernel: Call Trace: Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:01:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:01:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:01:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442100.114828 Apr 28 02:01:40 fir-md1-s1 kernel: Pid: 114798, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:01:40 fir-md1-s1 kernel: Call Trace: Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:01:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:01:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:01:40 fir-md1-s1 kernel: Pid: 105002, comm: mdt01_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:01:40 fir-md1-s1 kernel: Call Trace: Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:01:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 
02:01:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:01:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:01:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:01:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:01:40 fir-md1-s1 kernel: LNet: Service thread pid 104957 was inactive for 200.90s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 02:01:40 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 28 02:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1eed9d25-9802-9a67-1bce-978ce6293b9f (at 10.8.25.13@o2ib6) reconnecting Apr 28 02:01:55 fir-md1-s1 kernel: Lustre: Skipped 574 previous similar messages Apr 28 02:02:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442129.105005 Apr 28 02:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8bfed59e-0ae2-8175-d0ee-2a61e691b9d0 (at 10.8.25.13@o2ib6) Apr 28 02:02:26 fir-md1-s1 kernel: Lustre: Skipped 465 previous similar messages Apr 28 02:02:41 fir-md1-s1 kernel: LNet: Service thread pid 104939 was inactive for 200.24s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 02:02:41 fir-md1-s1 kernel: LNet: Skipped 27 previous similar messages Apr 28 02:02:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442161.104939 Apr 28 02:03:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442191.105301 Apr 28 02:03:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442192.105397 Apr 28 02:08:49 fir-md1-s1 kernel: Lustre: 105035:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5d20fcce00 x1631677115520944/t0(0) o101->c731af70-4a29-f5b0-77a4-e7d55674e36d@10.8.13.5@o2ib6:24/0 lens 584/3264 e 0 to 0 dl 1556442534 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 02:08:49 fir-md1-s1 kernel: Lustre: 105035:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 98 previous similar messages Apr 28 02:09:54 fir-md1-s1 kernel: LustreError: 104958:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556442504, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5f76994380/0x378007fac182f548 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 259 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104958 timeout: 0 lvb_type: 0 Apr 28 02:09:54 fir-md1-s1 kernel: LustreError: 104958:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 48 previous similar messages Apr 28 02:10:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 365c3577-8854-7eeb-7de5-093c8b1d1134 (at 10.8.13.5@o2ib6) Apr 28 02:10:59 fir-md1-s1 kernel: Lustre: Skipped 1040 previous similar messages Apr 28 02:11:44 fir-md1-s1 kernel: LNet: Service thread pid 104958 was inactive for 200.15s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:11:44 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 28 02:11:44 fir-md1-s1 kernel: Pid: 104958, comm: mdt02_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:11:44 fir-md1-s1 kernel: Call Trace: Apr 28 02:11:44 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:11:44 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:11:44 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:11:44 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:11:44 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:11:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442704.104958 Apr 28 02:12:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c731af70-4a29-f5b0-77a4-e7d55674e36d (at 10.8.13.5@o2ib6) reconnecting Apr 28 02:12:01 fir-md1-s1 kernel: Lustre: Skipped 1225 previous similar messages Apr 28 02:14:05 fir-md1-s1 kernel: LNet: Service thread pid 105232 was inactive for 200.33s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:14:05 fir-md1-s1 kernel: Pid: 105232, comm: mdt02_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:14:05 fir-md1-s1 kernel: Call Trace: Apr 28 02:14:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:14:05 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:14:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:14:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:14:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:14:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442845.105232 Apr 28 02:14:28 fir-md1-s1 kernel: Lustre: 114878:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3f60110600 x1631898778247456/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:3/0 lens 584/3264 e 0 to 0 dl 1556442873 ref 2 fl Interpret:/0/0 
rc 0/0 Apr 28 02:14:28 fir-md1-s1 kernel: Lustre: 114878:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 28 02:15:07 fir-md1-s1 kernel: LNet: Service thread pid 105052 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:15:07 fir-md1-s1 kernel: Pid: 105052, comm: mdt01_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:15:07 fir-md1-s1 kernel: Call Trace: Apr 28 02:15:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:15:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:15:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:15:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:15:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:15:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556442907.105052 Apr 28 02:15:33 fir-md1-s1 kernel: LustreError: 
105074:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556442843, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b34ec260240/0x378007fac861e7d7 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 263 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105074 timeout: 0 lvb_type: 0 Apr 28 02:15:33 fir-md1-s1 kernel: LustreError: 105074:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 28 02:17:23 fir-md1-s1 kernel: LNet: Service thread pid 105074 was inactive for 200.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:17:23 fir-md1-s1 kernel: Pid: 105074, comm: mdt00_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:17:23 fir-md1-s1 kernel: Call Trace: Apr 28 02:17:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:17:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:17:23 
fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:17:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:17:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:17:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:17:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556443043.105074 Apr 28 02:17:37 fir-md1-s1 kernel: Pid: 104330, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:17:37 fir-md1-s1 kernel: Call Trace: Apr 28 02:17:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:17:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:17:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:17:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:17:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:17:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556443057.104330 Apr 28 02:21:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
8bfed59e-0ae2-8175-d0ee-2a61e691b9d0 (at 10.8.25.13@o2ib6) Apr 28 02:21:02 fir-md1-s1 kernel: Lustre: Skipped 1236 previous similar messages Apr 28 02:22:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1eed9d25-9802-9a67-1bce-978ce6293b9f (at 10.8.25.13@o2ib6) reconnecting Apr 28 02:22:04 fir-md1-s1 kernel: Lustre: Skipped 1245 previous similar messages Apr 28 02:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dcc834ec-a6c2-8290-f9d2-2cc7bbcfa1ab (at 10.8.17.6@o2ib6) Apr 28 02:31:02 fir-md1-s1 kernel: Lustre: Skipped 1313 previous similar messages Apr 28 02:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5483b434-bcd6-aaf1-da57-b35b3900df09 (at 10.8.17.6@o2ib6) reconnecting Apr 28 02:32:06 fir-md1-s1 kernel: Lustre: Skipped 1312 previous similar messages Apr 28 02:39:54 fir-md1-s1 kernel: Lustre: 105275:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-3), not sending early reply req@ffff8b6faae5c500 x1631555500821728/t0(0) o101->42800284-789e-e9cc-0ebd-dbacb154f6ac@10.9.107.31@o2ib4:29/0 lens 584/3264 e 1 to 0 dl 1556444399 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 02:39:54 fir-md1-s1 kernel: Lustre: 105275:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 28 02:41:01 fir-md1-s1 kernel: LustreError: 104336:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556444371, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b72e17021c0/0x378007fae931b07f lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 265 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104336 timeout: 0 lvb_type: 0 Apr 28 02:41:01 fir-md1-s1 kernel: LustreError: 104336:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 02:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.107.31@o2ib4) Apr 28 02:41:03 
fir-md1-s1 kernel: Lustre: Skipped 1257 previous similar messages Apr 28 02:41:15 fir-md1-s1 kernel: Lustre: 104997:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b5d69197200 x1631814295236256/t0(0) o101->6da928ad-923b-cec3-5920-76a1fc1b7ec3@10.9.107.30@o2ib4:20/0 lens 584/3264 e 1 to 0 dl 1556444480 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 02:42:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 42800284-789e-e9cc-0ebd-dbacb154f6ac (at 10.9.107.31@o2ib4) reconnecting Apr 28 02:42:06 fir-md1-s1 kernel: Lustre: Skipped 1262 previous similar messages Apr 28 02:42:30 fir-md1-s1 kernel: LustreError: 104948:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556444460, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b42d73d3180/0x378007faeafb5970 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 266 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104948 timeout: 0 lvb_type: 0 Apr 28 02:42:51 fir-md1-s1 kernel: LNet: Service thread pid 104336 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:42:51 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 02:42:51 fir-md1-s1 kernel: Pid: 104336, comm: mdt03_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:42:51 fir-md1-s1 kernel: Call Trace: Apr 28 02:42:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:42:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:42:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:42:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:42:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:42:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556444571.104336 Apr 28 02:44:17 fir-md1-s1 kernel: Lustre: 104933:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b531923f200 x1631542865819568/t0(0) o101->fe368c15-0041-26b7-6d7c-54456281630d@10.8.17.9@o2ib6:22/0 lens 584/3264 e 0 to 0 dl 1556444662 ref 2 fl 
Interpret:/0/0 rc 0/0 Apr 28 02:44:17 fir-md1-s1 kernel: Lustre: 104933:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 28 02:44:20 fir-md1-s1 kernel: LNet: Service thread pid 104948 was inactive for 200.04s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:44:20 fir-md1-s1 kernel: Pid: 104948, comm: mdt02_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:44:20 fir-md1-s1 kernel: Call Trace: Apr 28 02:44:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:44:20 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:44:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:44:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:44:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:44:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556444660.104948 Apr 28 02:44:42 fir-md1-s1 kernel: LNet: Service thread pid 104726 was 
inactive for 200.63s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:44:42 fir-md1-s1 kernel: Pid: 104726, comm: mdt00_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:44:42 fir-md1-s1 kernel: Call Trace: Apr 28 02:44:42 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:44:42 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:44:42 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:44:42 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:44:42 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:44:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556444682.104726 Apr 28 02:45:22 fir-md1-s1 kernel: LustreError: 114836:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556444632, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4eff260d80/0x378007faee97695a lrc: 3/1,0 
mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 268 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114836 timeout: 0 lvb_type: 0 Apr 28 02:45:22 fir-md1-s1 kernel: LustreError: 114836:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 28 02:46:21 fir-md1-s1 kernel: LNet: Service thread pid 114906 was inactive for 200.41s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 02:46:21 fir-md1-s1 kernel: Pid: 114906, comm: mdt01_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:46:21 fir-md1-s1 kernel: Call Trace: Apr 28 02:46:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:46:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:46:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:46:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:46:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 
02:46:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556444781.114906 Apr 28 02:47:13 fir-md1-s1 kernel: Pid: 114836, comm: mdt01_087 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:47:13 fir-md1-s1 kernel: Call Trace: Apr 28 02:47:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:47:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:47:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:47:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:47:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:47:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556444833.114836 Apr 28 02:49:57 fir-md1-s1 kernel: Lustre: 105122:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b47f4bd4500 x1631677241385248/t0(0) o101->1a944508-a353-6d52-a7d4-1133aba4850b@10.9.101.40@o2ib4:2/0 lens 584/3264 e 1 to 0 dl 1556445002 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 02:51:05 
fir-md1-s1 kernel: LustreError: 105296:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556444975, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b626fa06c00/0x378007faf5912aac lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 280 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105296 timeout: 0 lvb_type: 0 Apr 28 02:51:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ee0973f3-679b-df7d-b6f9-105ad6d1a066 (at 10.9.101.40@o2ib4) Apr 28 02:51:07 fir-md1-s1 kernel: Lustre: Skipped 1408 previous similar messages Apr 28 02:52:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client fe368c15-0041-26b7-6d7c-54456281630d (at 10.8.17.9@o2ib6) reconnecting Apr 28 02:52:09 fir-md1-s1 kernel: Lustre: Skipped 1436 previous similar messages Apr 28 02:52:56 fir-md1-s1 kernel: LNet: Service thread pid 105296 was inactive for 200.59s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:52:56 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 02:52:56 fir-md1-s1 kernel: Pid: 105296, comm: mdt02_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:52:56 fir-md1-s1 kernel: Call Trace: Apr 28 02:52:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:52:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:52:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:52:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:52:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:52:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445176.105296 Apr 28 02:53:01 fir-md1-s1 kernel: Pid: 105422, comm: mdt00_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:53:01 fir-md1-s1 kernel: Call Trace: Apr 28 02:53:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:53:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:53:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:53:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:53:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:53:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445181.105422 Apr 28 02:53:02 fir-md1-s1 kernel: Pid: 104708, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:53:02 fir-md1-s1 kernel: Call Trace: Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:53:02 
fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: LNet: Service thread pid 104909 completed after 3288.03s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 02:53:02 fir-md1-s1 kernel: LNet: Service thread pid 114798 completed after 3282.80s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 02:53:02 fir-md1-s1 kernel: LNet: Skipped 83 previous similar messages Apr 28 02:53:02 fir-md1-s1 kernel: LNet: Skipped 83 previous similar messages Apr 28 02:53:02 fir-md1-s1 kernel: LustreError: 105288:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b728b257400 ns: mdt-fir-MDT0002_UUID lock: ffff8b515d379440/0x378007fab53a519c lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 288 type: IBT flags: 0x50200400000020 nid: 10.8.13.5@o2ib6 remote: 0x410a36eab34a07d1 expref: 2 pid: 105288 timeout: 0 lvb_type: 0 Apr 28 02:53:02 fir-md1-s1 kernel: LustreError: 105288:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Apr 28 02:53:02 fir-md1-s1 kernel: Lustre: 105288:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3253s); client may timeout. 
req@ffff8b5d64a3cb00 x1631677115368528/t0(0) o101->c731af70-4a29-f5b0-77a4-e7d55674e36d@10.8.13.5@o2ib6:19/0 lens 568/2296 e 0 to 0 dl 1556441929 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:53:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:53:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:53:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445182.104708 Apr 28 02:53:02 fir-md1-s1 kernel: Pid: 114794, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:53:02 fir-md1-s1 kernel: Call Trace: Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:53:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:53:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:53:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:53:35 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.22.30@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b45702d9b00/0x378007fafa0190c9 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 212 type: IBT flags: 0x60200400000020 nid: 10.8.22.30@o2ib6 remote: 0x126ce2b3a89eeeec expref: 3479 pid: 114805 timeout: 445049 lvb_type: 0 Apr 28 02:53:45 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.26.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4dbb3f3a80/0x378007fafa33621a lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 400 type: IBT flags: 0x60200400000020 nid: 10.8.26.17@o2ib6 remote: 0xdfb21d65f5eec468 expref: 3483 pid: 105301 timeout: 445059 lvb_type: 0 Apr 28 02:54:05 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.18.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b62c529b840/0x378007fafa0249a1 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 208 type: IBT flags: 0x60200400000020 nid: 10.8.18.23@o2ib6 remote: 0x98d73027e3623c1e expref: 645 pid: 105017 timeout: 445079 lvb_type: 0 Apr 28 02:54:05 fir-md1-s1 kernel: LustreError: 114862:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7225e2ec00 ns: mdt-fir-MDT0002_UUID lock: ffff8b4c8068c5c0/0x378007fafa3c427a lrc: 3/0,0 mode: PR/PR res: 
[0x2c001ad81:0xe27:0x0].0x0 bits 0x1b/0x0 rrc: 204 type: IBT flags: 0x50200000000000 nid: 10.8.26.17@o2ib6 remote: 0xdfb21d65f5eec54f expref: 13 pid: 114862 timeout: 0 lvb_type: 0 Apr 28 02:54:05 fir-md1-s1 kernel: LustreError: 114862:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Apr 28 02:54:05 fir-md1-s1 kernel: Lustre: 114812:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:20s); client may timeout. req@ffff8b4ae43a5700 x1631550840440192/t0(0) o101->782f60b1-717d-ff4f-8bab-0951282de63b@10.9.112.11@o2ib4:15/0 lens 1768/0 e 0 to 0 dl 1556445225 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 02:54:05 fir-md1-s1 kernel: LustreError: 114804:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.31@o2ib6: deadline 30:1s ago req@ffff8b4c26f3e900 x1631543697367376/t0(0) o101->cdee964a-caf8-9055-00d7-e4e0b6d655dc@10.8.27.31@o2ib6:4/0 lens 584/0 e 0 to 0 dl 1556445244 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 02:54:05 fir-md1-s1 kernel: Lustre: 114812:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 224 previous similar messages Apr 28 02:54:27 fir-md1-s1 kernel: LustreError: 104725:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b728c232c00 ns: mdt-fir-MDT0002_UUID lock: ffff8b7200e72f40/0x378007fafa3d36e0 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 228 type: IBT flags: 0x50200400000020 nid: 10.8.17.21@o2ib6 remote: 0x8fb41e17adb7cee3 expref: 4 pid: 104725 timeout: 0 lvb_type: 0 Apr 28 02:54:27 fir-md1-s1 kernel: LustreError: 104725:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 7 previous similar messages Apr 28 02:56:35 fir-md1-s1 kernel: LNet: Service thread pid 104957 was inactive for 200.15s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 02:56:35 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 28 02:56:35 fir-md1-s1 kernel: Pid: 104957, comm: mdt01_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:56:35 fir-md1-s1 kernel: Call Trace: Apr 28 02:56:35 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:56:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:56:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:56:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:56:35 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:56:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445395.104957 Apr 28 02:56:35 fir-md1-s1 kernel: LNet: Service thread pid 104982 was inactive for 200.81s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 02:56:35 fir-md1-s1 kernel: LNet: Skipped 65 previous similar messages
Apr 28 02:56:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445396.114841
Apr 28 02:56:57 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.25.16@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b6269bc4c80/0x378007fafa375d42 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 449 type: IBT flags: 0x60200400000020 nid: 10.8.25.16@o2ib6 remote: 0xc82853231afe1433 expref: 268 pid: 105284 timeout: 445131 lvb_type: 0
Apr 28 02:56:57 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
Apr 28 02:56:57 fir-md1-s1 kernel: LNet: Service thread pid 104728 completed after 222.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 02:56:57 fir-md1-s1 kernel: LNet: Skipped 114 previous similar messages
Apr 28 02:57:09 fir-md1-s1 kernel: LustreError: 114863:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b730283f000 ns: mdt-fir-MDT0002_UUID lock: ffff8b62ec953a80/0x378007fafa887d54 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 190 type: IBT flags: 0x50200400000020 nid: 10.8.25.16@o2ib6 remote: 0xc82853231afe16fd expref: 4 pid: 114863 timeout: 0 lvb_type: 0
Apr 28 02:57:09 fir-md1-s1 kernel: Lustre: 114863:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:8s); client may timeout. req@ffff8b5f9dd5d400 x1631834588278480/t0(0) o101->97f8c111-b167-46bf-e3cb-fb5ee2fe0022@10.8.25.16@o2ib6:27/0 lens 480/536 e 0 to 0 dl 1556445421 ref 1 fl Complete:/0/0 rc -107/-107
Apr 28 02:57:26 fir-md1-s1 kernel: LNet: Service thread pid 114889 was inactive for 200.45s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 28 02:57:26 fir-md1-s1 kernel: LNet: Skipped 97 previous similar messages Apr 28 02:57:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445446.114889 Apr 28 02:57:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445455.104966 Apr 28 02:57:47 fir-md1-s1 kernel: LNet: Service thread pid 114810 was inactive for 200.08s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 02:57:47 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 02:57:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445467.114810 Apr 28 02:57:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445468.105011 Apr 28 02:57:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445469.105084 Apr 28 02:58:17 fir-md1-s1 kernel: Pid: 114955, comm: mdt00_090 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 02:58:17 fir-md1-s1 kernel: Call Trace: Apr 28 02:58:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 02:58:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 02:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 02:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 02:58:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 02:58:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445497.114955 Apr 28 02:59:32 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 143s: evicting client at 10.8.27.9@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b62ec953f00/0x378007fafa887e5e lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 184 type: IBT flags: 0x60200400000020 nid: 10.8.27.9@o2ib6 remote: 0xc1cb05cd5789bb34 expref: 717 pid: 114861 timeout: 445293 lvb_type: 0 Apr 28 02:59:32 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 02:59:32 fir-md1-s1 kernel: LNet: Service thread pid 105255 completed after 377.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 02:59:32 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 28 02:59:37 fir-md1-s1 kernel: LustreError: 114933:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72faa86400 ns: mdt-fir-MDT0002_UUID lock: ffff8b3fcfee5100/0x378007fafa8895d4 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x1b/0x0 rrc: 178 type: IBT flags: 0x50200400000020 nid: 10.8.27.9@o2ib6 remote: 0xc1cb05cd5789bb57 expref: 4 pid: 114933 timeout: 0 lvb_type: 0 Apr 28 02:59:37 fir-md1-s1 kernel: Lustre: 114933:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (309:1s); client may timeout. 
req@ffff8b41cb6fec00 x1631558902992144/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:27/0 lens 584/1792 e 0 to 0 dl 1556445576 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:00:02 fir-md1-s1 kernel: Lustre: 105430:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b58cdaed700 x1631558903025344/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:7/0 lens 584/3264 e 0 to 0 dl 1556445607 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 03:00:02 fir-md1-s1 kernel: Lustre: 105430:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 696 previous similar messages Apr 28 03:00:04 fir-md1-s1 kernel: Pid: 105002, comm: mdt01_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:00:04 fir-md1-s1 kernel: Call Trace: Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:00:04 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:00:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:00:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445604.105002 Apr 28 03:00:04 fir-md1-s1 kernel: Pid: 105376, comm: mdt03_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:00:04 fir-md1-s1 kernel: Call Trace: Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:00:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:00:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:00:04 fir-md1-s1 kernel: Pid: 114949, comm: mdt02_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:00:04 fir-md1-s1 kernel: Call Trace: Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 
[ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:00:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:00:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:00:04 fir-md1-s1 kernel: Pid: 104336, comm: mdt03_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:00:04 fir-md1-s1 kernel: Call Trace: Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 
[ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:00:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:00:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:00:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:00:04 fir-md1-s1 kernel: LNet: Service thread pid 114912 was inactive for 201.21s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 03:00:04 fir-md1-s1 kernel: LNet: Skipped 109 previous similar messages Apr 28 03:00:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445605.114954 Apr 28 03:00:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445607.104937 Apr 28 03:00:07 fir-md1-s1 kernel: LNet: Service thread pid 114906 completed after 340.19s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 03:00:07 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Apr 28 03:00:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445613.114857 Apr 28 03:00:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445614.105281 Apr 28 03:00:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445622.104908 Apr 28 03:00:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445623.105276 Apr 28 03:00:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445629.105409 Apr 28 03:00:47 fir-md1-s1 kernel: LNet: Service thread pid 105018 completed after 452.13s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 03:00:47 fir-md1-s1 kernel: LNet: Skipped 27 previous similar messages
Apr 28 03:00:47 fir-md1-s1 kernel: Lustre: 105000:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (355:25s); client may timeout. req@ffff8b4c26fd2400 x1631738110083584/t0(0) o101->25512127-e6de-b60b-cf78-f84b6ec57480@10.8.21.14@o2ib6:12/0 lens 584/1792 e 0 to 0 dl 1556445622 ref 1 fl Complete:/0/0 rc -107/-107
Apr 28 03:00:47 fir-md1-s1 kernel: Lustre: 105000:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages
Apr 28 03:01:07 fir-md1-s1 kernel: LustreError: 114861:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556445577, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b6207be6780/0x378007fb00fe040b lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x13/0x8 rrc: 182 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114861 timeout: 0 lvb_type: 0
Apr 28 03:01:07 fir-md1-s1 kernel: LustreError: 114861:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 266 previous similar messages
Apr 28 03:01:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9477ce19-71c8-c414-dbfd-f8cdde96a32b (at 10.9.101.34@o2ib4)
Apr 28 03:01:10 fir-md1-s1 kernel: Lustre: Skipped 1868 previous similar messages
Apr 28 03:02:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6d4d8c33-ecef-fdb4-378f-8ac8e4e1e0ce (at 10.9.101.34@o2ib4) reconnecting
Apr 28 03:02:12 fir-md1-s1 kernel: Lustre: Skipped 1918 previous similar messages
Apr 28 03:02:52 fir-md1-s1 kernel: LNet: Service thread pid 114829 was inactive for 200.30s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 28 03:02:52 fir-md1-s1 kernel: LNet: Skipped 29 previous similar messages
Apr 28 03:02:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445772.114829
Apr 28 03:02:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445777.105134
Apr 28 03:03:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445780.104944
Apr 28 03:03:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445784.104389
Apr 28 03:03:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445785.104946
Apr 28 03:03:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445790.114915
Apr 28 03:03:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445792.105255
Apr 28 03:03:17 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 146s: evicting client at 10.8.7.28@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b6248f7ad00/0x378007fafa3a46e5 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 438 type: IBT flags: 0x60200400000020 nid: 10.8.7.28@o2ib6 remote: 0x4d12180a7dbc6887 expref: 637 pid: 104334 timeout: 445515 lvb_type: 0
Apr 28 03:03:17 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages
Apr 28 03:03:17 fir-md1-s1 kernel: LNet: Service thread pid 114804 completed after 529.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 03:03:17 fir-md1-s1 kernel: LNet: Skipped 24 previous similar messages
Apr 28 03:03:33 fir-md1-s1 kernel: LNet: Service thread pid 114816 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later.
Dumping the stack trace for debugging purposes: Apr 28 03:03:33 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Apr 28 03:03:33 fir-md1-s1 kernel: Pid: 114816, comm: mdt01_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:03:33 fir-md1-s1 kernel: Call Trace: Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:03:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:03:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:03:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445813.114816 Apr 28 03:03:33 fir-md1-s1 kernel: Pid: 105286, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:03:33 fir-md1-s1 kernel: Call Trace: Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:03:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:03:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:03:33 fir-md1-s1 kernel: Pid: 104982, comm: mdt00_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:03:33 fir-md1-s1 kernel: Call Trace: Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:03:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: 
[] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:03:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:03:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:03:34 fir-md1-s1 kernel: Pid: 105123, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:03:34 fir-md1-s1 kernel: Call Trace: Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:03:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:03:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:03:34 fir-md1-s1 kernel: Pid: 114856, comm: mdt00_061 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:03:34 fir-md1-s1 kernel: Call Trace: Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:03:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:03:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:03:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:03:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445818.104724 Apr 28 03:04:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445847.114809 Apr 28 03:04:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445852.114838 Apr 28 03:04:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445877.105000 Apr 28 03:05:47 fir-md1-s1 kernel: LNet: Service thread pid 105025 completed after 752.05s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 03:05:47 fir-md1-s1 kernel: LustreError: 105290:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b523735d000 ns: mdt-fir-MDT0002_UUID lock: ffff8b6198faf080/0x378007fafa8ae62d lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x1b/0x0 rrc: 180 type: IBT flags: 0x50200400000020 nid: 10.8.22.30@o2ib6 remote: 0x126ce2b3a89f05dd expref: 361 pid: 105290 timeout: 0 lvb_type: 0 Apr 28 03:05:47 fir-md1-s1 kernel: LustreError: 105290:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 28 03:05:47 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Apr 28 03:05:56 fir-md1-s1 kernel: Lustre: 105027:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (423:338s); client may timeout. req@ffff8b45892b6f00 x1631834512441984/t0(0) o101->ed6bfb82-5106-e0fc-d3c8-7326144df664@10.8.25.19@o2ib6:6/0 lens 584/1792 e 0 to 0 dl 1556445618 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:05:56 fir-md1-s1 kernel: Lustre: 105027:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 28 03:06:05 fir-md1-s1 kernel: LNet: Service thread pid 105076 was inactive for 200.13s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 03:06:05 fir-md1-s1 kernel: LNet: Skipped 66 previous similar messages Apr 28 03:06:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556445965.105076 Apr 28 03:06:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446000.105094 Apr 28 03:07:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446020.105419 Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.18.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b5279cddc40/0x378007fafa8ac97e lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x40/0x0 rrc: 185 type: IBT flags: 0x60200400000020 nid: 10.8.18.23@o2ib6 remote: 0x98d73027e3624ae2 expref: 112 pid: 114865 timeout: 445931 lvb_type: 0 Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Apr 28 03:08:17 fir-md1-s1 kernel: Lustre: 114860:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:77s); client may timeout. req@ffff8b47b3e4d100 x1631745148183936/t0(0) o101->2651344c-4389-fab5-235e-abeae20a611d@10.8.19.6@o2ib6:0/0 lens 1768/0 e 0 to 0 dl 1556446020 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:08:17 fir-md1-s1 kernel: Lustre: 114874:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:77s); client may timeout. 
req@ffff8b4677ff7200 x1631543209894944/t0(0) o101->40db60e6-2b5f-e52d-2610-43b84e2f829d@10.8.29.1@o2ib6:0/0 lens 1768/0 e 0 to 0 dl 1556446020 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:08:17 fir-md1-s1 kernel: Lustre: 114874:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 105069:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b724ba23800 ns: mdt-fir-MDT0002_UUID lock: ffff8b3c60f19d40/0x378007fafa8b4ca2 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe27:0x0].0x0 bits 0x1b/0x0 rrc: 182 type: IBT flags: 0x50200400000020 nid: 10.8.7.28@o2ib6 remote: 0x4d12180a7dbc6b43 expref: 6 pid: 105069 timeout: 0 lvb_type: 0 Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 105069:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 114874:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.23.19@o2ib6: deadline 30:1s ago req@ffff8b4bf872cb00 x1631714171939056/t0(0) o101->471bce3a-337b-2e06-9586-dd9fb5434029@10.8.23.19@o2ib6:16/0 lens 1768/0 e 0 to 0 dl 1556446096 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:08:17 fir-md1-s1 kernel: LustreError: 114874:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 28 03:08:17 fir-md1-s1 kernel: Lustre: 114860:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 781 previous similar messages Apr 28 03:09:06 fir-md1-s1 kernel: Lustre: 104909:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (694:185s); client may timeout. 
req@ffff8b4f732fe300 x1631543347715744/t0(0) o101->78ab2c22-394d-bdd4-0b8e-3553d6a47e28@10.8.17.2@o2ib6:10/0 lens 480/536 e 0 to 0 dl 1556445961 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:09:06 fir-md1-s1 kernel: Lustre: 104909:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Apr 28 03:09:08 fir-md1-s1 kernel: Pid: 114865, comm: mdt01_099 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:09:08 fir-md1-s1 kernel: Call Trace: Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:09:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:09:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:09:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446148.114865 Apr 28 03:09:08 fir-md1-s1 kernel: Pid: 104973, comm: mdt01_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:09:08 fir-md1-s1 kernel: Call Trace: Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:09:08 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:09:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:09:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:09:08 fir-md1-s1 kernel: Pid: 105293, comm: mdt01_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:09:08 fir-md1-s1 kernel: Call Trace: Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:09:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:09:08 
fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:09:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:09:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:09:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:09:17 fir-md1-s1 kernel: Pid: 104958, comm: mdt02_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:09:17 fir-md1-s1 kernel: Call Trace: Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:09:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:09:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:09:17 fir-md1-s1 kernel: LustreError: dumping log 
to /tmp/lustre-log.1556446157.104958 Apr 28 03:09:17 fir-md1-s1 kernel: Pid: 114956, comm: mdt02_088 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:09:17 fir-md1-s1 kernel: Call Trace: Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:09:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:09:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:09:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:09:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446184.114964 Apr 28 03:09:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446185.114794 Apr 28 03:09:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446187.104338 Apr 28 03:09:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446190.114799 Apr 28 03:10:05 fir-md1-s1 kernel: Lustre: 104691:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not 
sending early reply req@ffff8b5c60a61200 x1631559784667616/t0(0) o101->0db2d4e0-bf1e-3689-817d-00b10dcb4858@10.9.102.20@o2ib4:10/0 lens 584/3264 e 0 to 0 dl 1556446210 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 03:10:05 fir-md1-s1 kernel: Lustre: 104691:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1119 previous similar messages Apr 28 03:10:31 fir-md1-s1 kernel: LNet: Service thread pid 105236 was inactive for 200.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 03:10:31 fir-md1-s1 kernel: LNet: Skipped 22 previous similar messages Apr 28 03:10:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446231.105236 Apr 28 03:11:10 fir-md1-s1 kernel: LustreError: 114827:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556446180, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5f579c1680/0x378007fb0b78bf18 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 512 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114827 timeout: 0 lvb_type: 0 Apr 28 03:11:10 fir-md1-s1 kernel: LustreError: 114827:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 165 previous similar messages Apr 28 03:11:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8bfed59e-0ae2-8175-d0ee-2a61e691b9d0 (at 10.8.25.13@o2ib6) Apr 28 03:11:10 fir-md1-s1 kernel: Lustre: Skipped 2897 previous similar messages Apr 28 03:12:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446327.104696 Apr 28 03:12:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446328.114839 Apr 28 03:12:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1eed9d25-9802-9a67-1bce-978ce6293b9f (at 10.8.25.13@o2ib6) reconnecting Apr 28 03:12:12 fir-md1-s1 kernel: Lustre: Skipped 2944 previous similar messages Apr 28 03:12:26 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556446346.104994 Apr 28 03:12:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446347.114892 Apr 28 03:12:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446348.105295 Apr 28 03:12:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446360.114910 Apr 28 03:12:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446370.104327 Apr 28 03:13:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446380.114827 Apr 28 03:13:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446384.114952 Apr 28 03:13:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446386.114826 Apr 28 03:13:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446387.114866 Apr 28 03:13:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446388.114856 Apr 28 03:13:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446389.114809 Apr 28 03:13:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446391.114878 Apr 28 03:13:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446424.105134 Apr 28 03:16:16 fir-md1-s1 kernel: LNet: Service thread pid 104329 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 03:16:16 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 03:16:16 fir-md1-s1 kernel: Pid: 104329, comm: mdt00_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:16:16 fir-md1-s1 kernel: Call Trace: Apr 28 03:16:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:16:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:16:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:16:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:16:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:16:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446576.104329 Apr 28 03:16:17 fir-md1-s1 kernel: Pid: 105052, comm: mdt01_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:16:17 fir-md1-s1 kernel: Call Trace: Apr 28 03:16:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:16:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:16:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:16:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:16:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:16:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446577.105052 Apr 28 03:16:25 fir-md1-s1 kernel: Pid: 114837, comm: mdt01_088 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:16:25 fir-md1-s1 kernel: Call Trace: Apr 28 03:16:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:16:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:16:25 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:16:25 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:16:25 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:16:25 
fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:16:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:16:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:16:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:16:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:16:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446585.114837 Apr 28 03:16:28 fir-md1-s1 kernel: Pid: 105270, comm: mdt02_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:16:28 fir-md1-s1 kernel: Call Trace: Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:16:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:16:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:16:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446588.105270 Apr 28 03:16:28 fir-md1-s1 kernel: Pid: 105034, comm: mdt00_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:16:28 fir-md1-s1 kernel: Call Trace: Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:16:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:16:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:16:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:20:15 fir-md1-s1 kernel: LNet: Service thread pid 104330 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 03:20:15 fir-md1-s1 kernel: LNet: Skipped 80 previous similar messages Apr 28 03:20:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446815.104330 Apr 28 03:20:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446816.114893 Apr 28 03:20:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556446817.105404 Apr 28 03:21:00 fir-md1-s1 kernel: Lustre: 105232:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5e49e0a050 x1631609826191616/t0(0) o101->56a57cb9-3d9f-bc51-dd74-55bd81619cfc@10.9.108.8@o2ib4:5/0 lens 584/3264 e 0 to 0 dl 1556446865 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 03:21:00 fir-md1-s1 kernel: Lustre: 105069:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3c2d03a400 x1631559571033984/t0(0) o101->94129cd9-9d4a-69ce-096f-5982bc910092@10.9.107.64@o2ib4:5/0 lens 584/3264 e 0 to 0 dl 1556446865 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 03:21:00 fir-md1-s1 kernel: Lustre: 105069:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Apr 28 03:21:00 fir-md1-s1 kernel: Lustre: 105232:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 28 03:21:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 034dbae0-fbe3-61a3-de53-979fb1d61338 (at 10.9.101.65@o2ib4) Apr 28 03:21:10 fir-md1-s1 kernel: Lustre: Skipped 2881 previous similar messages Apr 28 03:22:05 fir-md1-s1 kernel: LustreError: 104979:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556446835, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3d7a6d2d00/0x378007fb1a4be1ad lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 532 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104979 timeout: 0 lvb_type: 
0 Apr 28 03:22:05 fir-md1-s1 kernel: LustreError: 104979:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 27 previous similar messages Apr 28 03:22:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2faef2d8-dc67-f384-07b6-111f344194c1 (at 10.9.101.65@o2ib4) reconnecting Apr 28 03:22:12 fir-md1-s1 kernel: Lustre: Skipped 2921 previous similar messages Apr 28 03:23:55 fir-md1-s1 kernel: Pid: 105289, comm: mdt02_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:23:55 fir-md1-s1 kernel: Call Trace: Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:23:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:23:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:23:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447035.105289 Apr 28 03:23:55 fir-md1-s1 kernel: Pid: 114798, comm: mdt01_070 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:23:55 fir-md1-s1 kernel: Call Trace: Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:23:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:23:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:23:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:23:56 fir-md1-s1 kernel: Pid: 114863, comm: mdt02_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:23:56 fir-md1-s1 kernel: Call Trace: Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: 
[] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:23:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:23:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:23:56 fir-md1-s1 kernel: Pid: 114823, comm: mdt00_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:23:56 fir-md1-s1 kernel: Call Trace: Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:23:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:23:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:23:56 fir-md1-s1 kernel: Pid: 104979, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:23:56 fir-md1-s1 kernel: Call Trace: Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:23:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:23:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:23:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:24:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447041.105068 Apr 28 03:24:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447042.114805 Apr 28 03:27:17 fir-md1-s1 kernel: 
LustreError: dumping log to /tmp/lustre-log.1556447237.114882 Apr 28 03:28:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447332.104695 Apr 28 03:30:55 fir-md1-s1 kernel: LNet: Service thread pid 114861 was inactive for 200.71s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 03:30:55 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 03:30:55 fir-md1-s1 kernel: Pid: 114861, comm: mdt02_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:30:55 fir-md1-s1 kernel: Call Trace: Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:30:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:30:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:30:55 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556447455.114861 Apr 28 03:30:55 fir-md1-s1 kernel: Pid: 105065, comm: mdt00_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:30:55 fir-md1-s1 kernel: Call Trace: Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:30:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:30:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:30:55 fir-md1-s1 kernel: Pid: 115000, comm: mdt02_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:30:55 fir-md1-s1 kernel: Call Trace: Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 
03:30:55 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:30:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:30:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:30:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:30:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:30:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:31:00 fir-md1-s1 kernel: Lustre: 104911:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b4ae8b33300 x1631547542800080/t0(0) o101->d956f7b9-44a2-9ed9-108d-fc8f2e06e858@10.9.113.1@o2ib4:5/0 lens 1768/0 e 1 to 0 dl 1556447465 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 03:31:00 fir-md1-s1 kernel: Lustre: 104911:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5697 previous similar messages Apr 28 03:31:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d5599319-4de4-dba9-1833-3b742e852126 (at 10.9.102.43@o2ib4) Apr 28 03:31:10 fir-md1-s1 kernel: Lustre: Skipped 6879 previous similar messages Apr 28 03:31:28 fir-md1-s1 kernel: Pid: 114965, comm: mdt02_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:31:28 fir-md1-s1 kernel: Call Trace: Apr 28 03:31:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:31:28 
fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:31:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:31:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:31:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:31:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:31:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447488.114965 Apr 28 03:32:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c0855e8e-4398-d036-706b-ca397c044b92 (at 10.8.30.12@o2ib6) reconnecting Apr 28 03:32:12 fir-md1-s1 kernel: Lustre: Skipped 7615 previous similar messages Apr 28 03:33:08 fir-md1-s1 kernel: Pid: 114978, comm: mdt00_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:33:08 fir-md1-s1 kernel: Call Trace: Apr 28 03:33:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:33:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:33:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 
03:33:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:33:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:33:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:33:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:33:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:33:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:33:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:33:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447588.114978 Apr 28 03:33:22 fir-md1-s1 kernel: LustreError: 105113:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556447511, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b72696cd580/0x378007fb1eec3d7d lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 545 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105113 timeout: 0 lvb_type: 0 Apr 28 03:33:22 fir-md1-s1 kernel: LustreError: 105113:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages Apr 28 03:33:39 fir-md1-s1 kernel: LNet: Service thread pid 105302 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 03:33:39 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 28 03:33:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447619.105302 Apr 28 03:33:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447625.105232 Apr 28 03:35:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447712.105113 Apr 28 03:36:14 fir-md1-s1 kernel: Pid: 114943, comm: mdt00_088 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:36:14 fir-md1-s1 kernel: Call Trace: Apr 28 03:36:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:36:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:36:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:36:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:36:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:36:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447774.114943 Apr 28 03:37:06 fir-md1-s1 kernel: Pid: 105069, comm: mdt00_023 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:37:06 fir-md1-s1 kernel: Call Trace: Apr 28 03:37:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:37:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:37:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:37:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:37:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:37:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447826.105069 Apr 28 03:37:16 fir-md1-s1 kernel: Pid: 114969, comm: mdt02_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:37:16 fir-md1-s1 kernel: Call Trace: Apr 28 03:37:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:37:16 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 
03:37:16 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:37:16 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:37:16 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:37:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:37:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:37:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:37:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:37:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:37:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447836.114969 Apr 28 03:37:21 fir-md1-s1 kernel: Pid: 114933, comm: mdt00_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:37:21 fir-md1-s1 kernel: Call Trace: Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 
[ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:37:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:37:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:37:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447841.114933 Apr 28 03:37:21 fir-md1-s1 kernel: Pid: 114890, comm: mdt00_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:37:21 fir-md1-s1 kernel: Call Trace: Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:37:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:37:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 
Apr 28 03:37:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:39:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556447973.105084 Apr 28 03:39:58 fir-md1-s1 kernel: LNet: Service thread pid 105296 completed after 2802.30s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 03:39:58 fir-md1-s1 kernel: LNet: Skipped 115 previous similar messages Apr 28 03:39:58 fir-md1-s1 kernel: Lustre: 105126:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:931s); client may timeout. req@ffff8b4cf3a8dd00 x1631295389000048/t0(0) o101->766c6e9e-6589-78e8-fb69-8836dc850825@10.8.28.2@o2ib6:27/0 lens 1768/0 e 0 to 0 dl 1556447067 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:39:58 fir-md1-s1 kernel: LustreError: 114833:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7225e2ec00 ns: mdt-fir-MDT0002_UUID lock: ffff8b359ee14c80/0x378007fafa3c6d06 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 535 type: IBT flags: 0x50200400000020 nid: 10.8.26.17@o2ib6 remote: 0xdfb21d65f5eec62f expref: 2 pid: 114833 timeout: 0 lvb_type: 0 Apr 28 03:39:58 fir-md1-s1 kernel: LustreError: 114833:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 18 previous similar messages Apr 28 03:39:58 fir-md1-s1 kernel: LustreError: 114828:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.26@o2ib6: deadline 30:1s ago req@ffff8b4c5da9d700 x1631546162955584/t0(0) o101->b4d83e54-8cb6-ea71-956c-e7a98e667a27@10.8.27.26@o2ib6:27/0 lens 584/0 e 0 to 0 dl 1556447997 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:39:58 fir-md1-s1 kernel: LustreError: 114828:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Apr 28 03:39:58 fir-md1-s1 kernel: Lustre: 105126:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 14711 previous similar messages 
Apr 28 03:40:27 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.7.33@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b36c87ab600/0x378007fafa3c1309 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 542 type: IBT flags: 0x60200400000020 nid: 10.8.7.33@o2ib6 remote: 0x24c49ba86c381f1d expref: 633 pid: 114854 timeout: 447861 lvb_type: 0 Apr 28 03:40:27 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 03:40:57 fir-md1-s1 kernel: LNet: Service thread pid 114970 completed after 2801.95s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 03:40:57 fir-md1-s1 kernel: LNet: Skipped 16 previous similar messages Apr 28 03:40:58 fir-md1-s1 kernel: LustreError: 23522:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.12.5@o2ib6 arrived at 1556448058 with bad export cookie 3999205221519731535 Apr 28 03:41:00 fir-md1-s1 kernel: Lustre: 114847:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4550aac800 x1631815007915424/t0(0) o101->74fb56c5-8bc6-38a9-8624-788945b7232f@10.9.115.2@o2ib4:5/0 lens 1768/0 e 0 to 0 dl 1556448065 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 03:41:00 fir-md1-s1 kernel: Lustre: 114847:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10480 previous similar messages Apr 28 03:41:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d616c982-d211-cf3d-a0e4-3f503ca75de5 (at 10.8.20.20@o2ib6) Apr 28 03:41:10 fir-md1-s1 kernel: Lustre: Skipped 10417 previous similar messages Apr 28 03:41:27 fir-md1-s1 kernel: LustreError: 114885:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7282f1d000 ns: mdt-fir-MDT0002_UUID lock: ffff8b3c5eb41b00/0x378007fafa8b622e lrc: 3/0,0 mode: PR/PR res: 
[0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 536 type: IBT flags: 0x50200400000020 nid: 10.8.12.25@o2ib6 remote: 0x63fc28c5cbf9c7e4 expref: 265 pid: 114885 timeout: 0 lvb_type: 0 Apr 28 03:41:27 fir-md1-s1 kernel: LustreError: 114885:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 28 03:41:27 fir-md1-s1 kernel: Lustre: 114821:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (2820:12s); client may timeout. req@ffff8b3dfafd3300 x1631673179006256/t0(0) o101->478054d9-fd04-55b5-a1af-bb0ac33a3631@10.8.12.5@o2ib6:15/0 lens 584/1792 e 0 to 0 dl 1556448075 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:41:27 fir-md1-s1 kernel: Lustre: 114821:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 03:41:57 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.12.27@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b571ea03180/0x378007fafa54dfd8 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 527 type: IBT flags: 0x60200400000020 nid: 10.8.12.27@o2ib6 remote: 0xc9009d550bee7dc1 expref: 276 pid: 104966 timeout: 447951 lvb_type: 0 Apr 28 03:41:57 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 03:41:57 fir-md1-s1 kernel: Lustre: 104985:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:89s); client may timeout. 
req@ffff8b48d862bf00 x1631654110356320/t0(0) o101->e891cc28-9c10-be1b-29fe-00592513d891@10.9.101.41@o2ib4:28/0 lens 480/0 e 0 to 0 dl 1556448028 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:41:57 fir-md1-s1 kernel: LustreError: 105048:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.7.33@o2ib6: deadline 30:79s ago req@ffff8b4ae43a0c00 x1631542816078288/t0(0) o101->170d6268-ca7a-a7d0-0083-35fb42e90690@10.8.7.33@o2ib6:8/0 lens 584/0 e 0 to 0 dl 1556448038 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 03:41:57 fir-md1-s1 kernel: LustreError: 105048:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 32 previous similar messages Apr 28 03:41:57 fir-md1-s1 kernel: Lustre: 104985:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1633 previous similar messages Apr 28 03:42:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ac0b0a5c-beb8-3040-e55d-d0c9dcd3f011 (at 10.8.1.18@o2ib6) reconnecting Apr 28 03:42:12 fir-md1-s1 kernel: Lustre: Skipped 10235 previous similar messages Apr 28 03:42:14 fir-md1-s1 kernel: LNet: Service thread pid 105423 completed after 2866.82s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 03:42:14 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 28 03:42:46 fir-md1-s1 kernel: Lustre: 105289:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. 
req@ffff8b5bc8f80300 x1631680649822192/t0(0) o101->b00eab08-a42a-c7f0-ff6c-15e54339a208@10.9.101.38@o2ib4:15/0 lens 576/1792 e 0 to 0 dl 1556448165 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 03:42:46 fir-md1-s1 kernel: Lustre: 105289:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 24 previous similar messages Apr 28 03:42:46 fir-md1-s1 kernel: LustreError: 105038:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7276317800 ns: mdt-fir-MDT0002_UUID lock: ffff8b4107b31680/0x378007fb1f3d9c1f lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 638 type: IBT flags: 0x50200400000020 nid: 10.9.107.31@o2ib4 remote: 0xbd12da8169fe7346 expref: 12 pid: 105038 timeout: 0 lvb_type: 0 Apr 28 03:42:46 fir-md1-s1 kernel: LustreError: 105038:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 19 previous similar messages Apr 28 03:43:46 fir-md1-s1 kernel: LustreError: 105134:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556448136, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b40a12633c0/0x378007fb1f447e3c lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 588 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105134 timeout: 0 lvb_type: 0 Apr 28 03:43:46 fir-md1-s1 kernel: LustreError: 105134:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 45 previous similar messages Apr 28 03:43:50 fir-md1-s1 kernel: Lustre: 105134:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (93:1s); client may timeout. 
req@ffff8b3bc5aaf500 x1631534695979008/t0(0) o101->a7e7b1ed-d2c3-ab2b-b707-67c13f24564d@10.9.101.69@o2ib4:17/0 lens 576/1792 e 0 to 0 dl 1556448229 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 03:43:50 fir-md1-s1 kernel: Lustre: 105134:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Apr 28 03:46:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.107.31@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b725ab5e300/0x378007fb21b71779 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 601 type: IBT flags: 0x60200400000020 nid: 10.9.107.31@o2ib4 remote: 0xbd12da8169fe838b expref: 3488 pid: 105080 timeout: 448217 lvb_type: 0 Apr 28 03:46:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 03:46:53 fir-md1-s1 kernel: LustreError: 105251:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7225e28400 ns: mdt-fir-MDT0002_UUID lock: ffff8b464e740000/0x378007fb21b79aff lrc: 3/0,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 596 type: IBT flags: 0x54a01400000020 nid: 10.9.101.38@o2ib4 remote: 0xba2f4f14bc276b95 expref: 16 pid: 105251 timeout: 0 lvb_type: 0 Apr 28 03:46:53 fir-md1-s1 kernel: LustreError: 105251:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 7 previous similar messages Apr 28 03:47:13 fir-md1-s1 kernel: LNet: Service thread pid 105025 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 03:47:13 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 03:47:13 fir-md1-s1 kernel: Pid: 105025, comm: mdt01_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:47:13 fir-md1-s1 kernel: Call Trace: Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:47:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:47:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:47:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448433.105025 Apr 28 03:47:13 fir-md1-s1 kernel: Pid: 114915, comm: mdt01_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:47:13 fir-md1-s1 kernel: Call Trace: Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:47:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:47:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:47:13 fir-md1-s1 kernel: Pid: 114798, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:47:13 fir-md1-s1 kernel: Call Trace: Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 03:47:13 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:47:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:47:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:47:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:47:14 fir-md1-s1 kernel: Pid: 105018, comm: mdt01_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:47:14 fir-md1-s1 kernel: Call Trace: Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:47:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:47:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:47:14 fir-md1-s1 kernel: Pid: 104971, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:47:14 
fir-md1-s1 kernel: Call Trace: Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:47:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:47:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:47:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:47:14 fir-md1-s1 kernel: LNet: Service thread pid 105419 was inactive for 201.16s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 03:47:14 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 28 03:47:23 fir-md1-s1 kernel: LNet: Service thread pid 105376 completed after 210.48s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 03:47:23 fir-md1-s1 kernel: LNet: Skipped 254 previous similar messages Apr 28 03:48:49 fir-md1-s1 kernel: Lustre: 104727:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (184:112s); client may timeout. 
req@ffff8b4a37b00f00 x1631680650617408/t0(0) o101->b00eab08-a42a-c7f0-ff6c-15e54339a208@10.9.101.38@o2ib4:23/0 lens 480/536 e 0 to 0 dl 1556448417 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:48:49 fir-md1-s1 kernel: Lustre: 104727:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Apr 28 03:49:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448589.114909 Apr 28 03:50:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448613.105289 Apr 28 03:50:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448619.114908 Apr 28 03:51:02 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3d2f676f00 x1631814296217344/t0(0) o101->6da928ad-923b-cec3-5920-76a1fc1b7ec3@10.9.107.30@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1556448667 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 03:51:02 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1834 previous similar messages Apr 28 03:51:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b1f80054-5beb-ff05-4714-cd8a29a93dfa (at 10.9.108.72@o2ib4) Apr 28 03:51:39 fir-md1-s1 kernel: Lustre: Skipped 3096 previous similar messages Apr 28 03:52:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ca15d879-1cb2-8780-e5e2-20230d9e27cf (at 10.8.28.3@o2ib6) reconnecting Apr 28 03:52:41 fir-md1-s1 kernel: Lustre: Skipped 2471 previous similar messages Apr 28 03:53:40 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.20.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4a9b2686c0/0x378007fb2c11d5d6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 834 type: IBT flags: 0x60200400000020 nid: 10.8.20.11@o2ib6 remote: 0xb2c5110cd1bc0a84 expref: 4380 pid: 105076 timeout: 448654 lvb_type: 0 Apr 28 03:53:40 
fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 03:53:43 fir-md1-s1 kernel: LustreError: 105109:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b45ebbcb000 ns: mdt-fir-MDT0002_UUID lock: ffff8b4b0fea8240/0x378007fb2c11d79d lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 828 type: IBT flags: 0x50200400000020 nid: 10.8.20.11@o2ib6 remote: 0xb2c5110cd1bc0a8b expref: 11 pid: 105109 timeout: 0 lvb_type: 0 Apr 28 03:53:43 fir-md1-s1 kernel: LustreError: 105109:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 12 previous similar messages Apr 28 03:53:43 fir-md1-s1 kernel: Lustre: 105109:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:1s); client may timeout. req@ffff8b4c4e6e5100 x1631930408090112/t0(0) o101->cf27766b-7a06-85c3-e1d8-3a06956d665b@10.8.20.11@o2ib6:7/0 lens 480/536 e 0 to 0 dl 1556448822 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 03:53:43 fir-md1-s1 kernel: Lustre: 105109:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 29 previous similar messages Apr 28 03:53:58 fir-md1-s1 kernel: Pid: 114947, comm: mdt02_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:53:58 fir-md1-s1 kernel: Call Trace: Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:53:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448838.114947 Apr 28 03:53:58 fir-md1-s1 kernel: Pid: 114867, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:53:58 fir-md1-s1 kernel: Call Trace: Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:53:58 
fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:53:58 fir-md1-s1 kernel: Pid: 114933, comm: mdt00_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:53:58 fir-md1-s1 kernel: Call Trace: Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:53:58 fir-md1-s1 kernel: Pid: 114892, comm: mdt01_104 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:53:58 fir-md1-s1 kernel: Call Trace: Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 
kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:53:58 fir-md1-s1 kernel: Pid: 105423, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:53:58 fir-md1-s1 kernel: Call Trace: Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:53:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:53:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:53:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:53:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:53:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448839.105234 Apr 28 03:54:25 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556448775, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b425ae7e9c0/0x378007fb2eb44213 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 829 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104355 timeout: 0 lvb_type: 0 Apr 28 03:54:25 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 853 previous similar messages Apr 28 03:54:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448868.104331 Apr 28 03:54:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448869.105052 Apr 28 03:54:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448870.105093 Apr 28 03:55:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448900.105100 Apr 28 03:56:13 fir-md1-s1 kernel: LNet: Service thread pid 104951 completed after 335.79s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 03:56:13 fir-md1-s1 kernel: LustreError: 105002:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.108.12@o2ib4: deadline 30:1s ago req@ffff8b466c2aa100 x1631573538314064/t0(0) o101->4dbe6048-7f70-8f0f-700e-3b78f70d5297@10.9.108.12@o2ib4:12/0 lens 624/0 e 0 to 0 dl 1556448972 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 03:56:13 fir-md1-s1 kernel: LustreError: 105002:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 03:56:13 fir-md1-s1 kernel: LNet: Skipped 208 previous similar messages Apr 28 03:56:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556448975.104355 Apr 28 03:57:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449023.115000 Apr 28 03:57:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449024.114815 Apr 28 03:58:30 fir-md1-s1 kernel: LNet: Service thread pid 105076 was inactive for 200.44s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 03:58:30 fir-md1-s1 kernel: LNet: Skipped 516 previous similar messages Apr 28 03:58:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449110.105076 Apr 28 03:59:33 fir-md1-s1 kernel: LNet: Service thread pid 104979 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 03:59:33 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 03:59:33 fir-md1-s1 kernel: Pid: 104979, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:59:33 fir-md1-s1 kernel: Call Trace: Apr 28 03:59:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:59:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:59:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:59:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:59:33 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:59:33 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:59:34 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:59:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:59:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:59:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:59:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:59:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:59:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449174.104979 Apr 28 03:59:39 fir-md1-s1 kernel: Pid: 114916, comm: mdt02_068 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:59:39 fir-md1-s1 kernel: Call Trace: Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:59:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:59:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:59:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449179.114916 Apr 28 03:59:39 fir-md1-s1 kernel: Pid: 105035, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:59:39 fir-md1-s1 kernel: Call Trace: Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] 
mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:59:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:59:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:59:39 fir-md1-s1 kernel: Pid: 105002, comm: mdt01_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:59:39 fir-md1-s1 kernel: Call Trace: Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Apr 28 03:59:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:59:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:59:39 fir-md1-s1 kernel: Pid: 104957, comm: mdt01_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 03:59:39 fir-md1-s1 kernel: Call Trace: Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 03:59:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 03:59:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 03:59:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 03:59:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449180.114799 Apr 28 04:00:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449229.105025 Apr 28 04:01:02 fir-md1-s1 kernel: Lustre: 114865:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4f03335700 x1631729846282544/t0(0) 
o101->f0428d50-4566-1843-edbe-bdc8dbd6d1bb@10.8.19.4@o2ib6:7/0 lens 1768/0 e 0 to 0 dl 1556449267 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 04:01:02 fir-md1-s1 kernel: Lustre: 114865:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2610 previous similar messages Apr 28 04:01:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.31.6@o2ib6) Apr 28 04:01:39 fir-md1-s1 kernel: Lustre: Skipped 3820 previous similar messages Apr 28 04:01:58 fir-md1-s1 kernel: LustreError: 114802:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.107.25@o2ib4: deadline 30:1s ago req@ffff8b4677b75400 x1631558044971904/t0(0) o101->86f9cd29-c493-e920-8336-c19de9946cf3@10.9.107.25@o2ib4:27/0 lens 576/0 e 0 to 0 dl 1556449317 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:01:58 fir-md1-s1 kernel: LustreError: 114802:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 11 previous similar messages Apr 28 04:02:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449329.114985 Apr 28 04:02:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449333.104696 Apr 28 04:02:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449334.105067 Apr 28 04:02:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d36980b7-2b04-f724-0e6b-cf989e4d7da2 (at 10.8.1.34@o2ib6) reconnecting Apr 28 04:02:41 fir-md1-s1 kernel: Lustre: Skipped 4246 previous similar messages Apr 28 04:02:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449363.104329 Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.30.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b61c2f9ba80/0x378007fb2c159ce7 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 762 type: IBT flags: 0x60200400000020 nid: 10.8.30.20@o2ib6 remote: 0xe533b347a6919dde expref: 3486 pid: 105304 
timeout: 449302 lvb_type: 0 Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 04:04:28 fir-md1-s1 kernel: Lustre: 114874:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff8b52af749500 x1631672693090384/t0(0) o101->f28b9d7c-ae20-e506-5bbf-0fe9ac4b3bdd@10.9.108.53@o2ib4:28/0 lens 480/0 e 0 to 0 dl 1556449348 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 114933:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b710bfb0000 ns: mdt-fir-MDT0002_UUID lock: ffff8b6246b5b3c0/0x378007fb2c169b64 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 753 type: IBT flags: 0x50200400000020 nid: 10.8.30.20@o2ib6 remote: 0xe533b347a6919f74 expref: 1740 pid: 114933 timeout: 0 lvb_type: 0 Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 114933:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Apr 28 04:04:28 fir-md1-s1 kernel: Lustre: 114874:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1980 previous similar messages Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 105126:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.23.26@o2ib6: deadline 30:1s ago req@ffff8b4b5b2a0c00 x1631386099238560/t0(0) o101->c09269e2-f043-3732-7a38-b13ea289c361@10.8.23.26@o2ib6:27/0 lens 576/0 e 0 to 0 dl 1556449467 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:04:28 fir-md1-s1 kernel: LustreError: 105126:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 26 previous similar messages Apr 28 04:05:19 fir-md1-s1 kernel: Pid: 114817, comm: mdt00_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:05:19 fir-md1-s1 kernel: Call Trace: Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:05:19 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:05:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:05:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:05:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449519.114817 Apr 28 04:05:19 fir-md1-s1 kernel: Pid: 114810, comm: mdt03_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:05:19 fir-md1-s1 kernel: Call Trace: Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 
[mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:05:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:05:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:05:19 fir-md1-s1 kernel: Pid: 114844, comm: mdt00_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:05:19 fir-md1-s1 kernel: Call Trace: Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:05:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:05:19 
fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:05:19 fir-md1-s1 kernel: Pid: 114908, comm: mdt00_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:05:19 fir-md1-s1 kernel: Call Trace: Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:05:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:05:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:05:19 fir-md1-s1 kernel: Pid: 114947, comm: mdt02_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:05:19 fir-md1-s1 kernel: Call Trace: Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 
28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:05:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:05:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:05:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:05:58 fir-md1-s1 kernel: LustreError: 114818:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556449468, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3f266c3180/0x378007fb36176947 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 771 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114818 timeout: 0 lvb_type: 0 Apr 28 04:05:58 fir-md1-s1 kernel: LustreError: 114818:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 121 previous similar messages Apr 28 04:06:58 fir-md1-s1 kernel: LNet: Service thread pid 114864 completed after 980.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 04:06:58 fir-md1-s1 kernel: LNet: Skipped 159 previous similar messages Apr 28 04:06:58 fir-md1-s1 kernel: LustreError: 114867:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.104.9@o2ib4: deadline 30:1s ago req@ffff8b4838236c00 x1631542576240144/t0(0) o101->6576db86-576c-58c4-907d-b54174076c6b@10.9.104.9@o2ib4:27/0 lens 576/0 e 0 to 0 dl 1556449617 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 04:07:03 fir-md1-s1 kernel: LustreError: 105368:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8b53076aac50 x1631612299319168/t0(0) o4->e9c5a421-c400-967f-fe3d-134ef9bd0037@10.8.7.6@o2ib6:9/0 lens 488/448 e 0 to 0 dl 1556449629 ref 1 fl Interpret:/0/0 rc 0/0 Apr 28 04:07:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with e9c5a421-c400-967f-fe3d-134ef9bd0037 (at 10.8.7.6@o2ib6), client will retry: rc = -110 Apr 28 04:07:03 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Apr 28 04:07:36 fir-md1-s1 kernel: LustreError: 114862:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.20.17@o2ib6: deadline 30:1s ago req@ffff8b4653230000 x1631750587937792/t0(0) o101->e6780d30-74e1-317f-cff3-98cea097e023@10.8.20.17@o2ib6:5/0 lens 1776/0 e 0 to 0 dl 1556449655 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:07:36 fir-md1-s1 kernel: LustreError: 114862:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 40 previous similar messages Apr 28 04:08:05 fir-md1-s1 kernel: LNetError: 20277:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 28 04:08:35 fir-md1-s1 kernel: LNetError: 20278:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 28 04:10:25 fir-md1-s1 kernel: LustreError: 105306:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.108.34@o2ib4: deadline 
30:41s ago req@ffff8b4c4c72f800 x1631534746893008/t0(0) o101->081ba7d9-3e8f-6768-7b15-6d13e53f4563@10.9.108.34@o2ib4:14/0 lens 576/0 e 0 to 0 dl 1556449784 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:10:25 fir-md1-s1 kernel: LustreError: 105306:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 04:11:07 fir-md1-s1 kernel: Lustre: 104944:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b46476f7b00 x1631546792388688/t219283655406(0) o36->f7f29fbd-f06d-1e4f-a662-2d2ae362522d@10.8.8.7@o2ib6:12/0 lens 488/3152 e 0 to 0 dl 1556449872 ref 2 fl Interpret:/2/0 rc 0/0 Apr 28 04:11:07 fir-md1-s1 kernel: Lustre: 104944:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8205 previous similar messages Apr 28 04:11:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.1.2@o2ib6) Apr 28 04:11:43 fir-md1-s1 kernel: Lustre: Skipped 6008 previous similar messages Apr 28 04:12:12 fir-md1-s1 kernel: LNet: Service thread pid 104985 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 04:12:12 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 04:12:12 fir-md1-s1 kernel: Pid: 104985, comm: mdt01_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:12:12 fir-md1-s1 kernel: Call Trace: Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:12:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:12:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:12:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449932.104985 Apr 28 04:12:12 fir-md1-s1 kernel: Pid: 105069, comm: mdt00_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:12:12 fir-md1-s1 kernel: Call Trace: Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 
[mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:12:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:12:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:12:12 fir-md1-s1 kernel: Pid: 114911, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:12:12 fir-md1-s1 kernel: Call Trace: Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:12:12 
fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:12:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:12:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:12:12 fir-md1-s1 kernel: Pid: 105109, comm: mdt01_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:12:12 fir-md1-s1 kernel: Call Trace: Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:12:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:12:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:12:12 fir-md1-s1 kernel: Pid: 114892, comm: mdt01_104 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:12:12 fir-md1-s1 kernel: Call Trace: Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 04:12:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:12:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:12:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:12:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:12:12 fir-md1-s1 kernel: LNet: Service thread pid 114846 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 04:12:12 fir-md1-s1 kernel: LNet: Skipped 100 previous similar messages Apr 28 04:12:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449933.114797 Apr 28 04:12:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449934.114937 Apr 28 04:12:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449935.114867 Apr 28 04:12:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449949.114802 Apr 28 04:12:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449954.105296 Apr 28 04:12:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449955.105288 Apr 28 04:12:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6a714f21-ad45-ebd6-03a1-3db057c2e0f0 (at 10.8.1.2@o2ib6) reconnecting Apr 28 04:12:45 fir-md1-s1 kernel: Lustre: Skipped 5555 previous similar messages Apr 28 04:13:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556449984.105127 Apr 28 04:13:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450015.114856 Apr 28 04:13:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450025.114835 Apr 28 04:14:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450042.114968 Apr 28 04:14:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450043.114914 Apr 28 04:14:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450052.114823 Apr 28 04:14:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450056.114849 Apr 28 04:14:33 fir-md1-s1 kernel: LustreError: 114978:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b71367cd400 ns: mdt-fir-MDT0002_UUID lock: ffff8b3867349680/0x378007fb362d0b6d lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 760 type: IBT flags: 0x50200400000020 nid: 10.9.102.26@o2ib4 remote: 0x489ef17ad1fda6d2 expref: 4 pid: 114978 timeout: 0 lvb_type: 0 Apr 28 04:14:33 
fir-md1-s1 kernel: LustreError: 114978:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 16 previous similar messages Apr 28 04:14:33 fir-md1-s1 kernel: Lustre: 114978:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:250s); client may timeout. req@ffff8b3cc3a39500 x1631651149554912/t0(0) o101->5924c705-ac90-422d-3e46-a0ea5d70203c@10.9.102.26@o2ib4:21/0 lens 576/1792 e 0 to 0 dl 1556449823 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 04:14:33 fir-md1-s1 kernel: Lustre: 114978:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4410 previous similar messages Apr 28 04:14:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450087.115004 Apr 28 04:14:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450088.114931 Apr 28 04:14:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450098.105306 Apr 28 04:15:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450118.104968 Apr 28 04:16:03 fir-md1-s1 kernel: LustreError: 114969:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556450073, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5ea8011d40/0x378007fb3cfffcaf lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 768 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114969 timeout: 0 lvb_type: 0 Apr 28 04:16:03 fir-md1-s1 kernel: LustreError: 114969:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 579 previous similar messages Apr 28 04:17:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.8@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b4e1b25c5c0/0x378007fb362d0c93 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 769 type: IBT flags: 0x60200400000020 nid: 
10.9.108.8@o2ib4 remote: 0x4ccfc25b6f096e50 expref: 89 pid: 104691 timeout: 450057 lvb_type: 0 Apr 28 04:17:03 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 11 previous similar messages Apr 28 04:17:03 fir-md1-s1 kernel: LNet: Service thread pid 114945 completed after 491.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 04:17:03 fir-md1-s1 kernel: LNet: Skipped 268 previous similar messages Apr 28 04:17:53 fir-md1-s1 kernel: Pid: 105035, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:17:53 fir-md1-s1 
kernel: LustreError: dumping log to /tmp/lustre-log.1556450273.105035 Apr 28 04:17:53 fir-md1-s1 kernel: Pid: 105282, comm: mdt01_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:17:53 fir-md1-s1 kernel: Pid: 114847, comm: mdt01_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:17:53 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:17:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:17:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:17:54 fir-md1-s1 kernel: Pid: 105233, comm: mdt02_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:17:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:17:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:17:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:17:54 fir-md1-s1 kernel: Pid: 104692, comm: mdt01_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:17:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:17:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:17:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:17:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:19:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450376.114804 Apr 28 04:20:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450428.114956 Apr 28 04:21:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450460.105409 Apr 28 04:21:21 fir-md1-s1 kernel: Lustre: 115018:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
@@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b42d20d3000 x1631559785188912/t0(0) o101->0db2d4e0-bf1e-3689-817d-00b10dcb4858@10.9.102.20@o2ib4:26/0 lens 584/3264 e 0 to 0 dl 1556450486 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 04:21:21 fir-md1-s1 kernel: Lustre: 115018:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Apr 28 04:21:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ef3b86f6-00d0-6322-ff96-5d5c11067d63 (at 10.9.101.32@o2ib4) Apr 28 04:21:43 fir-md1-s1 kernel: Lustre: Skipped 2434 previous similar messages Apr 28 04:22:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9101e47c-5087-9ebf-bb20-6ff2bf817bf0 (at 10.9.101.32@o2ib4) reconnecting Apr 28 04:22:45 fir-md1-s1 kernel: Lustre: Skipped 2425 previous similar messages Apr 28 04:23:28 fir-md1-s1 kernel: LNet: Service thread pid 104979 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 04:23:28 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 04:23:28 fir-md1-s1 kernel: Pid: 104979, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:23:28 fir-md1-s1 kernel: Call Trace: Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: 
[] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:23:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:23:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:23:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450608.104979 Apr 28 04:23:28 fir-md1-s1 kernel: Pid: 105088, comm: mdt03_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:23:28 fir-md1-s1 kernel: Call Trace: Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:23:28 fir-md1-s1 kernel: 
[] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:23:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:23:28 fir-md1-s1 kernel: Pid: 114866, comm: mdt01_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:23:28 fir-md1-s1 kernel: Call Trace: Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:23:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:23:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:23:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:23:34 fir-md1-s1 kernel: Pid: 105100, comm: mdt00_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:23:34 fir-md1-s1 kernel: Call Trace: Apr 28 04:23:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:23:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:23:34 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:23:34 fir-md1-s1 
kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:23:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:23:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:23:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:23:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:23:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450614.105100 Apr 28 04:24:16 fir-md1-s1 kernel: Pid: 114938, comm: mdt00_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:24:16 fir-md1-s1 kernel: Call Trace: Apr 28 04:24:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:24:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 
28 04:24:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:24:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:24:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:24:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:24:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450656.114938 Apr 28 04:24:17 fir-md1-s1 kernel: LNet: Service thread pid 114833 was inactive for 200.48s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 04:24:17 fir-md1-s1 kernel: LNet: Skipped 324 previous similar messages Apr 28 04:24:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450657.114833 Apr 28 04:25:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450701.104780 Apr 28 04:25:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450707.114964 Apr 28 04:25:49 fir-md1-s1 kernel: Lustre: 114910:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:12s); client may timeout. 
req@ffff8b4b4e30e300 x1631297012600576/t0(0) o101->92a5fc1a-0f67-1260-3d67-1ac1c4c2c6d6@10.8.28.1@o2ib6:7/0 lens 1768/0 e 0 to 0 dl 1556450737 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:25:49 fir-md1-s1 kernel: LustreError: 114910:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.7.7@o2ib6: deadline 30:1s ago req@ffff8b5215a98000 x1631584902615184/t0(0) o101->691e4f7c-24cc-f758-5354-96c1b01f1439@10.8.7.7@o2ib6:18/0 lens 576/0 e 0 to 0 dl 1556450748 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:26:11 fir-md1-s1 kernel: LustreError: 105119:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556450681, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b457aeee540/0x378007fb4dcbf939 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 731 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105119 timeout: 0 lvb_type: 0 Apr 28 04:26:11 fir-md1-s1 kernel: LustreError: 105119:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 56 previous similar messages Apr 28 04:26:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450780.114948 Apr 28 04:27:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450853.114810 Apr 28 04:28:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450881.105119 Apr 28 04:28:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450888.104957 Apr 28 04:28:19 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.2.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4d87fb7080/0x378007fb362dae03 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 732 type: IBT flags: 0x60200400000020 nid: 10.8.2.17@o2ib6 remote: 0x98f02196675d6ab expref: 176 pid: 114910 timeout: 450733 
lvb_type: 0 Apr 28 04:28:19 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 28 04:28:19 fir-md1-s1 kernel: LNet: Service thread pid 115010 completed after 1167.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 04:28:19 fir-md1-s1 kernel: LNet: Skipped 59 previous similar messages Apr 28 04:28:21 fir-md1-s1 kernel: LustreError: 105300:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.20.13@o2ib6: deadline 30:2s ago req@ffff8b4b99235700 x1631942670613744/t0(0) o101->4f54a8f9-38f2-76c9-a8e3-5882d8ecdde5@10.8.20.13@o2ib6:19/0 lens 1768/0 e 0 to 0 dl 1556450899 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:28:21 fir-md1-s1 kernel: LustreError: 105300:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 04:28:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450902.105289 Apr 28 04:28:28 fir-md1-s1 kernel: Pid: 104332, comm: mdt01_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:28:28 fir-md1-s1 kernel: Call Trace: Apr 28 04:28:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:28:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 
04:28:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:28:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:28:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:28:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:28:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450908.104332 Apr 28 04:29:10 fir-md1-s1 kernel: Pid: 104950, comm: mdt00_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:29:10 fir-md1-s1 kernel: Call Trace: Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:29:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:29:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:29:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556450950.104950 Apr 28 04:29:10 
fir-md1-s1 kernel: Pid: 114910, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:29:10 fir-md1-s1 kernel: Call Trace: Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:29:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:29:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:29:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:29:22 fir-md1-s1 kernel: LustreError: 23538:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.6@o2ib4 arrived at 1556450962 with bad export cookie 3999205221519724416 Apr 28 04:29:50 fir-md1-s1 kernel: LustreError: 104940:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7336143000 ns: mdt-fir-MDT0002_UUID lock: ffff8b51cc0f33c0/0x378007fb362dcc41 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 716 type: IBT flags: 0x50200400000020 nid: 10.8.2.17@o2ib6 remote: 
0x98f02196675d6b2 expref: 8 pid: 104940 timeout: 0 lvb_type: 0 Apr 28 04:29:50 fir-md1-s1 kernel: LustreError: 104940:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 04:31:22 fir-md1-s1 kernel: Lustre: 114860:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b700a74a700 x1631558772497408/t0(0) o101->5af85e95-71ec-5689-9879-f126f8845b44@10.8.27.1@o2ib6:27/0 lens 1768/0 e 0 to 0 dl 1556451087 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 04:31:22 fir-md1-s1 kernel: Lustre: 114860:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4477 previous similar messages Apr 28 04:31:41 fir-md1-s1 kernel: Pid: 105286, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:31:41 fir-md1-s1 kernel: Call Trace: Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:31:41 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:31:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:31:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451101.105286 Apr 28 04:31:41 fir-md1-s1 kernel: Pid: 114929, comm: mdt02_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:31:41 fir-md1-s1 kernel: Call Trace: Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:31:41 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:31:41 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:31:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:31:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to f134f44b-e342-3bfa-78c5-0f433694e51b (at 10.8.24.11@o2ib6) Apr 28 04:31:43 fir-md1-s1 kernel: Lustre: Skipped 4866 previous similar messages Apr 28 04:32:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451130.114998 Apr 28 04:32:20 
fir-md1-s1 kernel: LustreError: 104977:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.30.3@o2ib6: deadline 30:1s ago req@ffff8b480affb600 x1631762134432992/t0(0) o101->d7625fd7-bc27-0101-5a98-8752bce99283@10.8.30.3@o2ib6:19/0 lens 1792/0 e 0 to 0 dl 1556451139 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 04:32:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1c578c74-5128-6e3f-cdf7-83221a90bc4e (at 10.8.27.8@o2ib6) reconnecting Apr 28 04:32:47 fir-md1-s1 kernel: Lustre: Skipped 5269 previous similar messages Apr 28 04:33:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451191.114922 Apr 28 04:35:40 fir-md1-s1 kernel: LNet: Service thread pid 104951 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 04:35:40 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 04:35:40 fir-md1-s1 kernel: Pid: 104951, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:35:40 fir-md1-s1 kernel: Call Trace: Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 
kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:35:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:35:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:35:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451340.104951 Apr 28 04:35:40 fir-md1-s1 kernel: Pid: 104708, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:35:40 fir-md1-s1 kernel: Call Trace: Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:35:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:35:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:35:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:35:59 fir-md1-s1 kernel: Pid: 105010, comm: mdt00_018 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:35:59 fir-md1-s1 kernel: Call Trace: Apr 28 04:35:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:35:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:35:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:35:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:35:59 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:35:59 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 28 04:35:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:35:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:35:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:35:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:35:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:35:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:35:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451359.105010 Apr 28 04:36:11 fir-md1-s1 kernel: Pid: 105305, comm: mdt02_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:36:11 fir-md1-s1 kernel: Call Trace: Apr 28 04:36:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:36:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:36:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:36:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:36:11 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:36:11 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:36:11 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:36:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:36:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:36:12 
fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:36:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:36:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:36:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:36:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:36:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:36:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:36:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:36:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451372.105305 Apr 28 04:37:14 fir-md1-s1 kernel: Lustre: 114885:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (91:1611s); client may timeout. req@ffff8b42b0a9b000 x1631651149555456/t0(0) o101->5924c705-ac90-422d-3e46-a0ea5d70203c@10.9.102.26@o2ib4:22/0 lens 584/1792 e 0 to 0 dl 1556449823 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 04:37:14 fir-md1-s1 kernel: Lustre: 114885:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3784 previous similar messages Apr 28 04:37:45 fir-md1-s1 kernel: LustreError: 104727:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.11.29@o2ib6: deadline 30:283s ago req@ffff8b465325dd00 x1631377968755328/t0(0) o400->@:2/0 lens 224/0 e 0 to 0 dl 1556451182 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:37:45 fir-md1-s1 kernel: LustreError: 104727:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Apr 28 04:37:45 fir-md1-s1 kernel: LustreError: 114921:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b36d3fe1b00 x1631591725682880/t0(0) o104->fir-MDT0002@10.8.28.5@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 04:38:44 fir-md1-s1 kernel: LustreError: 114916:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out 
(enqueued at 1556451434, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b565a64ca40/0x378007fb4e93c8b6 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 712 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114916 timeout: 0 lvb_type: 0 Apr 28 04:38:44 fir-md1-s1 kernel: LustreError: 114916:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 38 previous similar messages Apr 28 04:39:00 fir-md1-s1 kernel: LNet: Service thread pid 114846 completed after 1807.95s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 04:39:00 fir-md1-s1 kernel: LustreError: 104977:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.28.5@o2ib6: deadline 30:26s ago req@ffff8b4cfcefbc00 x1631542397260432/t0(0) o400->@:4/0 lens 224/0 e 0 to 0 dl 1556451514 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:39:00 fir-md1-s1 kernel: LustreError: 104977:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Apr 28 04:39:00 fir-md1-s1 kernel: LNet: Skipped 55 previous similar messages Apr 28 04:39:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.38@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b41fb64d100/0x378007fb362eadb4 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 703 type: IBT flags: 0x60200400000020 nid: 10.9.101.38@o2ib4 remote: 0xba2f4f14bc2ef42a expref: 3496 pid: 114817 timeout: 451404 lvb_type: 0 Apr 28 04:39:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 04:40:00 fir-md1-s1 kernel: LustreError: 104772:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72ff226800 ns: mdt-fir-MDT0002_UUID lock: 
ffff8b4210719440/0x378007fb362ee83f lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 697 type: IBT flags: 0x50200400000020 nid: 10.9.101.68@o2ib4 remote: 0xf1b661a3e3ac02a1 expref: 4 pid: 104772 timeout: 0 lvb_type: 0 Apr 28 04:40:00 fir-md1-s1 kernel: LustreError: 104772:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 8 previous similar messages Apr 28 04:40:34 fir-md1-s1 kernel: Pid: 114916, comm: mdt02_068 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:40:34 fir-md1-s1 kernel: Call Trace: Apr 28 04:40:34 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:40:34 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:40:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:40:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:40:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:40:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451634.114916 Apr 28 04:40:34 fir-md1-s1 kernel: LNet: Service thread pid 115014 was inactive for 
200.29s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 04:40:34 fir-md1-s1 kernel: LNet: Skipped 62 previous similar messages Apr 28 04:41:05 fir-md1-s1 kernel: Pid: 114860, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:41:05 fir-md1-s1 kernel: Call Trace: Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:41:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:41:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:41:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451665.114860 Apr 28 04:41:05 fir-md1-s1 kernel: Pid: 105117, comm: mdt00_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:41:05 fir-md1-s1 kernel: Call Trace: Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 
kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:41:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:41:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:41:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:41:08 fir-md1-s1 kernel: Pid: 104907, comm: mdt01_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:41:08 fir-md1-s1 kernel: Call Trace: Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:41:08 fir-md1-s1 
kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:41:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:41:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:41:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451668.104907 Apr 28 04:41:08 fir-md1-s1 kernel: Pid: 114928, comm: mdt00_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:41:08 fir-md1-s1 kernel: Call Trace: Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:41:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 
04:41:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:41:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:41:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:41:29 fir-md1-s1 kernel: Lustre: 105096:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3e04568000 x1631725391676384/t0(0) o101->2f6da7ed-4a14-56cc-7753-8619bcc532e0@10.8.30.20@o2ib6:4/0 lens 568/0 e 0 to 0 dl 1556451694 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 04:41:29 fir-md1-s1 kernel: Lustre: 105096:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7273 previous similar messages Apr 28 04:41:29 fir-md1-s1 kernel: Pid: 105270, comm: mdt02_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:41:29 fir-md1-s1 kernel: Call Trace: Apr 28 04:41:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:41:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:41:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:41:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:41:30 
fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:41:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:41:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451690.105270 Apr 28 04:41:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to fc06a8e8-d130-d628-466b-ac815f3efcc0 (at 10.8.26.16@o2ib6) Apr 28 04:41:43 fir-md1-s1 kernel: Lustre: Skipped 7000 previous similar messages Apr 28 04:42:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451741.104390 Apr 28 04:42:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451752.105302 Apr 28 04:42:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451756.104933 Apr 28 04:42:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client fe368c15-0041-26b7-6d7c-54456281630d (at 10.8.17.9@o2ib6) reconnecting Apr 28 04:42:47 fir-md1-s1 kernel: Lustre: Skipped 6602 previous similar messages Apr 28 04:42:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451771.104977 Apr 28 04:43:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451800.104391 Apr 28 04:43:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451803.114826 Apr 28 04:43:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451833.114885 Apr 28 04:43:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451834.104973 Apr 28 04:44:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556451864.105244 Apr 28 04:46:54 fir-md1-s1 kernel: LNet: Service thread pid 104725 was inactive for 200.12s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 04:46:54 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 04:46:54 fir-md1-s1 kernel: Pid: 104725, comm: mdt00_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:46:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:46:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:46:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:46:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452014.104725 Apr 28 04:46:54 fir-md1-s1 kernel: Pid: 114993, comm: mdt02_104 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:46:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:46:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:46:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:46:54 fir-md1-s1 kernel: Pid: 105415, comm: mdt01_065 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:46:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:46:54 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:46:54 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:46:54 fir-md1-s1 kernel: Pid: 105301, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:46:54 fir-md1-s1 kernel: Call Trace: Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:46:54 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:46:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 
28 04:46:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:46:55 fir-md1-s1 kernel: Pid: 104910, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:46:55 fir-md1-s1 kernel: Call Trace: Apr 28 04:46:55 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:46:55 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:46:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:46:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:46:55 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:46:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452019.114814 Apr 28 04:47:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452020.105293 Apr 28 04:47:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452044.105423 Apr 28 04:47:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452045.105247 Apr 28 04:49:10 fir-md1-s1 kernel: LNet: Service thread pid 105251 completed after 2416.48s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 04:49:10 fir-md1-s1 kernel: LNet: Skipped 70 previous similar messages Apr 28 04:49:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.34@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b7249e17740/0x378007fb36311e89 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 620 type: IBT flags: 0x60200400000020 nid: 10.9.101.34@o2ib4 remote: 0x68c246fce08d6203 expref: 3356 pid: 105235 timeout: 452013 lvb_type: 0 Apr 28 04:49:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 04:49:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452194.104994 Apr 28 04:50:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452200.104356 Apr 28 04:50:40 fir-md1-s1 kernel: LustreError: 105235:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556452149, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5defa318c0/0x378007fb5fbc319b lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 620 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105235 timeout: 0 lvb_type: 0 Apr 28 04:50:40 fir-md1-s1 kernel: LustreError: 105235:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 78 previous similar messages Apr 28 04:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ef890f65-2634-b761-1191-08785e321b11 (at 10.8.7.11@o2ib6) Apr 28 04:51:44 fir-md1-s1 kernel: Lustre: Skipped 2498 previous similar messages Apr 28 04:52:30 fir-md1-s1 kernel: Pid: 105112, comm: mdt01_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:52:30 fir-md1-s1 kernel: Call Trace: Apr 
28 04:52:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:52:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:52:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:52:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452350.105112 Apr 28 04:52:30 fir-md1-s1 kernel: Pid: 105235, comm: mdt02_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:52:30 fir-md1-s1 kernel: Call Trace: Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] 
mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:52:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:52:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:52:30 fir-md1-s1 kernel: Pid: 114912, comm: mdt01_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:52:30 fir-md1-s1 kernel: Call Trace: Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Apr 28 04:52:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:52:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:52:30 fir-md1-s1 kernel: Pid: 114850, comm: mdt00_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:52:30 fir-md1-s1 kernel: Call Trace: Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:52:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:52:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:52:30 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:52:31 fir-md1-s1 kernel: Pid: 104911, comm: mdt01_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:52:31 fir-md1-s1 kernel: Call Trace: Apr 28 04:52:31 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:52:31 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:52:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:52:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:52:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:52:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452351.104911 Apr 28 04:52:31 fir-md1-s1 kernel: LNet: Service thread pid 114877 was inactive for 200.88s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 04:52:31 fir-md1-s1 kernel: LNet: Skipped 67 previous similar messages Apr 28 04:52:34 fir-md1-s1 kernel: Lustre: 114903:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b464f634e00 x1631535401822544/t0(0) o101->5fdf1bc5-187c-a555-f9d1-818d91c0bfa4@10.9.105.9@o2ib4:9/0 lens 576/3264 e 0 to 0 dl 1556452359 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 04:52:34 fir-md1-s1 kernel: Lustre: 114903:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 44 previous similar messages Apr 28 04:52:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b2d6ba71-31e6-985c-c04f-54e302ddc48e (at 10.9.102.3@o2ib4) reconnecting Apr 28 04:52:47 fir-md1-s1 kernel: Lustre: Skipped 2507 previous similar messages Apr 28 04:53:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452382.105310 Apr 28 04:55:18 fir-md1-s1 kernel: LustreError: 104908:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72e7b5d800 ns: mdt-fir-MDT0002_UUID lock: ffff8b50e629cec0/0x378007fb36327607 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 590 type: IBT flags: 0x50200400000020 nid: 10.9.107.70@o2ib4 remote: 0xafe07b3ba251430e expref: 4 pid: 104908 timeout: 0 lvb_type: 0 Apr 28 04:55:18 fir-md1-s1 kernel: LustreError: 104908:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 28 04:55:18 fir-md1-s1 kernel: Lustre: 104908:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (2603:182s); client may timeout. 
req@ffff8b3e1eb1dd00 x1631573362313264/t0(0) o101->fff48c29-b405-5a73-271e-23103edf7e4a@10.9.107.70@o2ib4:23/0 lens 584/1792 e 0 to 0 dl 1556452336 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 04:55:18 fir-md1-s1 kernel: Lustre: 104908:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5754 previous similar messages Apr 28 04:55:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452529.114846 Apr 28 04:55:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452532.114921 Apr 28 04:55:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452542.104918 Apr 28 04:58:01 fir-md1-s1 kernel: LNet: Service thread pid 105087 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 04:58:01 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 04:58:01 fir-md1-s1 kernel: Pid: 105087, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:58:01 fir-md1-s1 kernel: Call Trace: Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:58:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:58:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:58:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452681.105087 Apr 28 04:58:01 fir-md1-s1 kernel: Pid: 114842, comm: mdt03_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:58:01 fir-md1-s1 kernel: Call Trace: Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:58:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:58:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:58:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:58:29 fir-md1-s1 kernel: Pid: 104997, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:58:29 fir-md1-s1 kernel: Call Trace: Apr 28 04:58:29 fir-md1-s1 kernel: [] 
ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:58:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:58:29 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 04:58:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:58:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:58:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:58:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:58:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:58:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452709.104997 Apr 28 04:58:37 fir-md1-s1 kernel: Pid: 114991, comm: mdt02_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:58:37 fir-md1-s1 kernel: Call Trace: Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:58:38 
fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:58:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:58:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:58:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452718.114991 Apr 28 04:58:38 fir-md1-s1 kernel: Pid: 105419, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 04:58:38 fir-md1-s1 kernel: Call Trace: Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Apr 28 04:58:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 04:58:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 04:58:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 04:59:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452749.114865 Apr 28 04:59:23 fir-md1-s1 kernel: LNet: Service thread pid 115004 completed after 2876.89s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 04:59:23 fir-md1-s1 kernel: LNet: Skipped 110 previous similar messages Apr 28 05:00:53 fir-md1-s1 kernel: LustreError: 114987:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556452763, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b7034b94140/0x378007fb70bbfc23 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 555 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114987 timeout: 0 lvb_type: 0 Apr 28 05:00:53 fir-md1-s1 kernel: LustreError: 114987:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 85 previous similar messages Apr 28 05:01:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452897.114858 Apr 28 05:01:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452904.105094 Apr 28 05:01:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 896b03a8-6292-1c6c-39b9-e7b9e23f6a07 (at 10.8.29.6@o2ib6) Apr 28 05:01:44 fir-md1-s1 kernel: Lustre: Skipped 3415 previous similar messages Apr 28 05:01:53 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.2.34@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b42fd1d1680/0x378007fb37df265c lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 555 type: IBT flags: 0x60200400000020 nid: 10.8.2.34@o2ib6 remote: 
0x7616bd6f302f0e20 expref: 3798 pid: 115004 timeout: 452747 lvb_type: 0 Apr 28 05:01:53 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 05:01:59 fir-md1-s1 kernel: LustreError: 114804:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.115.2@o2ib4: deadline 30:1s ago req@ffff8b3d2f674b00 x1631815097371424/t0(0) o101->74fb56c5-8bc6-38a9-8624-788945b7232f@10.9.115.2@o2ib4:27/0 lens 1768/0 e 0 to 0 dl 1556452917 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 05:01:59 fir-md1-s1 kernel: LustreError: 114804:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 05:02:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452934.105237 Apr 28 05:02:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452935.105127 Apr 28 05:02:34 fir-md1-s1 kernel: Lustre: 105058:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4be070c500 x1631782089221856/t0(0) o101->d2ab40ab-8888-3abb-75f9-9c32b2196967@10.8.26.26@o2ib6:9/0 lens 1768/0 e 0 to 0 dl 1556452959 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 05:02:34 fir-md1-s1 kernel: Lustre: 105058:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2625 previous similar messages Apr 28 05:02:44 fir-md1-s1 kernel: LNet: Service thread pid 114849 was inactive for 200.41s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 05:02:44 fir-md1-s1 kernel: LNet: Skipped 74 previous similar messages Apr 28 05:02:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556452964.114849 Apr 28 05:02:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2faf88a4-e542-43bc-139f-15cb7cd9b030 (at 10.9.108.36@o2ib4) reconnecting Apr 28 05:02:47 fir-md1-s1 kernel: Lustre: Skipped 3737 previous similar messages Apr 28 05:05:54 fir-md1-s1 kernel: LustreError: 104328:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72dbf7e000 ns: mdt-fir-MDT0002_UUID lock: ffff8b33585f4a40/0x378007fb70d9dc05 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 463 type: IBT flags: 0x50200400000020 nid: 10.8.20.20@o2ib6 remote: 0xc5e25a5da5e96af2 expref: 9 pid: 104328 timeout: 0 lvb_type: 0 Apr 28 05:05:54 fir-md1-s1 kernel: LustreError: 104328:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 56 previous similar messages Apr 28 05:05:54 fir-md1-s1 kernel: Lustre: 114845:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1s); client may timeout. 
req@ffff8b3c43a91b00 x1631545532044144/t0(0) o101->59d5c5f3-7800-62b5-895a-6920fabd87eb@10.9.102.25@o2ib4:19/0 lens 584/1792 e 0 to 0 dl 1556453153 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 05:05:54 fir-md1-s1 kernel: Lustre: 114845:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2354 previous similar messages Apr 28 05:06:40 fir-md1-s1 kernel: Pid: 114968, comm: mdt02_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:06:40 fir-md1-s1 kernel: Call Trace: Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:06:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:06:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:06:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453200.114968 Apr 28 05:06:40 fir-md1-s1 kernel: Pid: 105257, comm: mdt00_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 
2018 Apr 28 05:06:40 fir-md1-s1 kernel: Call Trace: Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:06:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:06:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:06:40 fir-md1-s1 kernel: Pid: 105419, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:06:40 fir-md1-s1 kernel: Call Trace: Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] 
mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:06:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:06:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:06:40 fir-md1-s1 kernel: Pid: 105252, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:06:40 fir-md1-s1 kernel: Call Trace: Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:06:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:06:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:06:40 fir-md1-s1 kernel: Pid: 114804, comm: mdt01_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:06:40 fir-md1-s1 kernel: Call Trace: Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 05:06:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:06:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:06:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:06:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:06:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453201.105407 Apr 28 05:06:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453212.114941 Apr 28 05:06:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453218.114915 Apr 28 05:07:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453254.105121 Apr 28 05:09:14 fir-md1-s1 
kernel: LustreError: dumping log to /tmp/lustre-log.1556453354.114931 Apr 28 05:09:26 fir-md1-s1 kernel: LNet: Service thread pid 105252 completed after 366.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 05:09:26 fir-md1-s1 kernel: LNet: Skipped 275 previous similar messages Apr 28 05:09:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453384.114972 Apr 28 05:09:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453386.105441 Apr 28 05:10:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453410.115011 Apr 28 05:10:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453417.114793 Apr 28 05:10:56 fir-md1-s1 kernel: LustreError: 114859:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556453366, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b606da23840/0x378007fb7bcdf613 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 474 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114859 timeout: 0 lvb_type: 0 Apr 28 05:10:56 fir-md1-s1 kernel: LustreError: 114859:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 227 previous similar messages Apr 28 05:11:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0a337f14-c566-4a41-b96a-b82bdd96439d (at 10.8.7.16@o2ib6) Apr 28 05:11:44 fir-md1-s1 kernel: Lustre: Skipped 2627 previous similar messages Apr 28 05:12:40 fir-md1-s1 kernel: Lustre: 105122:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b48836f8000 x1631834697988464/t0(0) o101->f33d576f-86a5-f295-c683-255421011a31@10.8.11.8@o2ib6:15/0 lens 584/3264 e 0 to 0 dl 1556453565 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 05:12:40 fir-md1-s1 kernel: Lustre: 
105122:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 857 previous similar messages Apr 28 05:12:46 fir-md1-s1 kernel: LNet: Service thread pid 114859 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 05:12:46 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 05:12:46 fir-md1-s1 kernel: Pid: 114859, comm: mdt02_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:12:46 fir-md1-s1 kernel: Call Trace: Apr 28 05:12:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:12:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:12:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:12:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:12:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:12:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:12:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:12:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:12:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:12:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:12:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:12:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453567.114859 Apr 28 05:12:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
5db7ce18-3e24-dca8-3c1c-cbb3c3f8c6de (at 10.8.1.14@o2ib6) reconnecting Apr 28 05:12:48 fir-md1-s1 kernel: Lustre: Skipped 2208 previous similar messages Apr 28 05:12:52 fir-md1-s1 kernel: Pid: 105266, comm: mdt02_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:12:52 fir-md1-s1 kernel: Call Trace: Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:12:52 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:12:52 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:12:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453572.105266 Apr 28 05:12:52 fir-md1-s1 kernel: Pid: 114820, comm: mdt00_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:12:52 fir-md1-s1 kernel: Call Trace: Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: 
[] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:12:52 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:12:52 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:12:52 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:12:56 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.102.60@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b40ab20c800/0x378007fb70e1369c lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 466 type: IBT flags: 0x60200400000020 nid: 10.9.102.60@o2ib4 remote: 0xe9bd5eddffb99229 expref: 3481 pid: 104946 timeout: 453410 lvb_type: 0 Apr 28 05:12:56 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 05:13:17 fir-md1-s1 kernel: Pid: 104971, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:13:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453597.104971 Apr 28 05:13:17 fir-md1-s1 kernel: Pid: 105295, comm: mdt01_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:13:17 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:13:17 fir-md1-s1 kernel: LNet: Service thread pid 115001 was inactive for 200.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 05:13:17 fir-md1-s1 kernel: LNet: Skipped 212 previous similar messages Apr 28 05:13:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453628.105234 Apr 28 05:15:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453735.114911 Apr 28 05:16:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453777.114878 Apr 28 05:18:36 fir-md1-s1 kernel: Pid: 105252, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:18:36 fir-md1-s1 kernel: Call Trace: Apr 28 05:18:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:18:36 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:18:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:18:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:18:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:18:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453916.105252 Apr 28 05:18:37 fir-md1-s1 kernel: Pid: 114863, comm: mdt02_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:18:37 fir-md1-s1 kernel: Call Trace: Apr 28 05:18:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:18:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:18:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:18:37 
fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:18:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:18:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:18:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453917.114863 Apr 28 05:18:38 fir-md1-s1 kernel: Pid: 105244, comm: mdt00_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 05:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:18:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:18:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453918.105244 Apr 28 05:18:46 fir-md1-s1 kernel: Pid: 105257, comm: mdt00_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:18:46 fir-md1-s1 kernel: Call Trace: 
Apr 28 05:18:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:18:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:18:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:18:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:18:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:18:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453926.105257 Apr 28 05:18:52 fir-md1-s1 kernel: Pid: 104772, comm: mdt03_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:18:52 fir-md1-s1 kernel: Call Trace: Apr 28 05:18:52 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:18:52 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:18:52 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:18:52 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 
[mdt] Apr 28 05:18:52 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:18:52 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:18:52 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:18:52 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:18:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453932.104772 Apr 28 05:18:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453938.105052 Apr 28 05:19:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453942.105303 Apr 28 05:19:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453944.105422 Apr 28 05:19:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453950.114952 Apr 28 05:19:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453957.114939 Apr 28 05:19:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453958.105084 Apr 28 05:19:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453978.114823 Apr 28 05:19:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556453981.114795 Apr 28 05:20:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454006.114918 Apr 28 05:20:56 fir-md1-s1 kernel: LNet: Service thread pid 105397 completed after 1055.95s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 05:20:56 fir-md1-s1 kernel: LNet: Skipped 29 previous similar messages Apr 28 05:21:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454063.105109 Apr 28 05:21:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454078.114802 Apr 28 05:21:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454088.104728 Apr 28 05:21:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454104.104907 Apr 28 05:21:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 92c3afe3-9630-8eac-9d50-8bb4a15ca47e (at 10.8.26.26@o2ib6) Apr 28 05:21:47 fir-md1-s1 kernel: Lustre: Skipped 1919 previous similar messages Apr 28 05:21:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454110.105075 Apr 28 05:22:28 fir-md1-s1 kernel: LustreError: 105127:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556454058, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5db77b4ec0/0x378007fb915588de lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 483 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105127 timeout: 0 lvb_type: 0 Apr 28 05:22:28 fir-md1-s1 kernel: LustreError: 105127:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 55 previous similar messages Apr 28 05:22:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454160.104355 Apr 28 05:22:47 fir-md1-s1 kernel: Lustre: 104950:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3359f79e00 x1631575321758448/t0(0) o101->7b8c2334-5441-fafb-761f-7bfdc2fe1e61@10.8.18.30@o2ib6:22/0 lens 584/3264 e 0 to 0 dl 1556454172 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 05:22:47 fir-md1-s1 kernel: Lustre: 104950:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 57 previous similar messages Apr 28 05:22:49 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 8bdeb34b-1463-e572-e0ea-aa14c9b9e68b (at 10.8.27.27@o2ib6) reconnecting Apr 28 05:22:49 fir-md1-s1 kernel: Lustre: Skipped 1924 previous similar messages Apr 28 05:24:19 fir-md1-s1 kernel: LNet: Service thread pid 114866 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 05:24:19 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 05:24:19 fir-md1-s1 kernel: Pid: 114866, comm: mdt01_100 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:24:19 fir-md1-s1 kernel: Call Trace: Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:24:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:24:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:24:19 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556454259.114866 Apr 28 05:24:19 fir-md1-s1 kernel: Pid: 114909, comm: mdt01_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:24:19 fir-md1-s1 kernel: Call Trace: Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:24:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:24:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:24:19 fir-md1-s1 kernel: Pid: 105301, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:24:19 fir-md1-s1 kernel: Call Trace: Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 
05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:24:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:24:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:24:19 fir-md1-s1 kernel: Pid: 114804, comm: mdt01_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:24:19 fir-md1-s1 kernel: Call Trace: Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 
05:24:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:24:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:24:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:24:19 fir-md1-s1 kernel: Pid: 114938, comm: mdt00_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:24:19 fir-md1-s1 kernel: Call Trace: Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:24:19 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:24:19 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:24:19 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:24:19 fir-md1-s1 kernel: LNet: Service thread pid 104335 was inactive for 200.84s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 05:24:19 fir-md1-s1 kernel: LNet: Skipped 38 previous similar messages Apr 28 05:24:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454264.104338 Apr 28 05:24:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.8.13@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4d5ab7ee40/0x378007fb70e4a1c6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 486 type: IBT flags: 0x60200400000020 nid: 10.8.8.13@o2ib6 remote: 0x9d3cf04f808fb1d5 expref: 3639 pid: 104911 timeout: 454104 lvb_type: 0 Apr 28 05:24:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 05:24:30 fir-md1-s1 kernel: LustreError: 114797:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4b992cc400 ns: mdt-fir-MDT0002_UUID lock: ffff8b5e65729200/0x378007fb70e4a538 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 480 type: IBT flags: 0x50200400000020 nid: 10.8.8.13@o2ib6 remote: 0x9d3cf04f808fb1e3 expref: 2405 pid: 114797 timeout: 0 lvb_type: 0 Apr 28 05:24:30 fir-md1-s1 kernel: LustreError: 114797:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 28 05:24:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454288.114929 Apr 28 05:24:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454290.114946 Apr 28 05:25:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454307.114906 Apr 28 05:25:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454312.105293 Apr 28 05:25:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454319.114953 Apr 28 05:25:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454321.105104 Apr 28 05:25:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454327.114849 
Apr 28 05:25:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454343.114850 Apr 28 05:25:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454352.105237 Apr 28 05:26:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454365.114809 Apr 28 05:27:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454433.104946 Apr 28 05:27:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454471.104973 Apr 28 05:28:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454492.114926 Apr 28 05:28:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454501.104692 Apr 28 05:30:51 fir-md1-s1 kernel: Pid: 114947, comm: mdt02_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:30:51 fir-md1-s1 kernel: Call Trace: Apr 28 05:30:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:30:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:30:51 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:30:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:30:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:30:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:30:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:30:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:30:51 
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454651.114947 Apr 28 05:31:07 fir-md1-s1 kernel: LNet: Service thread pid 114922 completed after 1666.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 05:31:07 fir-md1-s1 kernel: LNet: Skipped 48 previous similar messages Apr 28 05:31:11 fir-md1-s1 kernel: Lustre: 104332:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (1276:395s); client may timeout. req@ffff8b46476f2100 x1631542901352672/t0(0) o101->f92870b0-a964-d954-570c-1314ccd43119@10.8.8.13@o2ib6:26/0 lens 576/1792 e 0 to 0 dl 1556454276 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 05:31:11 fir-md1-s1 kernel: Lustre: 104332:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 05:31:28 fir-md1-s1 kernel: Pid: 105306, comm: mdt01_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:31:28 fir-md1-s1 kernel: Call Trace: Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:31:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:31:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:31:28 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556454688.105306 Apr 28 05:31:28 fir-md1-s1 kernel: Pid: 105035, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:31:28 fir-md1-s1 kernel: Call Trace: Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:31:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:31:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:31:28 fir-md1-s1 kernel: Pid: 105407, comm: mdt00_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:31:28 fir-md1-s1 kernel: Call Trace: Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 
05:31:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:31:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:31:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:31:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:31:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c85fc87d-cd6c-b94a-fffd-7f66ed06a5f5 (at 10.8.30.11@o2ib6) Apr 28 05:31:48 fir-md1-s1 kernel: Lustre: Skipped 2149 previous similar messages Apr 28 05:32:41 fir-md1-s1 kernel: LustreError: 105119:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556454671, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b615cfaf980/0x378007fba4f1cc20 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 455 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105119 timeout: 0 lvb_type: 0 Apr 28 05:32:41 fir-md1-s1 kernel: LustreError: 105014:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556454671, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b498fb9d340/0x378007fba4f1cde0 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 454 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105014 timeout: 0 lvb_type: 0 Apr 28 05:32:41 fir-md1-s1 
kernel: LustreError: 105014:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 47 previous similar messages Apr 28 05:32:41 fir-md1-s1 kernel: LustreError: 105119:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 28 05:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f8938193-b6f4-691f-a9ed-5d03b37d98de (at 10.8.30.11@o2ib6) reconnecting Apr 28 05:32:50 fir-md1-s1 kernel: Lustre: Skipped 2129 previous similar messages Apr 28 05:32:50 fir-md1-s1 kernel: Lustre: 114948:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-18), not sending early reply req@ffff8b5561649500 x1631738114974240/t0(0) o101->25512127-e6de-b60b-cf78-f84b6ec57480@10.8.21.14@o2ib6:25/0 lens 568/0 e 0 to 0 dl 1556454775 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 05:32:50 fir-md1-s1 kernel: Lustre: 114948:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 115 previous similar messages Apr 28 05:33:57 fir-md1-s1 kernel: Pid: 104967, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:33:57 fir-md1-s1 kernel: Call Trace: Apr 28 05:33:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:33:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:33:57 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 05:33:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:33:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] 
Apr 28 05:33:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:33:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:33:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:33:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:33:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454837.104967 Apr 28 05:34:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454843.104391 Apr 28 05:34:31 fir-md1-s1 kernel: LNet: Service thread pid 104933 was inactive for 200.15s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 05:34:31 fir-md1-s1 kernel: LNet: Skipped 44 previous similar messages Apr 28 05:34:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454871.104933 Apr 28 05:35:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454902.105068 Apr 28 05:35:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454916.104332 Apr 28 05:35:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454931.105284 Apr 28 05:35:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454932.114830 Apr 28 05:36:15 fir-md1-s1 kernel: LNet: Service thread pid 105308 was inactive for 200.47s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 05:36:15 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 05:36:15 fir-md1-s1 kernel: Pid: 105308, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:36:15 fir-md1-s1 kernel: Call Trace: Apr 28 05:36:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:36:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:36:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:36:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:36:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:36:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556454975.105308 Apr 28 05:41:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to ac80776b-5bd4-cddf-776c-c2f3659b6c51 (at 10.9.101.70@o2ib4) Apr 28 05:41:52 fir-md1-s1 kernel: Lustre: Skipped 2098 previous similar messages Apr 28 05:42:38 fir-md1-s1 kernel: Pid: 114870, comm: mdt03_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 
SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:42:38 fir-md1-s1 kernel: Call Trace: Apr 28 05:42:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:42:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:42:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:42:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:42:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:42:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556455359.114870 Apr 28 05:42:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 15e027c4-b3a6-300e-0908-3492ed7f423b (at 10.9.101.70@o2ib4) reconnecting Apr 28 05:42:54 fir-md1-s1 kernel: Lustre: Skipped 2120 previous similar messages Apr 28 05:47:48 fir-md1-s1 kernel: Lustre: 105002:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b4c9e3d6300 x1631546225773024/t0(0) o101->f653589b-eefb-abf7-a1b5-4c7dd788fc78@10.8.7.16@o2ib6:23/0 lens 584/3264 e 1 to 0 dl 1556455673 ref 2 fl Interpret:/0/0 rc 
0/0 Apr 28 05:47:48 fir-md1-s1 kernel: Lustre: 105002:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 28 05:49:03 fir-md1-s1 kernel: LustreError: 104932:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556455653, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5313f0a880/0x378007fbc5c03e62 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 454 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104932 timeout: 0 lvb_type: 0 Apr 28 05:49:03 fir-md1-s1 kernel: LustreError: 104932:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 84 previous similar messages Apr 28 05:50:53 fir-md1-s1 kernel: LNet: Service thread pid 104932 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 05:50:53 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 05:50:53 fir-md1-s1 kernel: Pid: 104932, comm: mdt01_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 05:50:53 fir-md1-s1 kernel: Call Trace: Apr 28 05:50:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 05:50:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 05:50:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 05:50:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 05:50:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 05:50:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556455853.104932 Apr 28 05:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 24ffd2cc-a9cf-9df8-4147-9549d755d6d1 (at 10.9.108.34@o2ib4) Apr 28 05:51:52 fir-md1-s1 kernel: Lustre: Skipped 2189 previous similar messages Apr 28 05:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 081ba7d9-3e8f-6768-7b15-6d13e53f4563 (at 10.9.108.34@o2ib4) reconnecting Apr 28 05:52:54 fir-md1-s1 kernel: Lustre: Skipped 2190 previous similar messages Apr 28 06:00:54 fir-md1-s1 kernel: LustreError: 114793:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b71367cbc00 ns: mdt-fir-MDT0002_UUID lock: ffff8b62ad957500/0x378007fb774d3ca2 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 452 type: IBT flags: 0x50200400000020 nid: 10.9.101.65@o2ib4 remote: 0x10868ac5f029d91f expref: 4 pid: 114793 timeout: 0 lvb_type: 0 Apr 28 06:00:54 fir-md1-s1 kernel: LustreError: 114793:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 6 previous similar messages Apr 28 06:00:54 fir-md1-s1 kernel: Lustre: 114793:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:3053s); client may timeout. 
req@ffff8b7014a0a100 x1631546295391712/t0(0) o101->2faef2d8-dc67-f384-07b6-111f344194c1@10.9.101.65@o2ib4:26/0 lens 480/536 e 0 to 0 dl 1556453401 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 06:00:54 fir-md1-s1 kernel: LNet: Service thread pid 114971 completed after 3237.50s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 06:00:54 fir-md1-s1 kernel: LNet: Skipped 113 previous similar messages Apr 28 06:00:54 fir-md1-s1 kernel: Lustre: 114793:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Apr 28 06:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8399d63a-5bf1-2891-e648-7bdf89b4d1ea (at 10.8.21.14@o2ib6) Apr 28 06:01:52 fir-md1-s1 kernel: Lustre: Skipped 2171 previous similar messages Apr 28 06:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25512127-e6de-b60b-cf78-f84b6ec57480 (at 10.8.21.14@o2ib6) reconnecting Apr 28 06:02:54 fir-md1-s1 kernel: Lustre: Skipped 2171 previous similar messages Apr 28 06:03:54 fir-md1-s1 kernel: LustreError: 114820:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b71367cbc00 ns: mdt-fir-MDT0002_UUID lock: ffff8b5c023ed100/0x378007fb7bf5259f lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 444 type: IBT flags: 0x50200400000020 nid: 10.9.101.65@o2ib4 remote: 0x10868ac5f029db95 expref: 2 pid: 114820 timeout: 0 lvb_type: 0 Apr 28 06:03:54 fir-md1-s1 kernel: LNet: Service thread pid 114859 completed after 3268.34s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 06:03:54 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 28 06:03:54 fir-md1-s1 kernel: LustreError: 114820:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 06:03:54 fir-md1-s1 kernel: Lustre: 114820:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3233s); client may timeout. req@ffff8b37c0834b00 x1631546295418784/t0(0) o101->2faef2d8-dc67-f384-07b6-111f344194c1@10.9.101.65@o2ib4:1/0 lens 568/2296 e 0 to 0 dl 1556453401 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 06:04:19 fir-md1-s1 kernel: Lustre: 114815:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b469c343300 x1631681048544112/t0(0) o101->7da2364c-273e-9791-279a-dee1848c518b@10.8.25.6@o2ib6:24/0 lens 576/3264 e 0 to 0 dl 1556456664 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 06:05:24 fir-md1-s1 kernel: LustreError: 105093:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556456634, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b72d8a73180/0x378007fbe62dc961 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 437 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105093 timeout: 0 lvb_type: 0 Apr 28 06:05:24 fir-md1-s1 kernel: LustreError: 105093:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Apr 28 06:07:15 fir-md1-s1 kernel: LNet: Service thread pid 105233 was inactive for 200.67s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 06:07:15 fir-md1-s1 kernel: Pid: 105233, comm: mdt02_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:07:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:07:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:07:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:07:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556456835.105233 Apr 28 06:07:15 fir-md1-s1 kernel: Pid: 105295, comm: mdt01_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:07:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] 
mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:07:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:07:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:07:15 fir-md1-s1 kernel: Pid: 105282, comm: mdt01_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:07:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:07:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:07:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:07:15 fir-md1-s1 kernel: Pid: 105109, comm: mdt01_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:07:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:07:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:07:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:07:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:07:15 fir-md1-s1 kernel: Pid: 105052, comm: mdt01_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:07:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:07:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:07:16 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:07:16 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:07:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:07:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:07:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:07:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:07:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:07:16 fir-md1-s1 kernel: LNet: Service thread pid 114918 was inactive for 201.27s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 06:07:16 fir-md1-s1 kernel: LNet: Skipped 99 previous similar messages Apr 28 06:08:15 fir-md1-s1 kernel: LustreError: 114955:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72f82efc00 ns: mdt-fir-MDT0002_UUID lock: ffff8b5d63378240/0x378007fb98bd1eb3 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 433 type: IBT flags: 0x50200400000020 nid: 10.9.102.12@o2ib4 remote: 0x49ddef5e615e2a0f expref: 2 pid: 114955 timeout: 0 lvb_type: 0 Apr 28 06:08:15 fir-md1-s1 kernel: LNet: Service thread pid 114833 completed after 2774.79s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 06:08:15 fir-md1-s1 kernel: LNet: Skipped 72 previous similar messages Apr 28 06:08:15 fir-md1-s1 kernel: LustreError: 114955:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 13 previous similar messages Apr 28 06:08:15 fir-md1-s1 kernel: Lustre: 114947:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (42:2403s); client may timeout. req@ffff8b5b1163e300 x1631559232372848/t0(0) o101->d1612639-ba09-5523-fd87-6391497129b4@10.8.18.19@o2ib6:12/0 lens 568/2296 e 0 to 0 dl 1556454492 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 06:08:15 fir-md1-s1 kernel: Lustre: 114947:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 19 previous similar messages Apr 28 06:08:45 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b41dbef9d40/0x378007fba5f9de1b lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 400 type: IBT flags: 0x60200400000020 nid: 10.9.108.35@o2ib4 remote: 0x15f9b4b2e38adcae expref: 3154 pid: 105406 timeout: 456759 lvb_type: 0 Apr 28 06:08:45 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 06:10:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.3@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b50bcb3d580/0x378007fba5f9df02 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 401 type: IBT flags: 0x60200400000020 nid: 10.9.108.3@o2ib4 remote: 0x8f235a2668745e0d expref: 5711 pid: 105025 timeout: 456852 lvb_type: 0 Apr 28 06:10:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 06:10:18 fir-md1-s1 kernel: Lustre: 114857:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took 
longer than estimated (30:2287s); client may timeout. req@ffff8b423e1fd700 x1631562804374848/t0(0) o101->cade85fd-aa4d-482b-3569-860c3006b004@10.8.18.8@o2ib6:11/0 lens 568/2296 e 0 to 0 dl 1556454731 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 06:11:36 fir-md1-s1 kernel: LNet: Service thread pid 114948 was inactive for 200.75s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 06:11:36 fir-md1-s1 kernel: LNet: Skipped 47 previous similar messages Apr 28 06:11:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457096.114948 Apr 28 06:11:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a5582659-448e-35d9-4344-7de8958c550e (at 10.8.2.33@o2ib6) Apr 28 06:11:52 fir-md1-s1 kernel: Lustre: Skipped 2108 previous similar messages Apr 28 06:12:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457127.104356 Apr 28 06:12:37 fir-md1-s1 kernel: Pid: 104330, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:12:37 fir-md1-s1 kernel: Call Trace: Apr 28 06:12:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:12:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:12:37 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:12:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:12:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Apr 28 06:12:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:12:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:12:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:12:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457157.104330 Apr 28 06:12:38 fir-md1-s1 kernel: Pid: 105406, comm: mdt00_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:12:38 fir-md1-s1 kernel: Call Trace: Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:12:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:12:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:12:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457158.105406 Apr 28 06:12:38 fir-md1-s1 kernel: Pid: 105376, comm: mdt03_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:12:38 
fir-md1-s1 kernel: Call Trace: Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:12:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:12:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:12:38 fir-md1-s1 kernel: Pid: 114947, comm: mdt02_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:12:38 fir-md1-s1 kernel: Call Trace: Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:12:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:12:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:12:38 fir-md1-s1 kernel: Pid: 105289, comm: mdt02_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:12:38 fir-md1-s1 kernel: Call Trace: Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:12:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:12:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:12:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:12:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:12:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:12:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:12:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:12:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:12:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
15e027c4-b3a6-300e-0908-3492ed7f423b (at 10.9.101.70@o2ib4) reconnecting Apr 28 06:12:54 fir-md1-s1 kernel: Lustre: Skipped 2096 previous similar messages Apr 28 06:13:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457189.105088 Apr 28 06:13:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457218.114793 Apr 28 06:13:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457227.105025 Apr 28 06:13:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.113.15@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b4ba1be8b40/0x378007fba70cca60 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 400 type: IBT flags: 0x60200400000020 nid: 10.9.113.15@o2ib4 remote: 0xe738849d962e946a expref: 3122 pid: 105048 timeout: 457062 lvb_type: 0 Apr 28 06:13:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 06:13:48 fir-md1-s1 kernel: LNet: Service thread pid 104997 completed after 2495.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 06:13:48 fir-md1-s1 kernel: LNet: Skipped 83 previous similar messages Apr 28 06:14:08 fir-md1-s1 kernel: LNet: Service thread pid 105035 was inactive for 200.33s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 06:14:08 fir-md1-s1 kernel: LNet: Skipped 55 previous similar messages Apr 28 06:14:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457248.105035 Apr 28 06:14:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457250.114862 Apr 28 06:14:28 fir-md1-s1 kernel: Lustre: 105293:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-19), not sending early reply req@ffff8b52f233e300 x1631546249462080/t0(0) o101->49530de5-f172-5bb3-a0d3-bd0ce56d3339@10.8.7.17@o2ib6:3/0 lens 568/0 e 0 to 0 dl 1556457273 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 06:14:28 fir-md1-s1 kernel: Lustre: 105293:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 123 previous similar messages Apr 28 06:14:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457278.105423 Apr 28 06:14:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457280.114857 Apr 28 06:16:49 fir-md1-s1 kernel: LustreError: 104990:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72dbf7b000 ns: mdt-fir-MDT0002_UUID lock: ffff8b5d9d7cb600/0x378007fba70ccb32 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 397 type: IBT flags: 0x50200400000020 nid: 10.9.108.32@o2ib4 remote: 0xe64f46a1a60dc244 expref: 108 pid: 104990 timeout: 0 lvb_type: 0 Apr 28 06:16:49 fir-md1-s1 kernel: LustreError: 104990:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 7 previous similar messages Apr 28 06:17:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457428.114844 Apr 28 06:17:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457430.105306 Apr 28 06:17:51 fir-md1-s1 kernel: Lustre: 114917:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (836:1s); client may timeout. 
req@ffff8b41cc76c200 x1631750594573856/t0(0) o101->e20ef941-e8d4-d272-64ab-d749dcd906bb@10.8.20.12@o2ib6:24/0 lens 568/2296 e 0 to 0 dl 1556457470 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 06:17:51 fir-md1-s1 kernel: Lustre: 114917:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 06:19:21 fir-md1-s1 kernel: LustreError: 114929:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556457471, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b62a2e28b40/0x378007fc024399d2 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 367 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114929 timeout: 0 lvb_type: 0 Apr 28 06:19:21 fir-md1-s1 kernel: LustreError: 114929:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 95 previous similar messages Apr 28 06:20:22 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.20.20@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b5314112d00/0x378007fc0243990e lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 367 type: IBT flags: 0x60200400000020 nid: 10.8.20.20@o2ib6 remote: 0xc5e25a5da5e9f826 expref: 294 pid: 105269 timeout: 457456 lvb_type: 0 Apr 28 06:20:22 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 06:21:12 fir-md1-s1 kernel: LNet: Service thread pid 114809 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 06:21:12 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 06:21:12 fir-md1-s1 kernel: Pid: 114809, comm: mdt01_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:21:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:21:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:21:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:21:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457672.114809 Apr 28 06:21:12 fir-md1-s1 kernel: Pid: 114923, comm: mdt00_080 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:21:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:21:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:21:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:21:12 fir-md1-s1 kernel: Pid: 114798, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:21:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: 
[] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:21:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:21:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:21:12 fir-md1-s1 kernel: Pid: 114797, comm: mdt01_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:21:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:21:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:21:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:21:12 fir-md1-s1 kernel: Pid: 105038, comm: mdt00_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 
SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:21:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:21:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:21:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:21:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:21:12 fir-md1-s1 kernel: LNet: Service thread pid 114803 was inactive for 200.62s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 06:21:12 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 28 06:21:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457673.104331 Apr 28 06:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0c6b44b5-116d-14e5-5e8a-f0eb3e5c9404 (at 10.8.11.29@o2ib6) Apr 28 06:21:55 fir-md1-s1 kernel: Lustre: Skipped 1988 previous similar messages Apr 28 06:22:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 23cb0af4-8181-95ae-f3e1-ad9f3a79ec03 (at 10.8.11.29@o2ib6) reconnecting Apr 28 06:22:57 fir-md1-s1 kernel: Lustre: Skipped 1970 previous similar messages Apr 28 06:23:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457822.114929 Apr 28 06:23:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457828.105021 Apr 28 06:23:52 fir-md1-s1 kernel: LNet: Service thread pid 104724 completed after 360.88s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 06:23:52 fir-md1-s1 kernel: LNet: Skipped 181 previous similar messages Apr 28 06:24:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457853.105284 Apr 28 06:24:22 fir-md1-s1 kernel: Lustre: 105011:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (371:20s); client may timeout. 
req@ffff8b60e8ec4800 x1631750595485056/t0(0) o101->e20ef941-e8d4-d272-64ab-d749dcd906bb@10.8.20.12@o2ib6:21/0 lens 480/536 e 0 to 0 dl 1556457842 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 06:24:22 fir-md1-s1 kernel: Lustre: 105011:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 49 previous similar messages Apr 28 06:24:47 fir-md1-s1 kernel: Lustre: 104909:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b44bdaa5100 x1631680752361632/t0(0) o101->2bb73289-246f-eef4-271a-7f2b0f0e738c@10.8.1.19@o2ib6:22/0 lens 584/3264 e 0 to 0 dl 1556457892 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 06:24:47 fir-md1-s1 kernel: Lustre: 104909:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 187 previous similar messages Apr 28 06:26:12 fir-md1-s1 kernel: Pid: 114978, comm: mdt00_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:26:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 
kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:26:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:26:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:26:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457972.114978 Apr 28 06:26:12 fir-md1-s1 kernel: Pid: 104958, comm: mdt02_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:26:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:26:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:26:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:26:12 fir-md1-s1 kernel: Pid: 104939, comm: mdt00_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:26:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:26:12 fir-md1-s1 kernel: 
[] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:26:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:26:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:26:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:26:13 fir-md1-s1 kernel: Pid: 114922, comm: mdt00_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:26:13 fir-md1-s1 kernel: Call Trace: Apr 28 06:26:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:26:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:26:13 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:26:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:26:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:26:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:26:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:26:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:26:17 fir-md1-s1 kernel: Pid: 104781, comm: mdt03_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:26:17 fir-md1-s1 kernel: Call Trace: Apr 28 06:26:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:26:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:26:17 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:26:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:26:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:26:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:26:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:26:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:26:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556457977.104781 Apr 28 06:26:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458002.105293 Apr 28 06:26:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458007.114914 Apr 28 06:26:48 
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458008.104330 Apr 28 06:27:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458038.114890 Apr 28 06:27:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458063.114828 Apr 28 06:30:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458224.114882 Apr 28 06:31:02 fir-md1-s1 kernel: LustreError: 104937:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556458172, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b59ca2c33c0/0x378007fc1ac55a7a lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 374 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104937 timeout: 0 lvb_type: 0 Apr 28 06:31:02 fir-md1-s1 kernel: LustreError: 104937:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 169 previous similar messages Apr 28 06:31:52 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.7.16@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b5275a4b600/0x378007fc02444934 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 374 type: IBT flags: 0x60200400000020 nid: 10.8.7.16@o2ib6 remote: 0x87eb7568f5809535 expref: 10202 pid: 114847 timeout: 458146 lvb_type: 0 Apr 28 06:31:52 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 06:32:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to dc676af6-3008-ab10-fc8d-3fa486e9c164 (at 10.8.17.9@o2ib6) Apr 28 06:32:07 fir-md1-s1 kernel: Lustre: Skipped 2051 previous similar messages Apr 28 06:32:53 fir-md1-s1 kernel: LNet: Service thread pid 114847 was inactive for 200.65s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 06:32:53 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 06:32:53 fir-md1-s1 kernel: Pid: 114847, comm: mdt01_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:32:53 fir-md1-s1 kernel: Call Trace: Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:32:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:32:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:32:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458373.114847 Apr 28 06:32:53 fir-md1-s1 kernel: Pid: 104937, comm: mdt02_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:32:53 fir-md1-s1 kernel: Call Trace: Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:32:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:32:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:32:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:33:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2bb73289-246f-eef4-271a-7f2b0f0e738c (at 10.8.1.19@o2ib6) reconnecting Apr 28 06:33:09 fir-md1-s1 kernel: Lustre: Skipped 2038 previous similar messages Apr 28 06:34:22 fir-md1-s1 kernel: Lustre: 105238:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (371:620s); client may timeout. req@ffff8b5f1463bf00 x1631750595485696/t219287421835(0) o36->e20ef941-e8d4-d272-64ab-d749dcd906bb@10.8.20.12@o2ib6:21/0 lens 488/424 e 0 to 0 dl 1556457842 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 06:34:22 fir-md1-s1 kernel: LNet: Service thread pid 114833 completed after 990.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 06:34:22 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Apr 28 06:34:22 fir-md1-s1 kernel: Pid: 105014, comm: mdt01_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:34:22 fir-md1-s1 kernel: Call Trace: Apr 28 06:34:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:34:22 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:34:22 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:34:22 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:34:22 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:34:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:34:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:34:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:34:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458462.105014 Apr 28 06:34:48 fir-md1-s1 kernel: Lustre: 104979:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-6), not sending early reply req@ffff8b3c515dc500 x1631545727646800/t0(0) o101->abea71ae-b956-1f71-0b98-5c238f1bb381@10.9.107.63@o2ib4:23/0 lens 584/3264 e 0 to 0 dl 1556458493 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 06:34:48 fir-md1-s1 kernel: Lustre: 104979:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Apr 28 06:35:12 fir-md1-s1 kernel: Pid: 105423, comm: 
mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:35:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:35:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:35:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:35:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458512.105423 Apr 28 06:35:12 fir-md1-s1 kernel: Pid: 114839, comm: mdt01_090 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:35:12 fir-md1-s1 kernel: Call Trace: Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:35:12 
fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:35:12 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:35:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:35:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:35:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:35:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:35:13 fir-md1-s1 kernel: LNet: Service thread pid 114829 was inactive for 200.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 06:35:13 fir-md1-s1 kernel: LNet: Skipped 163 previous similar messages Apr 28 06:35:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458513.105010 Apr 28 06:36:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458582.114986 Apr 28 06:37:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458662.114983 Apr 28 06:37:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458668.105048 Apr 28 06:38:13 fir-md1-s1 kernel: Pid: 114939, comm: mdt00_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:38:13 fir-md1-s1 kernel: Call Trace: Apr 28 06:38:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:38:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:38:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:38:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:38:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:38:13 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556458693.114939 Apr 28 06:38:15 fir-md1-s1 kernel: Pid: 105247, comm: mdt01_046 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:38:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:38:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:38:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:38:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458695.105247 Apr 28 06:38:15 fir-md1-s1 kernel: Pid: 105373, comm: mdt03_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:38:15 fir-md1-s1 kernel: Call Trace: Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 
[mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:38:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:38:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:38:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:38:43 fir-md1-s1 kernel: Pid: 105134, comm: mdt00_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:38:43 fir-md1-s1 kernel: Call Trace: Apr 28 06:38:43 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:38:43 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:38:43 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:38:43 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:38:43 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Apr 28 06:38:43 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:38:43 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:38:43 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:38:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458723.105134 Apr 28 06:38:44 fir-md1-s1 kernel: Pid: 104994, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:38:44 fir-md1-s1 kernel: Call Trace: Apr 28 06:38:44 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:38:44 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:38:44 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:38:44 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:38:44 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:38:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458724.104994 Apr 28 06:41:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556458879.114859 Apr 28 06:41:57 fir-md1-s1 kernel: LustreError: 
114887:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556458827, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3d02700fc0/0x378007fc3224b386 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 370 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114887 timeout: 0 lvb_type: 0 Apr 28 06:41:57 fir-md1-s1 kernel: LustreError: 114887:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages Apr 28 06:42:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2cde83f6-2de4-32c9-c63f-55f47cbe66e9 (at 10.9.105.6@o2ib4) Apr 28 06:42:08 fir-md1-s1 kernel: Lustre: Skipped 1937 previous similar messages Apr 28 06:43:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ddef0525-fd05-baf0-eec8-55af7a82431b (at 10.8.24.4@o2ib6) reconnecting Apr 28 06:43:10 fir-md1-s1 kernel: Lustre: Skipped 1933 previous similar messages Apr 28 06:43:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.30.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b5df0bd0480/0x378007fc0245dc01 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 370 type: IBT flags: 0x60200400000020 nid: 10.8.30.11@o2ib6 remote: 0x2eb8a7f59e26ac8f expref: 212 pid: 114980 timeout: 458837 lvb_type: 0 Apr 28 06:43:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 06:43:47 fir-md1-s1 kernel: LNet: Service thread pid 114887 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 06:43:47 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 06:43:47 fir-md1-s1 kernel: Pid: 114887, comm: mdt00_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:43:47 fir-md1-s1 kernel: Call Trace: Apr 28 06:43:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:43:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:43:47 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:43:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:43:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:43:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:43:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:43:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:43:48 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459028.114887 Apr 28 06:43:48 fir-md1-s1 kernel: Pid: 105027, comm: mdt02_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:43:48 fir-md1-s1 kernel: Call Trace: Apr 28 06:43:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:43:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:43:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:43:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:43:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:44:02 fir-md1-s1 kernel: Pid: 114833, comm: mdt00_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:44:02 fir-md1-s1 kernel: Call Trace: Apr 28 06:44:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:44:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] 
tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:44:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:44:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:44:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:44:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459042.114833 Apr 28 06:44:21 fir-md1-s1 kernel: Pid: 105413, comm: mdt01_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:44:21 fir-md1-s1 kernel: Call Trace: Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:44:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:44:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:44:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459061.105413 Apr 28 06:44:21 fir-md1-s1 kernel: Pid: 
114860, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:44:21 fir-md1-s1 kernel: Call Trace: Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:44:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:44:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:44:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:46:23 fir-md1-s1 kernel: LNet: Service thread pid 114795 completed after 1711.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 06:46:23 fir-md1-s1 kernel: LNet: Skipped 16 previous similar messages Apr 28 06:46:44 fir-md1-s1 kernel: LNet: Service thread pid 105404 was inactive for 200.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 06:46:44 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 28 06:46:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459204.105404 Apr 28 06:46:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459210.104997 Apr 28 06:46:55 fir-md1-s1 kernel: Lustre: 104968:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b7309b9c500 x1631534749845952/t0(0) o101->081ba7d9-3e8f-6768-7b15-6d13e53f4563@10.9.108.34@o2ib4:0/0 lens 568/0 e 0 to 0 dl 1556459220 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 06:46:55 fir-md1-s1 kernel: Lustre: 104968:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Apr 28 06:47:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459233.104337 Apr 28 06:47:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459241.105266 Apr 28 06:49:50 fir-md1-s1 kernel: Pid: 114951, comm: mdt02_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:49:50 fir-md1-s1 kernel: Call Trace: Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 
kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:49:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:49:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:49:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459390.114951 Apr 28 06:49:50 fir-md1-s1 kernel: Pid: 105019, comm: mdt03_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:49:50 fir-md1-s1 kernel: Call Trace: Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:49:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:49:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:49:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:51:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 57561d34-eaf6-b386-87f8-353ac279f4f2 (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b4be0e8a000, cur 1556459479 expire 1556459329 last 1556459252 Apr 28 06:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.102.11@o2ib4) Apr 28 06:52:08 fir-md1-s1 kernel: Lustre: Skipped 2019 previous similar messages Apr 28 06:52:25 fir-md1-s1 kernel: Pid: 105286, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:52:25 fir-md1-s1 kernel: Call Trace: Apr 28 06:52:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:52:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:52:25 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:52:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:52:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:52:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:52:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:52:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:52:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459545.105286 Apr 28 06:52:26 fir-md1-s1 kernel: Pid: 114970, comm: mdt00_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:52:26 fir-md1-s1 kernel: Call Trace: Apr 28 06:52:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 
06:52:26 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:52:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:52:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:52:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:52:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:52:26 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459546.114970 Apr 28 06:52:28 fir-md1-s1 kernel: LustreError: 115011:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556459457, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3f34afd7c0/0x378007fc47d6416c lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 374 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 115011 timeout: 0 lvb_type: 0 Apr 28 06:52:28 fir-md1-s1 kernel: LustreError: 115011:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 22 previous similar messages Apr 28 06:52:45 fir-md1-s1 kernel: Pid: 105113, comm: mdt00_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:52:45 fir-md1-s1 kernel: Call 
Trace: Apr 28 06:52:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:52:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:52:45 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:52:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:52:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:52:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:52:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:52:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:52:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459565.105113 Apr 28 06:53:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f7caad4b-1555-f2e5-0e86-1318e8bf79ed (at 10.9.102.11@o2ib4) reconnecting Apr 28 06:53:10 fir-md1-s1 kernel: Lustre: Skipped 2011 previous similar messages Apr 28 06:53:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459596.105104 Apr 28 06:53:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.24.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b37ec982880/0x378007fc02465252 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 374 type: IBT flags: 0x60200400000020 nid: 10.8.24.4@o2ib6 remote: 0x8a5ac3af4ea8c21c expref: 4222 pid: 114822 timeout: 459437 lvb_type: 0 Apr 
28 06:53:23 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 06:53:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459627.105124 Apr 28 06:54:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459658.105288 Apr 28 06:56:45 fir-md1-s1 kernel: LNet: Service thread pid 114941 was inactive for 200.71s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 06:56:45 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 06:56:45 fir-md1-s1 kernel: Pid: 114941, comm: mdt02_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:56:45 fir-md1-s1 kernel: Call Trace: Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:56:45 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:56:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:56:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459805.114941 Apr 28 06:56:45 fir-md1-s1 kernel: Pid: 105308, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:56:45 fir-md1-s1 kernel: Call Trace: Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:56:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:56:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:56:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 06:58:23 fir-md1-s1 kernel: LNet: Service thread pid 105109 completed after 2431.60s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 06:58:23 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 06:58:53 fir-md1-s1 kernel: Lustre: 105109:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b44e7a2e600 x1631558565100480/t0(0) o101->5102b83b-e407-f2c8-158f-7c896c03ad6a@10.9.108.66@o2ib4:28/0 lens 584/3264 e 0 to 0 dl 1556459938 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 06:58:53 fir-md1-s1 kernel: Lustre: 105109:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Apr 28 06:59:13 fir-md1-s1 kernel: Pid: 114876, comm: mdt03_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:59:13 fir-md1-s1 kernel: Call Trace: Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:59:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:59:13 fir-md1-s1 kernel: [] 
0xffffffffffffffff Apr 28 06:59:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556459953.114876 Apr 28 06:59:13 fir-md1-s1 kernel: Pid: 114954, comm: mdt00_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 06:59:13 fir-md1-s1 kernel: Call Trace: Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 06:59:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 06:59:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 06:59:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:01:49 fir-md1-s1 kernel: Pid: 105251, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:01:49 fir-md1-s1 kernel: Call Trace: Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 
07:01:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:01:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:01:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:01:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460109.105251 Apr 28 07:01:49 fir-md1-s1 kernel: Pid: 104389, comm: mdt01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:01:49 fir-md1-s1 kernel: Call Trace: Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:01:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:01:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:01:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c49fd28a-b873-e174-b6dc-a92b5b8edf6b (at 10.9.108.8@o2ib4) Apr 28 07:02:10 fir-md1-s1 kernel: Lustre: Skipped 1947 previous similar messages Apr 28 07:02:58 fir-md1-s1 kernel: LustreError: 114792:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556460088, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b70bf760240/0x378007fc5cf3ba3f lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 374 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114792 timeout: 0 lvb_type: 0 Apr 28 07:02:58 fir-md1-s1 kernel: LustreError: 114792:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Apr 28 07:03:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 56a57cb9-3d9f-bc51-dd74-55bd81619cfc (at 10.9.108.8@o2ib4) reconnecting Apr 28 07:03:12 fir-md1-s1 kernel: Lustre: Skipped 1942 previous similar messages Apr 28 07:03:53 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.65@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b3decad5c40/0x378007fc024656b9 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 374 type: IBT flags: 0x60200400000020 nid: 10.9.101.65@o2ib4 remote: 0x10868ac5f029e734 expref: 13 pid: 114993 timeout: 460067 lvb_type: 0 Apr 28 07:03:53 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 07:04:17 fir-md1-s1 kernel: Pid: 
114836, comm: mdt01_087 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:04:17 fir-md1-s1 kernel: Call Trace: Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:04:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:04:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:04:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460257.114836 Apr 28 07:04:17 fir-md1-s1 kernel: Pid: 105257, comm: mdt00_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:04:17 fir-md1-s1 kernel: Call Trace: Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] 
mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:04:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:04:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:04:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:04:48 fir-md1-s1 kernel: Pid: 114792, comm: mdt03_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:04:48 fir-md1-s1 kernel: Call Trace: Apr 28 07:04:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:04:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:04:48 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:04:48 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:04:48 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:04:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:04:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:04:48 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:04:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:04:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:04:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:04:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:04:49 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Apr 28 07:04:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:04:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:04:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460289.114792 Apr 28 07:04:49 fir-md1-s1 kernel: LNet: Service thread pid 104333 was inactive for 200.74s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:04:49 fir-md1-s1 kernel: LNet: Skipped 14 previous similar messages Apr 28 07:07:13 fir-md1-s1 kernel: LNet: Service thread pid 105305 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 07:07:13 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Apr 28 07:07:13 fir-md1-s1 kernel: Pid: 105305, comm: mdt02_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:07:13 fir-md1-s1 kernel: Call Trace: Apr 28 07:07:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:07:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:07:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:07:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:07:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:07:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460433.105305 Apr 28 07:07:13 fir-md1-s1 kernel: Pid: 114979, comm: mdt03_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:07:13 fir-md1-s1 kernel: Call Trace: Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:07:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:07:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:07:14 fir-md1-s1 kernel: Pid: 114823, comm: mdt00_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:07:14 fir-md1-s1 kernel: Call Trace: Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:07:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:07:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:07:14 fir-md1-s1 kernel: Pid: 114915, comm: mdt01_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:07:14 fir-md1-s1 kernel: Call Trace: Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:07:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:07:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:07:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:07:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:07:17 fir-md1-s1 kernel: Pid: 114894, comm: mdt03_046 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:07:17 fir-md1-s1 kernel: Call Trace: Apr 28 07:07:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:07:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:07:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:07:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:07:17 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:07:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:07:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:07:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:07:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:07:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:07:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:07:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:07:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:07:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:07:18 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:07:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460438.114894 Apr 28 07:07:43 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556460463.105301 Apr 28 07:07:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460469.105005 Apr 28 07:09:23 fir-md1-s1 kernel: LNet: Service thread pid 115014 completed after 3091.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:09:23 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 28 07:09:51 fir-md1-s1 kernel: Lustre: 114967:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b55ab24dd00 x1631562986794112/t0(0) o101->315cf750-5ce7-61a0-093d-91bfc52b74be@10.8.17.10@o2ib6:26/0 lens 480/568 e 0 to 0 dl 1556460596 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:09:51 fir-md1-s1 kernel: Lustre: 105094:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b48c9751500 x1631562986794096/t0(0) o101->315cf750-5ce7-61a0-093d-91bfc52b74be@10.8.17.10@o2ib6:26/0 lens 480/568 e 0 to 0 dl 1556460596 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:09:51 fir-md1-s1 kernel: Lustre: 105094:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Apr 28 07:10:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460618.105295 Apr 28 07:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to fec67fa2-7566-c452-41ab-6f040647c599 (at 10.9.102.3@o2ib4) Apr 28 07:12:10 fir-md1-s1 kernel: Lustre: Skipped 2018 previous similar messages Apr 28 07:12:46 fir-md1-s1 kernel: Pid: 114831, comm: mdt02_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:12:46 fir-md1-s1 kernel: Call Trace: Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:12:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:12:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:12:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460766.114831 Apr 28 07:12:46 fir-md1-s1 kernel: Pid: 105252, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:12:46 fir-md1-s1 kernel: Call Trace: Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:12:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:12:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:12:46 fir-md1-s1 kernel: Pid: 105106, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:12:46 fir-md1-s1 kernel: Call Trace: Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:12:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:12:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:12:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:13:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 32b4877a-e8a3-e77c-9d56-903e3045a875 (at 10.8.17.22@o2ib6) reconnecting Apr 28 07:13:12 fir-md1-s1 kernel: 
Lustre: Skipped 2012 previous similar messages Apr 28 07:13:31 fir-md1-s1 kernel: LustreError: 114911:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556460721, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4997ad5c40/0x378007fc72689c60 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 367 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114911 timeout: 0 lvb_type: 0 Apr 28 07:13:31 fir-md1-s1 kernel: LustreError: 114911:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 16 previous similar messages Apr 28 07:14:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.101.43@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b4d332b7980/0x378007fc0246b811 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 363 type: IBT flags: 0x60200400000020 nid: 10.9.101.43@o2ib4 remote: 0x4c511184a3f5cc5e expref: 10 pid: 114804 timeout: 460704 lvb_type: 0 Apr 28 07:14:30 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 07:15:21 fir-md1-s1 kernel: Pid: 114877, comm: mdt02_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:15:21 fir-md1-s1 kernel: Call Trace: Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:15:21 
fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:15:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:15:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:15:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556460921.114877 Apr 28 07:15:21 fir-md1-s1 kernel: Pid: 114911, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:15:21 fir-md1-s1 kernel: Call Trace: Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:15:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:15:21 fir-md1-s1 kernel: 
[] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:15:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:15:21 fir-md1-s1 kernel: LNet: Service thread pid 105119 was inactive for 200.48s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:15:21 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Apr 28 07:17:50 fir-md1-s1 kernel: LNet: Service thread pid 114798 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 07:17:50 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 07:17:50 fir-md1-s1 kernel: Pid: 114798, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:17:50 fir-md1-s1 kernel: Call Trace: Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:17:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:17:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:17:50 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556461070.114798 Apr 28 07:17:50 fir-md1-s1 kernel: Pid: 114980, comm: mdt02_099 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:17:50 fir-md1-s1 kernel: Call Trace: Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:17:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:17:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:17:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:17:51 fir-md1-s1 kernel: Pid: 114814, comm: mdt01_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:17:51 fir-md1-s1 kernel: Call Trace: Apr 28 07:17:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:17:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 
07:17:51 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:17:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:17:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:17:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:17:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:17:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:17:56 fir-md1-s1 kernel: Pid: 104965, comm: mdt00_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:17:56 fir-md1-s1 kernel: Call Trace: Apr 28 07:17:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:17:56 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:17:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 
07:17:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:17:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:17:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:17:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461077.104965 Apr 28 07:17:57 fir-md1-s1 kernel: Pid: 104335, comm: mdt02_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:17:57 fir-md1-s1 kernel: Call Trace: Apr 28 07:17:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:17:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:17:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:17:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:17:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:18:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461107.114809 Apr 28 07:18:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461138.105299 Apr 28 07:20:37 
fir-md1-s1 kernel: LNet: Service thread pid 105233 completed after 3765.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:20:37 fir-md1-s1 kernel: LNet: Skipped 18 previous similar messages Apr 28 07:20:42 fir-md1-s1 kernel: LustreError: 23547:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.33@o2ib4 arrived at 1556461242 with bad export cookie 3999205246285698999 Apr 28 07:21:07 fir-md1-s1 kernel: Lustre: 105011:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5b7a7d0f00 x1631715582547376/t0(0) o101->9017b2fd-d1de-a8da-328e-8aeae87aa675@10.9.102.60@o2ib4:12/0 lens 568/0 e 0 to 0 dl 1556461272 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:21:07 fir-md1-s1 kernel: Lustre: 105011:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Apr 28 07:21:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461293.105376 Apr 28 07:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to af5b7989-62c5-cdec-7106-6e583e65ea6b (at 10.9.102.26@o2ib4) Apr 28 07:22:14 fir-md1-s1 kernel: Lustre: Skipped 1936 previous similar messages Apr 28 07:23:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5924c705-ac90-422d-3e46-a0ea5d70203c (at 10.9.102.26@o2ib4) reconnecting Apr 28 07:23:16 fir-md1-s1 kernel: Lustre: Skipped 1932 previous similar messages Apr 28 07:24:02 fir-md1-s1 kernel: Pid: 105238, comm: mdt02_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:24:02 fir-md1-s1 kernel: Call Trace: Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:24:02 
fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:24:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:24:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:24:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461442.105238 Apr 28 07:24:02 fir-md1-s1 kernel: Pid: 114892, comm: mdt01_104 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:24:02 fir-md1-s1 kernel: Call Trace: Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 
07:24:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:24:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:24:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:24:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:24:37 fir-md1-s1 kernel: LustreError: 114918:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556461387, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b60f406ec00/0x378007fc89dd8c63 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 356 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114918 timeout: 0 lvb_type: 0 Apr 28 07:24:37 fir-md1-s1 kernel: LustreError: 114918:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 28 previous similar messages Apr 28 07:25:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.8.13@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4862b39680/0x378007fc0249952e lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 354 type: IBT flags: 0x60200400000020 nid: 10.8.8.13@o2ib6 remote: 0x9d3cf04f8092711d expref: 164 pid: 114838 timeout: 461381 lvb_type: 0 Apr 28 07:25:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 07:25:47 fir-md1-s1 kernel: LustreError: 105093:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b724ba25000 ns: mdt-fir-MDT0002_UUID lock: ffff8b7280230fc0/0x378007fc024bdf8a lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 349 type: IBT flags: 0x50200400000020 nid: 10.9.101.72@o2ib4 remote: 0x9ca54af26aae84d8 expref: 6 pid: 105093 timeout: 0 lvb_type: 0 Apr 28 07:25:47 fir-md1-s1 
kernel: LustreError: 105093:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 12 previous similar messages Apr 28 07:25:47 fir-md1-s1 kernel: Lustre: 105093:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (3439:636s); client may timeout. req@ffff8b70d0fd4500 x1631535207394592/t0(0) o101->17f43594-ae2e-f7ad-12ba-21540c4255a2@10.9.101.72@o2ib4:22/0 lens 576/1792 e 0 to 0 dl 1556460911 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 07:25:47 fir-md1-s1 kernel: Lustre: 105093:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 07:26:27 fir-md1-s1 kernel: Pid: 105063, comm: mdt01_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:26:27 fir-md1-s1 kernel: Call Trace: Apr 28 07:26:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:26:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:26:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:26:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 
Apr 28 07:26:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:26:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461587.105063 Apr 28 07:26:28 fir-md1-s1 kernel: Pid: 104911, comm: mdt01_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:26:28 fir-md1-s1 kernel: Call Trace: Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:26:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:26:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:26:28 fir-md1-s1 kernel: Pid: 105419, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:26:28 fir-md1-s1 kernel: Call Trace: Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] 
mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:26:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:26:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:26:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:26:28 fir-md1-s1 kernel: LNet: Service thread pid 114881 was inactive for 200.89s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:26:28 fir-md1-s1 kernel: LNet: Skipped 15 previous similar messages Apr 28 07:26:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461598.105237 Apr 28 07:29:07 fir-md1-s1 kernel: LNet: Service thread pid 114802 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 07:29:07 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 07:29:07 fir-md1-s1 kernel: Pid: 114802, comm: mdt01_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:29:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461747.114802 Apr 28 07:29:07 fir-md1-s1 kernel: Pid: 114928, comm: mdt00_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:29:07 fir-md1-s1 kernel: Pid: 114912, comm: mdt01_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:29:08 fir-md1-s1 kernel: Pid: 114948, comm: mdt02_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:29:08 fir-md1-s1 kernel: Call Trace: Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:29:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:29:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:29:08 fir-md1-s1 kernel: Pid: 104692, comm: mdt01_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:29:08 fir-md1-s1 kernel: Call Trace: Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:29:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:29:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:29:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:29:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461752.105121 Apr 28 07:29:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461782.114910 Apr 28 07:30:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461812.104932 Apr 28 07:30:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461820.105093 Apr 28 07:31:59 fir-md1-s1 kernel: LNet: Service thread pid 104332 completed after 4447.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 07:31:59 fir-md1-s1 kernel: LNet: Skipped 39 previous similar messages Apr 28 07:32:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.109.12@o2ib4) Apr 28 07:32:15 fir-md1-s1 kernel: Lustre: Skipped 1955 previous similar messages Apr 28 07:32:29 fir-md1-s1 kernel: Lustre: 105046:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b62dced8c00 x1631750590503936/t0(0) o101->e6780d30-74e1-317f-cff3-98cea097e023@10.8.20.17@o2ib6:4/0 lens 480/568 e 0 to 0 dl 1556461954 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:32:29 fir-md1-s1 kernel: Lustre: 105046:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages Apr 28 07:32:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556461969.105112 Apr 28 07:33:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9a15b23e-e39a-6029-5c05-ad2362b1e59e (at 10.9.109.12@o2ib4) reconnecting Apr 28 07:33:17 fir-md1-s1 kernel: Lustre: Skipped 1945 previous similar messages Apr 28 07:35:25 fir-md1-s1 kernel: Pid: 114799, comm: mdt01_071 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:35:25 fir-md1-s1 kernel: Call Trace: Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 
07:35:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:35:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:35:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:35:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462125.114799 Apr 28 07:35:25 fir-md1-s1 kernel: Pid: 105025, comm: mdt01_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:35:25 fir-md1-s1 kernel: Call Trace: Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:35:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:35:25 fir-md1-s1 
kernel: [] 0xffffffffffffffff Apr 28 07:35:25 fir-md1-s1 kernel: Pid: 104967, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:35:25 fir-md1-s1 kernel: Call Trace: Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:35:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:35:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:35:25 fir-md1-s1 kernel: Pid: 105017, comm: mdt02_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:35:25 fir-md1-s1 kernel: Call Trace: Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:35:25 
fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:35:25 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:35:25 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:35:25 fir-md1-s1 kernel: Pid: 105127, comm: mdt02_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:35:25 fir-md1-s1 kernel: Call Trace: Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:35:25 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:35:26 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: 
[] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:35:26 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:35:26 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:35:26 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:35:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462154.114975 Apr 28 07:36:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462161.114937 Apr 28 07:36:46 fir-md1-s1 kernel: LustreError: 114989:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556462116, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b73052e1440/0x378007fca37b62a4 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 369 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114989 timeout: 0 lvb_type: 0 Apr 28 07:36:46 fir-md1-s1 kernel: LustreError: 114989:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 51 previous similar messages Apr 28 07:37:41 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.113.15@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b6158255100/0x378007fc025267f0 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 367 type: IBT flags: 0x60200400000020 nid: 10.9.113.15@o2ib4 remote: 0xe738849d979ba7e0 expref: 36 pid: 114857 timeout: 462095 lvb_type: 0 Apr 28 07:37:41 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6 previous similar messages Apr 28 07:37:45 fir-md1-s1 kernel: LustreError: 104996:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7271b1a400 ns: mdt-fir-MDT0002_UUID lock: ffff8b721bba3840/0x378007fc0252a29e lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 362 type: IBT flags: 0x50200400000020 nid: 10.8.8.7@o2ib6 remote: 
0x33085ce11471dee6 expref: 6 pid: 104996 timeout: 0 lvb_type: 0 Apr 28 07:37:45 fir-md1-s1 kernel: LustreError: 104996:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 07:37:45 fir-md1-s1 kernel: Lustre: 104996:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (4494:299s); client may timeout. req@ffff8b3f6e291b00 x1631546793817632/t0(0) o101->f7f29fbd-f06d-1e4f-a662-2d2ae362522d@10.8.8.7@o2ib6:22/0 lens 576/1792 e 0 to 0 dl 1556461966 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 07:37:45 fir-md1-s1 kernel: Lustre: 104996:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 07:38:36 fir-md1-s1 kernel: LNet: Service thread pid 114878 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:38:36 fir-md1-s1 kernel: LNet: Skipped 44 previous similar messages Apr 28 07:38:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462316.114878 Apr 28 07:40:20 fir-md1-s1 kernel: LustreError: 115000:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b625204c000 ns: mdt-fir-MDT0002_UUID lock: ffff8b430591f980/0x378007fc0255c21f lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 370 type: IBT flags: 0x50200000000000 nid: 10.8.20.20@o2ib6 remote: 0xc5e25a5da5e9f94c expref: 2 pid: 115000 timeout: 0 lvb_type: 0 Apr 28 07:40:20 fir-md1-s1 kernel: Lustre: 115000:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:4794s); client may timeout. req@ffff8b5829349800 x1631750589587280/t0(0) o101->2cc0bc1b-7a1f-9dab-b36c-c6206a02385d@10.8.20.20@o2ib6:22/0 lens 568/2296 e 0 to 0 dl 1556457626 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 07:41:06 fir-md1-s1 kernel: LNet: Service thread pid 105075 was inactive for 200.74s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 07:41:06 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 07:41:06 fir-md1-s1 kernel: Pid: 105075, comm: mdt03_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:41:06 fir-md1-s1 kernel: Call Trace: Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:41:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:41:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:41:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462466.105075 Apr 28 07:41:06 fir-md1-s1 kernel: Pid: 114854, comm: mdt00_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:41:06 fir-md1-s1 kernel: Call Trace: Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:41:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:41:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:41:06 fir-md1-s1 kernel: Pid: 104328, comm: mdt00_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:41:06 fir-md1-s1 kernel: Call Trace: Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: 
[] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:41:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:41:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:41:06 fir-md1-s1 kernel: Pid: 114971, comm: mdt02_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:41:06 fir-md1-s1 kernel: Call Trace: Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:41:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:41:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:41:06 fir-md1-s1 kernel: Pid: 114861, comm: mdt02_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:41:06 fir-md1-s1 kernel: Call Trace: Apr 28 07:41:06 fir-md1-s1 kernel: [] 
ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:41:06 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:41:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:41:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:41:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:41:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:42:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e56d1481-3cdb-cbeb-b735-3d410d2549a0 (at 10.8.1.27@o2ib6) Apr 28 07:42:22 fir-md1-s1 kernel: Lustre: Skipped 1869 previous similar messages Apr 28 07:43:18 fir-md1-s1 kernel: Lustre: 114916:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b564172b600 x1631534533441600/t0(0) o101->7849f272-44ac-34e4-1524-52c5874c1815@10.9.107.32@o2ib4:23/0 lens 584/3264 e 0 to 0 dl 1556462603 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:43:18 fir-md1-s1 kernel: Lustre: 114916:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 128 previous similar messages Apr 28 07:43:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
7849f272-44ac-34e4-1524-52c5874c1815 (at 10.9.107.32@o2ib4) reconnecting Apr 28 07:43:24 fir-md1-s1 kernel: Lustre: Skipped 1859 previous similar messages Apr 28 07:43:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462621.114803 Apr 28 07:46:13 fir-md1-s1 kernel: Pid: 104335, comm: mdt02_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:46:13 fir-md1-s1 kernel: Call Trace: Apr 28 07:46:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:46:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:46:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 07:46:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:46:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 07:46:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556462773.104335 Apr 28 07:47:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.11@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Apr 28 07:47:22 fir-md1-s1 kernel: LustreError: Skipped 2776 previous similar messages Apr 28 07:48:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 07:50:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.26@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 07:50:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.26@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 07:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3c33f1b9-0922-3e31-0187-5e50f51dcf63 (at 10.9.108.32@o2ib4) Apr 28 07:52:25 fir-md1-s1 kernel: Lustre: Skipped 1956 previous similar messages Apr 28 07:52:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 07:52:32 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Apr 28 07:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d76fd0e2-c0e8-e1af-41b7-af513684736a (at 10.9.108.32@o2ib4) reconnecting Apr 28 07:53:27 fir-md1-s1 kernel: Lustre: Skipped 1958 previous similar messages Apr 28 07:55:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 07:55:03 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages
Apr 28 07:55:20 fir-md1-s1 kernel: LustreError: 105020:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8b5bccaee850 x1631568856920112/t0(0) o4->645c01b6-7440-897a-ad36-a9e0b6138a74@10.8.7.15@o2ib6:26/0 lens 488/448 e 0 to 0 dl 1556463326 ref 1 fl Interpret:/0/0 rc 0/0
Apr 28 07:55:20 fir-md1-s1 kernel: LustreError: 104912:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk WRITE failed: rc -107 req@ffff8b40b7ffec50 x1631568856920128/t0(0) o4->645c01b6-7440-897a-ad36-a9e0b6138a74@10.8.7.15@o2ib6:26/0 lens 488/448 e 0 to 0 dl 1556463326 ref 1 fl Interpret:/0/0 rc 0/0
Apr 28 07:55:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 645c01b6-7440-897a-ad36-a9e0b6138a74 (at 10.8.7.15@o2ib6), client will retry: rc = -107
Apr 28 08:00:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 08:00:04 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Apr 28 08:02:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e56d1481-3cdb-cbeb-b735-3d410d2549a0 (at 10.8.1.27@o2ib6) Apr 28 08:02:31 fir-md1-s1 kernel: Lustre: Skipped 1861 previous similar messages Apr 28 08:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 97fdc779-cf49-9cfe-e70e-4fa32248f62a (at 10.8.1.27@o2ib6) reconnecting Apr 28 08:03:33 fir-md1-s1 kernel: Lustre: Skipped 1862 previous similar messages Apr 28 08:05:30 fir-md1-s1 kernel: Lustre: 114916:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b5a62bfd100 x1631555503691920/t0(0) o101->42800284-789e-e9cc-0ebd-dbacb154f6ac@10.9.107.31@o2ib4:5/0 lens 584/3264 e 1 to 0 dl 1556463935 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 08:06:45 fir-md1-s1 kernel: LustreError: 114941:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556463915, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b357f7a1200/0x378007fce2a0334b lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 410 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 114941 timeout: 0 lvb_type: 0 Apr 28 08:06:45 fir-md1-s1 kernel: LustreError: 114941:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 106 previous similar messages Apr 28 08:08:36 fir-md1-s1 kernel: LNet: Service thread pid 114941 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:08:36 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Apr 28 08:08:36 fir-md1-s1 kernel: Pid: 114941, comm: mdt02_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:08:36 fir-md1-s1 kernel: Call Trace: Apr 28 08:08:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:08:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:08:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:08:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:08:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:08:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464116.114941 Apr 28 08:08:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.13.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 08:08:39 fir-md1-s1 kernel: LustreError: Skipped 34 previous similar messages Apr 28 08:12:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 3c33f1b9-0922-3e31-0187-5e50f51dcf63 (at 10.9.108.32@o2ib4) Apr 28 08:12:34 fir-md1-s1 kernel: Lustre: Skipped 1964 previous similar messages Apr 28 08:13:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d76fd0e2-c0e8-e1af-41b7-af513684736a (at 10.9.108.32@o2ib4) reconnecting Apr 28 08:13:36 fir-md1-s1 kernel: Lustre: Skipped 1965 previous similar messages Apr 28 08:14:32 fir-md1-s1 kernel: LustreError: 105264:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b733c574400 ns: mdt-fir-MDT0002_UUID lock: ffff8b4254595580/0x378007fc8fa48486 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 406 type: IBT flags: 0x50200000000000 nid: 10.8.13.20@o2ib6 remote: 0xdf15508116c217b5 expref: 2 pid: 105264 timeout: 0 lvb_type: 0 Apr 28 08:14:32 fir-md1-s1 kernel: LNet: Service thread pid 104932 completed after 2860.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:14:32 fir-md1-s1 kernel: LNet: Skipped 181 previous similar messages Apr 28 08:14:32 fir-md1-s1 kernel: LustreError: 105264:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 52 previous similar messages Apr 28 08:14:32 fir-md1-s1 kernel: Lustre: 105264:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (42:2878s); client may timeout. 
req@ffff8b378eae3f00 x1631727176871024/t0(0) o101->980c53c1-d60f-2717-9259-d8f7cc6e1f79@10.8.13.20@o2ib6:4/0 lens 568/2296 e 0 to 0 dl 1556461594 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 08:14:32 fir-md1-s1 kernel: Lustre: 105264:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 53 previous similar messages Apr 28 08:14:47 fir-md1-s1 kernel: Lustre: 114974:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b3e1c9b9b00 x1631814299621840/t0(0) o101->6da928ad-923b-cec3-5920-76a1fc1b7ec3@10.9.107.30@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1556464492 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 08:15:02 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.107.31@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b723c33dc40/0x378007fc9df1c6ce lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 335 type: IBT flags: 0x60200400000020 nid: 10.9.107.31@o2ib4 remote: 0xbd12da816a023bbc expref: 19 pid: 114937 timeout: 464336 lvb_type: 0 Apr 28 08:15:02 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 08:16:02 fir-md1-s1 kernel: LustreError: 104990:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556464472, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b3fdf6de540/0x378007fcf67d2b3b lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 331 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104990 timeout: 0 lvb_type: 0 Apr 28 08:16:02 fir-md1-s1 kernel: LustreError: 104990:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 28 08:17:53 fir-md1-s1 kernel: LNet: Service thread pid 104967 was inactive for 200.53s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:17:53 fir-md1-s1 kernel: Pid: 104967, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:17:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464673.104967 Apr 28 08:17:53 fir-md1-s1 kernel: Pid: 105002, comm: mdt01_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] 
mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:17:53 fir-md1-s1 kernel: Pid: 114910, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:17:53 fir-md1-s1 kernel: Pid: 104356, comm: mdt01_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:17:53 fir-md1-s1 kernel: Pid: 114937, comm: mdt03_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:17:53 fir-md1-s1 kernel: Call Trace: Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:17:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:17:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:17:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:17:53 fir-md1-s1 kernel: LNet: Service thread pid 104948 was inactive for 201.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:17:53 fir-md1-s1 kernel: LNet: Skipped 110 previous similar messages Apr 28 08:18:04 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.68@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b390f70ad00/0x378007fcadf25d29 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 329 type: IBT flags: 0x60200400000020 nid: 10.9.101.68@o2ib4 remote: 0xf1b661a3e3ac09d9 expref: 10 pid: 114821 timeout: 464518 lvb_type: 0 Apr 28 08:18:04 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 08:18:04 fir-md1-s1 kernel: LNet: Service thread pid 104728 completed after 2263.52s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:18:04 fir-md1-s1 kernel: LNet: Skipped 45 previous similar messages Apr 28 08:18:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464702.114971 Apr 28 08:18:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464703.105085 Apr 28 08:18:34 fir-md1-s1 kernel: Lustre: 104355:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b35ae71b900 x1631533069200960/t0(0) o101->ca15d879-1cb2-8780-e5e2-20230d9e27cf@10.8.28.3@o2ib6:9/0 lens 480/568 e 0 to 0 dl 1556464719 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 08:18:34 fir-md1-s1 kernel: Lustre: 104355:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages Apr 28 08:18:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464732.114803 Apr 28 08:18:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464735.115001 Apr 28 08:19:39 fir-md1-s1 kernel: LustreError: 114852:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556464689, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b7275c5aac0/0x378007fcfe3f94af lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 326 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 114852 timeout: 0 lvb_type: 0 Apr 28 08:19:39 fir-md1-s1 kernel: LustreError: 114852:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 32 previous similar messages Apr 28 08:21:30 fir-md1-s1 kernel: LNet: Service thread pid 114836 was inactive for 200.42s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 08:21:30 fir-md1-s1 kernel: LNet: Skipped 29 previous similar messages Apr 28 08:21:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464890.114836 Apr 28 08:21:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.24.4@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 08:21:35 fir-md1-s1 kernel: LustreError: Skipped 27 previous similar messages Apr 28 08:22:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.11.29@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b6304a38d80/0x378007fcadf2af62 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 324 type: IBT flags: 0x60200400000020 nid: 10.8.11.29@o2ib6 remote: 0x8f6516595eb2856e expref: 13 pid: 114930 timeout: 464763 lvb_type: 0 Apr 28 08:22:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Apr 28 08:22:09 fir-md1-s1 kernel: LNet: Service thread pid 114906 completed after 2508.45s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:22:09 fir-md1-s1 kernel: LNet: Skipped 17 previous similar messages Apr 28 08:22:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556464937.114843 Apr 28 08:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 97b33d4c-9335-2592-bf9a-6ed88b66c71c (at 10.9.107.30@o2ib4) Apr 28 08:22:35 fir-md1-s1 kernel: Lustre: Skipped 1837 previous similar messages Apr 28 08:22:39 fir-md1-s1 kernel: Lustre: 114801:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (2324:215s); client may timeout. 
req@ffff8b70cee32700 x1631574132748176/t219291095674(0) o36->9a15b23e-e39a-6029-5c05-ad2362b1e59e@10.9.109.12@o2ib4:20/0 lens 488/424 e 0 to 0 dl 1556464744 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 08:22:39 fir-md1-s1 kernel: Lustre: 114801:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Apr 28 08:22:57 fir-md1-s1 kernel: LNet: Service thread pid 104966 was inactive for 200.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:22:57 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:22:57 fir-md1-s1 kernel: Pid: 104966, comm: mdt02_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:22:57 fir-md1-s1 kernel: Call Trace: Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:22:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:22:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:22:57 fir-md1-s1 kernel: LustreError: dumping log to 
/tmp/lustre-log.1556464977.104966 Apr 28 08:22:57 fir-md1-s1 kernel: Pid: 114956, comm: mdt02_088 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:22:57 fir-md1-s1 kernel: Call Trace: Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:22:57 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:22:57 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:22:57 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:23:28 fir-md1-s1 kernel: Pid: 104977, comm: mdt01_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:23:28 fir-md1-s1 kernel: Call Trace: Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 
08:23:28 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:23:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:23:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:23:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465008.104977 Apr 28 08:23:28 fir-md1-s1 kernel: Pid: 114864, comm: mdt01_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:23:28 fir-md1-s1 kernel: Call Trace: Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 
[ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:23:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:23:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:23:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:23:34 fir-md1-s1 kernel: Lustre: 114982:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3fe0787800 x1631534604311424/t0(0) o101->a1458810-dd8e-0b24-b694-3bdddf660753@10.9.108.68@o2ib4:9/0 lens 576/3264 e 0 to 0 dl 1556465019 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 08:23:34 fir-md1-s1 kernel: Lustre: 114982:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Apr 28 08:23:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6da928ad-923b-cec3-5920-76a1fc1b7ec3 (at 10.9.107.30@o2ib4) reconnecting Apr 28 08:23:38 fir-md1-s1 kernel: Lustre: Skipped 1822 previous similar messages Apr 28 08:24:40 fir-md1-s1 kernel: LustreError: 104333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556464990, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4bd26c1b00/0x378007fd08ded7f4 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 315 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104333 timeout: 0 lvb_type: 0 Apr 28 08:24:40 fir-md1-s1 kernel: LustreError: 104333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 14 previous similar messages Apr 28 08:25:32 fir-md1-s1 kernel: Pid: 105306, comm: mdt01_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:25:32 fir-md1-s1 kernel: Call Trace: Apr 28 08:25:32 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:25:32 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 
[ptlrpc] Apr 28 08:25:32 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:25:32 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:25:33 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:25:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:25:33 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:25:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:25:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:25:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:25:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465133.105306 Apr 28 08:25:33 fir-md1-s1 kernel: LNet: Service thread pid 104330 was inactive for 200.78s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 08:25:33 fir-md1-s1 kernel: LNet: Skipped 15 previous similar messages Apr 28 08:25:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465159.114854 Apr 28 08:26:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465163.114986 Apr 28 08:26:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465189.105406 Apr 28 08:26:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465190.104333 Apr 28 08:27:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.17.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b45c8f6e300/0x378007fcadf32e42 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 311 type: IBT flags: 0x60200400000020 nid: 10.8.17.22@o2ib6 remote: 0xc646eb3e400f889a expref: 3379 pid: 104957 timeout: 465063 lvb_type: 0 Apr 28 08:27:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 08:27:09 fir-md1-s1 kernel: LNet: Service thread pid 114829 completed after 2808.41s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:27:09 fir-md1-s1 kernel: LNet: Skipped 16 previous similar messages Apr 28 08:28:59 fir-md1-s1 kernel: LNet: Service thread pid 104336 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:28:59 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:28:59 fir-md1-s1 kernel: Pid: 104336, comm: mdt03_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:28:59 fir-md1-s1 kernel: Call Trace: Apr 28 08:28:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:28:59 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:28:59 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:28:59 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:28:59 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:28:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:28:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:28:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:28:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465339.104336 Apr 28 08:29:05 fir-md1-s1 kernel: Pid: 114951, comm: mdt02_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:29:05 fir-md1-s1 kernel: Call Trace: Apr 28 08:29:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:29:05 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:29:05 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:29:05 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:29:05 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:29:06 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:29:06 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:29:06 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:29:06 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:29:06 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:29:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465346.114951 Apr 28 08:29:36 fir-md1-s1 kernel: Pid: 114924, comm: mdt03_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:29:36 fir-md1-s1 kernel: Call Trace: Apr 28 08:29:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:29:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 
kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:29:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:29:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:29:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:29:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465376.114924 Apr 28 08:30:02 fir-md1-s1 kernel: Pid: 104940, comm: mdt01_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:30:02 fir-md1-s1 kernel: Call Trace: Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:30:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:30:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:30:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465402.104940 Apr 28 08:30:02 fir-md1-s1 kernel: Pid: 105123, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP 
Fri Dec 7 14:50:35 PST 2018 Apr 28 08:30:02 fir-md1-s1 kernel: Call Trace: Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:30:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:30:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:30:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:30:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:30:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:30:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:30:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:30:36 fir-md1-s1 kernel: LNet: Service thread pid 105038 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:30:36 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 08:30:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465436.105038 Apr 28 08:31:06 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465466.104957 Apr 28 08:31:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.8@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Apr 28 08:31:48 fir-md1-s1 kernel: LustreError: Skipped 19 previous similar messages Apr 28 08:32:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.102.1@o2ib4) Apr 28 08:32:36 fir-md1-s1 kernel: Lustre: Skipped 1860 previous similar messages Apr 28 08:33:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465616.114816 Apr 28 08:33:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 97fdc779-cf49-9cfe-e70e-4fa32248f62a (at 10.8.1.27@o2ib6) reconnecting Apr 28 08:33:39 fir-md1-s1 kernel: Lustre: Skipped 1848 previous similar messages Apr 28 08:35:45 fir-md1-s1 kernel: Lustre: 114909:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4a5bba6900 x1631534763023888/t0(0) o101->d5b6a969-5184-379c-8847-a4c9d97dce1a@10.9.109.4@o2ib4:20/0 lens 568/0 e 0 to 0 dl 1556465750 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:35:45 fir-md1-s1 kernel: Lustre: 114909:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Apr 28 08:36:10 fir-md1-s1 kernel: Pid: 104963, comm: mdt03_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:10 fir-md1-s1 kernel: Call Trace: Apr 28 08:36:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:10 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:36:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] 
tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:36:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465771.104963 Apr 28 08:36:11 fir-md1-s1 kernel: Pid: 104982, comm: mdt00_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:11 fir-md1-s1 kernel: Call Trace: Apr 28 08:36:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:11 fir-md1-s1 kernel: [] 
0xffffffffffffffff Apr 28 08:36:50 fir-md1-s1 kernel: LustreError: 105251:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556465720, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b448a346540/0x378007fd22341a67 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 301 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105251 timeout: 0 lvb_type: 0 Apr 28 08:36:50 fir-md1-s1 kernel: LustreError: 105251:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 28 08:37:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.109.4@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b72ca7d9680/0x378007fcadf330aa lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 299 type: IBT flags: 0x60200400000020 nid: 10.9.109.4@o2ib4 remote: 0x11bb2c83d68fd078 expref: 11 pid: 105404 timeout: 465693 lvb_type: 0 Apr 28 08:37:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 08:37:39 fir-md1-s1 kernel: LNet: Service thread pid 104994 completed after 3438.40s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 08:37:39 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:38:40 fir-md1-s1 kernel: Pid: 105251, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:38:40 fir-md1-s1 kernel: Call Trace: Apr 28 08:38:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:38:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:38:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:38:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:38:40 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:38:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:38:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:38:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:38:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:38:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:38:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:38:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:38:41 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:38:41 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:38:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:38:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556465921.105251 Apr 28 08:41:13 fir-md1-s1 kernel: LNet: Service thread pid 105421 was inactive for 200.66s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:41:13 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 08:41:13 fir-md1-s1 kernel: Pid: 105421, comm: mdt00_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:13 fir-md1-s1 kernel: Call Trace: Apr 28 08:41:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:13 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:41:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:41:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:41:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466073.105421 Apr 28 08:42:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to c49fd28a-b873-e174-b6dc-a92b5b8edf6b (at 10.9.108.8@o2ib4) Apr 28 08:42:39 fir-md1-s1 kernel: Lustre: Skipped 1773 previous similar messages Apr 28 08:43:36 fir-md1-s1 kernel: Pid: 105413, comm: mdt01_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:43:36 fir-md1-s1 kernel: Call Trace: Apr 28 08:43:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 
[ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:43:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:43:36 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:43:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:43:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:43:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:43:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:43:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:43:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466216.105413 Apr 28 08:43:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 56a57cb9-3d9f-bc51-dd74-55bd81619cfc (at 10.9.108.8@o2ib4) reconnecting Apr 28 08:43:41 fir-md1-s1 kernel: Lustre: Skipped 1759 previous similar messages Apr 28 08:44:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.1.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 08:44:07 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Apr 28 08:46:05 fir-md1-s1 kernel: Pid: 105104, comm: mdt02_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:05 fir-md1-s1 kernel: Call Trace: Apr 28 08:46:05 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:05 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:05 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:46:05 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:46:05 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:05 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:05 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:05 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:46:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466365.105104 Apr 28 08:46:50 fir-md1-s1 kernel: LustreError: 104338:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556466320, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b7216712640/0x378007fd36ccc1d9 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 255 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104338 timeout: 0 lvb_type: 0 Apr 28 08:46:50 fir-md1-s1 kernel: 
LustreError: 104338:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 08:47:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.107.66@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b36c06b7080/0x378007fcadf33159 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 255 type: IBT flags: 0x60200400000020 nid: 10.9.107.66@o2ib4 remote: 0xc437851239cdc266 expref: 489 pid: 105021 timeout: 466293 lvb_type: 0 Apr 28 08:47:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Apr 28 08:47:39 fir-md1-s1 kernel: LNet: Service thread pid 105234 completed after 4038.40s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:47:39 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 28 08:48:14 fir-md1-s1 kernel: Lustre: 115010:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3755a92700 x1631681050369488/t0(0) o101->7da2364c-273e-9791-279a-dee1848c518b@10.8.25.6@o2ib6:19/0 lens 568/0 e 0 to 0 dl 1556466499 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:48:14 fir-md1-s1 kernel: Lustre: 115010:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 28 08:48:41 fir-md1-s1 kernel: Pid: 104338, comm: mdt03_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:48:41 fir-md1-s1 kernel: Call Trace: Apr 28 08:48:41 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:48:41 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:48:41 fir-md1-s1 kernel: [] 
mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:48:41 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:48:41 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:48:41 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:48:41 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:48:41 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:48:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466521.104338 Apr 28 08:51:09 fir-md1-s1 kernel: Pid: 105048, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:09 fir-md1-s1 kernel: Call Trace: Apr 28 08:51:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:51:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466670.105048 Apr 28 08:51:10 fir-md1-s1 kernel: Pid: 114874, comm: mdt01_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:10 fir-md1-s1 kernel: Call Trace: Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:51:10 fir-md1-s1 kernel: Pid: 105053, comm: mdt03_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:10 
fir-md1-s1 kernel: Call Trace: Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:51:10 fir-md1-s1 kernel: Pid: 105011, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:10 fir-md1-s1 kernel: Call Trace: Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] 
mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:51:10 fir-md1-s1 kernel: LNet: Service thread pid 105303 was inactive for 200.63s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:51:10 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:52:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8399d63a-5bf1-2891-e648-7bdf89b4d1ea (at 10.8.21.14@o2ib6) Apr 28 08:52:40 fir-md1-s1 kernel: Lustre: Skipped 1690 previous similar messages Apr 28 08:53:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25512127-e6de-b60b-cf78-f84b6ec57480 (at 10.8.21.14@o2ib6) reconnecting Apr 28 08:53:42 fir-md1-s1 kernel: Lustre: Skipped 1682 previous similar messages Apr 28 08:53:45 fir-md1-s1 kernel: LNet: Service thread pid 105252 was inactive for 200.60s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:53:45 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 08:53:45 fir-md1-s1 kernel: Pid: 105252, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:53:45 fir-md1-s1 kernel: Call Trace: Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:53:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:53:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:53:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466825.105252 Apr 28 08:53:45 fir-md1-s1 kernel: Pid: 114914, comm: mdt00_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:53:45 fir-md1-s1 kernel: Call Trace: Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:53:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:53:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:53:45 fir-md1-s1 kernel: Pid: 105106, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:53:45 fir-md1-s1 kernel: Call Trace: Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:53:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:53:45 
fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:53:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:53:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:53:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:53:46 fir-md1-s1 kernel: Pid: 114821, comm: mdt00_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:53:46 fir-md1-s1 kernel: Call Trace: Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:53:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:53:46 fir-md1-s1 kernel: [] 0xffffffffffffffff 
Apr 28 08:53:46 fir-md1-s1 kernel: Pid: 114967, comm: mdt02_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:53:46 fir-md1-s1 kernel: Call Trace: Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:53:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:53:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:53:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:53:46 fir-md1-s1 kernel: LNet: Service thread pid 105300 was inactive for 201.18s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:53:46 fir-md1-s1 kernel: LNet: Skipped 10 previous similar messages Apr 28 08:54:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.8.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 08:54:13 fir-md1-s1 kernel: LustreError: Skipped 21 previous similar messages Apr 28 08:56:01 fir-md1-s1 kernel: LustreError: 114941:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b5796351000 ns: mdt-fir-MDT0002_UUID lock: ffff8b357f7a1200/0x378007fce2a0334b lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 289 type: IBT flags: 0x50200400000020 nid: 10.9.107.31@o2ib4 remote: 0xbd12da816a024b98 expref: 4 pid: 114941 timeout: 0 lvb_type: 0 Apr 28 08:56:01 fir-md1-s1 kernel: LustreError: 114941:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 7 previous similar messages Apr 28 08:56:01 fir-md1-s1 kernel: Lustre: 114941:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (587:2459s); client may timeout. req@ffff8b5a62bfd100 x1631555503691920/t0(0) o101->42800284-789e-e9cc-0ebd-dbacb154f6ac@10.9.107.31@o2ib4:5/0 lens 584/1792 e 1 to 0 dl 1556464502 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 08:56:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556466975.114857 Apr 28 08:56:59 fir-md1-s1 kernel: LustreError: 104951:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556466929, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4c76b18480/0x378007fd4bab5ac6 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 301 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104951 timeout: 0 lvb_type: 0 Apr 28 08:56:59 fir-md1-s1 kernel: LustreError: 104951:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 50 previous similar messages Apr 28 08:58:49 fir-md1-s1 kernel: Pid: 104951, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:58:49 fir-md1-s1 kernel: Call Trace: Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:58:49 
fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:58:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:58:49 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:58:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467129.104951 Apr 28 08:58:49 fir-md1-s1 kernel: Pid: 105376, comm: mdt03_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:58:49 fir-md1-s1 kernel: Call Trace: Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:58:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:58:49 
fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:58:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:58:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:58:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:59:00 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.107.30@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b37dfa9aac0/0x378007fcf67dce4f lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 301 type: IBT flags: 0x60200400000020 nid: 10.9.107.30@o2ib4 remote: 0x1387dd2daa9b9529 expref: 20 pid: 105257 timeout: 466974 lvb_type: 0 Apr 28 08:59:00 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 08:59:00 fir-md1-s1 kernel: LNet: Service thread pid 114910 completed after 2667.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 08:59:00 fir-md1-s1 kernel: LNet: Skipped 63 previous similar messages Apr 28 08:59:21 fir-md1-s1 kernel: Pid: 105262, comm: mdt00_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:59:21 fir-md1-s1 kernel: Call Trace: Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:59:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:59:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:59:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467161.105262 Apr 28 08:59:21 fir-md1-s1 kernel: Pid: 105423, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:59:21 fir-md1-s1 kernel: Call Trace: Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 
kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:59:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:59:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:59:21 fir-md1-s1 kernel: Pid: 114928, comm: mdt00_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:59:21 fir-md1-s1 kernel: Call Trace: Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:59:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:59:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 08:59:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:59:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 08:59:21 fir-md1-s1 kernel: LNet: Service thread pid 104910 was inactive for 200.73s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:59:21 fir-md1-s1 kernel: LNet: Skipped 29 previous similar messages Apr 28 08:59:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467163.114812 Apr 28 08:59:37 fir-md1-s1 kernel: Lustre: 104973:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b44e9e2b000 x1631295977876512/t0(0) o101->761fd9f8-5106-f638-0275-47efaff85a15@10.8.1.6@o2ib6:12/0 lens 568/0 e 0 to 0 dl 1556467182 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:59:37 fir-md1-s1 kernel: Lustre: 104973:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 68 previous similar messages Apr 28 08:59:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467192.104332 Apr 28 08:59:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467194.114911 Apr 28 09:02:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467352.114987 Apr 28 09:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8bfed59e-0ae2-8175-d0ee-2a61e691b9d0 (at 10.8.25.13@o2ib6) Apr 28 09:02:43 fir-md1-s1 kernel: Lustre: Skipped 1682 previous similar messages Apr 28 09:02:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467370.114881 Apr 28 09:03:03 fir-md1-s1 kernel: 
LustreError: dumping log to /tmp/lustre-log.1556467383.114797 Apr 28 09:03:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1eed9d25-9802-9a67-1bce-978ce6293b9f (at 10.8.25.13@o2ib6) reconnecting Apr 28 09:03:45 fir-md1-s1 kernel: Lustre: Skipped 1672 previous similar messages Apr 28 09:04:44 fir-md1-s1 kernel: LustreError: 48689:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.30.20@o2ib6 arrived at 1556467484 with bad export cookie 3999205245175714375 Apr 28 09:04:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.1.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:04:47 fir-md1-s1 kernel: LustreError: Skipped 21 previous similar messages Apr 28 09:05:20 fir-md1-s1 kernel: LNet: Service thread pid 114969 was inactive for 200.59s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 09:05:20 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 09:05:20 fir-md1-s1 kernel: Pid: 114969, comm: mdt02_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:05:20 fir-md1-s1 kernel: Call Trace: Apr 28 09:05:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:05:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:05:20 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:05:20 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:05:20 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:05:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:05:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:05:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:05:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:05:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:05:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467521.114969 Apr 28 09:05:38 fir-md1-s1 kernel: Pid: 105284, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:05:38 fir-md1-s1 kernel: Call Trace: Apr 28 09:05:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:05:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:05:38 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:05:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:05:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:05:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:05:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:05:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:05:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467538.105284 Apr 28 09:07:30 fir-md1-s1 kernel: LustreError: 114996:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on 
destroyed export ffff8b5796351000 ns: mdt-fir-MDT0002_UUID lock: ffff8b6311e77740/0x378007fcf67f418e lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 297 type: IBT flags: 0x50200400000020 nid: 10.9.107.31@o2ib4 remote: 0xbd12da816a024c78 expref: 2 pid: 114996 timeout: 0 lvb_type: 0 Apr 28 09:07:30 fir-md1-s1 kernel: Lustre: 114996:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (40:3138s); client may timeout. req@ffff8b5e9b7dfb00 x1631555503764000/t0(0) o101->42800284-789e-e9cc-0ebd-dbacb154f6ac@10.9.107.31@o2ib4:12/0 lens 568/2296 e 0 to 0 dl 1556464512 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:08:04 fir-md1-s1 kernel: Pid: 105046, comm: mdt02_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:08:04 fir-md1-s1 kernel: Call Trace: Apr 28 09:08:04 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:08:04 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:08:04 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:08:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:08:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:08:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:08:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:08:04 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:08:04 fir-md1-s1 kernel: 
LustreError: dumping log to /tmp/lustre-log.1556467684.105046 Apr 28 09:08:30 fir-md1-s1 kernel: LustreError: 105112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556467620, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4e8ff4a880/0x378007fd641c2839 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 307 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105112 timeout: 0 lvb_type: 0 Apr 28 09:08:30 fir-md1-s1 kernel: LustreError: 105112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 27 previous similar messages Apr 28 09:10:00 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.102.64@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b3f652cba80/0x378007fcf67e9305 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 306 type: IBT flags: 0x60200400000020 nid: 10.9.102.64@o2ib4 remote: 0x23c8738a3b654895 expref: 10 pid: 114850 timeout: 467634 lvb_type: 0 Apr 28 09:10:00 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 09:10:00 fir-md1-s1 kernel: LustreError: 105085:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b34d02a4c00 ns: mdt-fir-MDT0002_UUID lock: ffff8b7271b3bf00/0x378007fcf7926dac lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 303 type: IBT flags: 0x50200400000020 nid: 10.9.108.32@o2ib4 remote: 0xe64f46a1a610511f expref: 2 pid: 105085 timeout: 0 lvb_type: 0 Apr 28 09:10:00 fir-md1-s1 kernel: LNet: Service thread pid 114803 completed after 3268.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:10:00 fir-md1-s1 kernel: LNet: Skipped 19 previous similar messages Apr 28 09:10:00 fir-md1-s1 kernel: Lustre: 105085:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3267s); client may timeout. req@ffff8b7061ed7b00 x1631558580176032/t0(0) o101->d76fd0e2-c0e8-e1af-41b7-af513684736a@10.9.108.32@o2ib4:3/0 lens 568/2296 e 0 to 0 dl 1556464533 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:10:20 fir-md1-s1 kernel: Pid: 104990, comm: mdt00_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:10:20 fir-md1-s1 kernel: Call Trace: Apr 28 09:10:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:10:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:10:20 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:10:20 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:10:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:10:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:10:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467821.104990 Apr 28 09:10:21 fir-md1-s1 kernel: Pid: 104948, comm: mdt02_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:10:21 fir-md1-s1 kernel: 
Call Trace: Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:10:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:10:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:10:21 fir-md1-s1 kernel: Pid: 104356, comm: mdt01_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:10:21 fir-md1-s1 kernel: Call Trace: Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:10:21 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:10:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:10:21 fir-md1-s1 kernel: Pid: 105112, comm: mdt01_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:10:21 fir-md1-s1 kernel: Call Trace: Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:10:21 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:10:21 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:10:21 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:10:25 fir-md1-s1 kernel: Lustre: 105014:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b51a735da00 x1631546251154112/t0(0) o101->49530de5-f172-5bb3-a0d3-bd0ce56d3339@10.8.7.17@o2ib6:0/0 lens 480/568 e 0 to 0 dl 1556467830 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:10:25 fir-md1-s1 kernel: Lustre: 105014:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Apr 28 09:10:39 fir-md1-s1 kernel: Pid: 115020, comm: mdt03_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:10:39 fir-md1-s1 kernel: Call Trace: Apr 28 09:10:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:10:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:10:39 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:10:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:10:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:10:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:10:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:10:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:10:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:10:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467840.115020 Apr 28 09:10:40 fir-md1-s1 kernel: LNet: 
Service thread pid 114823 was inactive for 200.50s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:10:40 fir-md1-s1 kernel: LNet: Skipped 17 previous similar messages Apr 28 09:10:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467850.114975 Apr 28 09:11:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556467871.105234 Apr 28 09:12:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to d57e6085-f336-4114-317e-790fc3e13342 (at 10.8.17.22@o2ib6) Apr 28 09:12:43 fir-md1-s1 kernel: Lustre: Skipped 1665 previous similar messages Apr 28 09:13:00 fir-md1-s1 kernel: LustreError: 104968:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b706caef000 ns: mdt-fir-MDT0002_UUID lock: ffff8b52a9867500/0x378007fcf8aa79a6 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 303 type: IBT flags: 0x50200400000020 nid: 10.9.101.68@o2ib4 remote: 0xf1b661a3e3ac0bb5 expref: 2 pid: 104968 timeout: 0 lvb_type: 0 Apr 28 09:13:00 fir-md1-s1 kernel: Lustre: 104968:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:3292s); client may timeout. 
req@ffff8b70b5ff8900 x1631558437692864/t0(0) o101->b1560181-32d0-3000-87fb-1969e5df2f5e@10.9.101.68@o2ib4:4/0 lens 568/2296 e 0 to 0 dl 1556464688 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:13:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468001.114803 Apr 28 09:13:30 fir-md1-s1 kernel: LustreError: 104996:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b724ba23000 ns: mdt-fir-MDT0002_UUID lock: ffff8b50087b7500/0x378007fd4656f31b lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 298 type: IBT flags: 0x50200400000020 nid: 10.8.17.11@o2ib6 remote: 0x23a0b0480ea31262 expref: 13 pid: 104996 timeout: 0 lvb_type: 0 Apr 28 09:13:30 fir-md1-s1 kernel: LustreError: 104996:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 26 previous similar messages Apr 28 09:13:30 fir-md1-s1 kernel: Lustre: 104971:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1082s); client may timeout. req@ffff8b44e6f98c00 x1631558286389648/t0(0) o101->3ef17f0c-d35b-8428-c1da-c84a40a8bdbc@10.9.101.71@o2ib4:24/0 lens 576/1792 e 0 to 0 dl 1556466928 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:13:30 fir-md1-s1 kernel: Lustre: 104971:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 26 previous similar messages Apr 28 09:14:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 661f0cfa-e148-dc98-69cd-517192e597e7 (at 10.8.7.3@o2ib6) reconnecting Apr 28 09:14:02 fir-md1-s1 kernel: Lustre: Skipped 1549 previous similar messages Apr 28 09:14:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.19@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:14:58 fir-md1-s1 kernel: LustreError: Skipped 49 previous similar messages Apr 28 09:16:51 fir-md1-s1 kernel: LNet: Service thread pid 114952 was inactive for 200.00s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 09:16:51 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 09:16:51 fir-md1-s1 kernel: Pid: 114952, comm: mdt02_086 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:16:51 fir-md1-s1 kernel: Call Trace: Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:16:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:16:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:16:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468211.114952 Apr 28 09:16:51 fir-md1-s1 kernel: Pid: 114930, comm: mdt02_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:16:51 fir-md1-s1 kernel: Call Trace: Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 
09:16:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:16:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:16:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:16:51 fir-md1-s1 kernel: Pid: 105252, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:16:51 fir-md1-s1 kernel: Call Trace: Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:16:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:16:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:16:51 fir-md1-s1 kernel: Pid: 105301, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:16:51 fir-md1-s1 kernel: Call Trace: Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:16:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 
28 09:16:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:16:51 fir-md1-s1 kernel: Pid: 105048, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:16:51 fir-md1-s1 kernel: Call Trace: Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:16:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:16:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:16:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:16:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468212.114911 Apr 28 09:17:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468273.105419 Apr 28 09:19:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468367.114814 Apr 28 09:20:11 fir-md1-s1 kernel: LustreError: 105406:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468321, 90s ago); not entering recovery in server code, just going back 
to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b43013f0480/0x378007fd7c4ed1df lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 209 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105406 timeout: 0 lvb_type: 0 Apr 28 09:20:11 fir-md1-s1 kernel: LustreError: 105406:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 92 previous similar messages Apr 28 09:21:02 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.7.3@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b4f9cf02640/0x378007fd71b7e872 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 209 type: IBT flags: 0x60200400000020 nid: 10.8.7.3@o2ib6 remote: 0x2e0af7b7f9b62467 expref: 268 pid: 105301 timeout: 468296 lvb_type: 0 Apr 28 09:21:02 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Apr 28 09:21:02 fir-md1-s1 kernel: LNet: Service thread pid 114952 completed after 451.29s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:21:02 fir-md1-s1 kernel: LNet: Skipped 148 previous similar messages Apr 28 09:21:41 fir-md1-s1 kernel: Lustre: 105112:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5225361b00 x1631546268666976/t0(0) o101->1f6469d3-d26f-d8c9-bf51-966fcd210811@10.8.2.34@o2ib6:16/0 lens 568/0 e 0 to 0 dl 1556468506 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:21:41 fir-md1-s1 kernel: Lustre: 105112:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 136 previous similar messages Apr 28 09:22:02 fir-md1-s1 kernel: Pid: 105406, comm: mdt00_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:22:02 fir-md1-s1 kernel: Call Trace: Apr 28 09:22:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:22:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:22:02 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:22:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:22:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:22:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:22:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:22:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:22:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468522.105406 Apr 28 09:22:33 fir-md1-s1 kernel: Lustre: 
Failing over fir-MDT0002 Apr 28 09:22:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.108.32@o2ib4 (stopping) Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: 105281:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b45e877a400 ns: mdt-fir-MDT0002_UUID lock: ffff8b4784f60fc0/0x378007fd71b9faf8 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 160 type: IBT flags: 0x50200400000020 nid: 10.8.22.23@o2ib6 remote: 0x26b2b40a925265ad expref: 643 pid: 105281 timeout: 0 lvb_type: 0 Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: 105281:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 17 previous similar messages Apr 28 09:22:34 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_disconnect to node 0@lo failed: rc = -107 Apr 28 09:22:34 fir-md1-s1 kernel: Lustre: 105419:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:389s); client may timeout. 
req@ffff8b51c16b0600 x1631892717588512/t0(0) o101->ae62f7a4-768c-db41-e2f9-002069bc0e09@10.8.7.23@o2ib6:3/0 lens 584/1792 e 0 to 0 dl 1556468165 ref 1 fl Complete:/0/0 rc -19/-19 Apr 28 09:22:34 fir-md1-s1 kernel: Lustre: 105419:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 23 previous similar messages Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: 21023:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.82@o2ib6 arrived at 1556468554 with bad export cookie 3999205221519734419 Apr 28 09:22:34 fir-md1-s1 kernel: LustreError: 21023:0:(ldlm_lock.c:2677:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0000_UUID lock: ffff8b42e7d08480/0x378007fcfcf933ac lrc: 3/0,0 mode: CR/CR res: [0x20001a2ba:0x5d9c:0x0].0x0 bits 0x9/0x0 rrc: 13 type: IBT flags: 0x40200000000000 nid: 10.8.0.82@o2ib6 remote: 0xb0774129de8b0903 expref: 1625 pid: 105406 timeout: 0 lvb_type: 0 Apr 28 09:22:35 fir-md1-s1 kernel: LustreError: 23528:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.21.20@o2ib6 arrived at 1556468555 with bad export cookie 3999205221519723289 Apr 28 09:22:35 fir-md1-s1 kernel: LustreError: 20346:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b3f7abbf800 x1631591794137632/t0(0) o41->fir-MDT0001-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:22:35 fir-md1-s1 kernel: LustreError: 82844:0:(ldlm_lock.c:2677:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff8b46e1e28480/0x378007fcfcbfd9bf lrc: 3/0,0 mode: PR/PR res: [0x2c001a5cd:0x900:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.9.107.34@o2ib4 remote: 0xfa42ba6f3d0da70d expref: 167 pid: 114904 timeout: 464509 lvb_type: 0 Apr 28 09:22:35 fir-md1-s1 kernel: LustreError: 82844:0:(ldlm_lock.c:2677:ldlm_lock_dump_handle()) Skipped 2 previous similar messages Apr 28 09:22:35 fir-md1-s1 
kernel: LustreError: 20343:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b48c9757500 x1631591794137824/t0(0) o41->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:22:35 fir-md1-s1 kernel: LustreError: 20343:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Apr 28 09:22:37 fir-md1-s1 kernel: LustreError: 21023:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.8@o2ib6 arrived at 1556468557 with bad export cookie 3999205221519738605 Apr 28 09:22:37 fir-md1-s1 kernel: LustreError: 21023:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages Apr 28 09:22:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.2.22@o2ib6 (stopping) Apr 28 09:22:38 fir-md1-s1 kernel: Lustre: Skipped 334 previous similar messages Apr 28 09:22:38 fir-md1-s1 kernel: LustreError: 20341:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b38807f2d00 x1631591794138768/t0(0) o41->fir-MDT0000-osp-MDT0002@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:22:39 fir-md1-s1 kernel: LustreError: 82844:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.104.22@o2ib4 arrived at 1556468559 with bad export cookie 3999205221519727601 Apr 28 09:22:39 fir-md1-s1 kernel: LustreError: 82844:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 6 previous similar messages Apr 28 09:22:44 fir-md1-s1 kernel: LustreError: 23528:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.23.19@o2ib6 arrived at 1556468564 with bad export cookie 3999205221519725501 Apr 28 09:22:44 fir-md1-s1 kernel: LustreError: 23528:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 5 previous similar messages Apr 28 09:22:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.107.17@o2ib4 (stopping) Apr 28 09:22:46 fir-md1-s1 kernel: Lustre: Skipped 721 
previous similar messages Apr 28 09:22:52 fir-md1-s1 kernel: LustreError: 23528:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.33@o2ib6 arrived at 1556468572 with bad export cookie 3999205221519730065 Apr 28 09:22:52 fir-md1-s1 kernel: LustreError: 23528:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 6 previous similar messages Apr 28 09:23:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.102.66@o2ib4 (stopping) Apr 28 09:23:02 fir-md1-s1 kernel: Lustre: Skipped 1600 previous similar messages Apr 28 09:23:08 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Apr 28 09:23:20 fir-md1-s1 kernel: LustreError: 26839:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b3f3e171500 x1631591794139776/t0(0) o101->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:23:20 fir-md1-s1 kernel: LustreError: 26839:0:(qsd_reint.c:56:qsd_reint_completion()) fir-MDT0002: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5 Apr 28 09:23:20 fir-md1-s1 kernel: LustreError: 26839:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message Apr 28 09:23:27 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 09:24:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:24:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 28 09:24:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:24:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:24:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.17.21@o2ib6 (not set up) Apr 28 09:24:03 fir-md1-s1 kernel: Lustre: Skipped 110 previous similar messages Apr 28 09:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:24:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:24:05 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 09:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 28 09:24:05 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:24:05 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Apr 28 09:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:24:06 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Apr 28 09:24:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect Apr 28 09:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d3dcd8ee-7913-062f-8514-9178ef53d789 (at 10.0.10.107@o2ib7) Apr 28 09:24:08 fir-md1-s1 kernel: Lustre: Skipped 916 previous similar messages Apr 28 09:24:09 fir-md1-s1 kernel: LustreError: 27102:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b521bf2e600 x1631596518336128/t0(0) o601->fir-MDT0000-lwp-OST002e_UUID@10.0.10.107@o2ib7:15/0 lens 336/0 e 0 to 0 dl 1556468655 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:24:09 fir-md1-s1 kernel: LustreError: 27102:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 1007 previous similar messages Apr 28 09:24:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1326 clients reconnect 
Apr 28 09:24:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:24:20 fir-md1-s1 kernel: LustreError: 27105:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b5bd9ab9800 x1631596508812160/t0(0) o601->fir-MDT0000-lwp-OST0004_UUID@10.0.10.101@o2ib7:20/0 lens 336/0 e 0 to 0 dl 1556468690 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:24:20 fir-md1-s1 kernel: LustreError: 27105:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 19 previous similar messages Apr 28 09:24:36 fir-md1-s1 kernel: LustreError: 27518:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b427cdc0900 x1631596599935824/t0(0) o601->fir-MDT0000-lwp-OST0005_UUID@10.0.10.102@o2ib7:6/0 lens 336/0 e 0 to 0 dl 1556468706 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:24:36 fir-md1-s1 kernel: LustreError: 27518:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 175 previous similar messages Apr 28 09:24:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.0.10.107@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:24:58 fir-md1-s1 kernel: LustreError: Skipped 5557 previous similar messages Apr 28 09:25:01 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Apr 28 09:25:01 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (5): c: 0, oc: 0, rc: 8 Apr 28 09:25:09 fir-md1-s1 kernel: LustreError: 27453:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b5020faf500 x1631596518364256/t0(0) o601->fir-MDT0000-lwp-OST002e_UUID@10.0.10.107@o2ib7:9/0 lens 336/0 e 0 to 0 dl 1556468739 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:25:09 fir-md1-s1 kernel: LustreError: 27453:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 154 previous similar messages Apr 28 09:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 1:37, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 4:57. If you do not want to wait more, please abort the recovery by force. Apr 28 09:25:47 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages Apr 28 09:26:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ac0b0a5c-beb8-3040-e55d-d0c9dcd3f011 (at 10.8.1.18@o2ib6) reconnecting Apr 28 09:26:18 fir-md1-s1 kernel: Lustre: Skipped 853 previous similar messages Apr 28 09:29:07 fir-md1-s1 kernel: LNet: Service thread pid 27725 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:29:07 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Apr 28 09:29:07 fir-md1-s1 kernel: Pid: 27725, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:29:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468947.27725 Apr 28 09:29:07 fir-md1-s1 kernel: Pid: 27615, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 
28 09:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:29:07 fir-md1-s1 kernel: Pid: 27623, comm: mdt00_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:29:07 fir-md1-s1 kernel: Pid: 27630, comm: mdt00_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:29:07 
fir-md1-s1 kernel: Pid: 27869, comm: mdt00_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:07 fir-md1-s1 kernel: Call Trace: Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:08 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:29:08 fir-md1-s1 kernel: LNet: Service thread pid 27860 was inactive for 200.66s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:29:08 fir-md1-s1 kernel: LNet: Skipped 80 previous similar messages Apr 28 09:29:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468964.27913 Apr 28 09:29:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556468965.27470 Apr 28 09:29:48 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0002 Apr 28 09:29:48 fir-md1-s1 kernel: LustreError: 27088:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7319b3b000 ns: mdt-fir-MDT0002_UUID lock: ffff8b7212f39f80/0x378007fd84533898 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 121 type: IBT flags: 0x50200400000020 nid: 10.8.22.30@o2ib6 remote: 0x126ce2b3a8aef387 expref: 1687 pid: 27088 timeout: 0 lvb_type: 0 Apr 28 09:29:48 fir-md1-s1 kernel: LustreError: 27088:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 65 previous similar messages Apr 28 09:29:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.22.30@o2ib6 (stopping) Apr 28 09:29:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:29:48 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.0.10.52@o2ib7 failed: rc = -107 Apr 28 09:29:48 fir-md1-s1 kernel: LustreError: 28742:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19 Apr 28 09:29:48 fir-md1-s1 kernel: LustreError: 28742:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19 Apr 28 09:29:48 fir-md1-s1 kernel: Lustre: 27920:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:87s); client may timeout. 
req@ffff8b5e642eb900 x1631651152511280/t0(0) o101->5924c705-ac90-422d-3e46-a0ea5d70203c@10.9.102.26@o2ib4:17/0 lens 568/2296 e 0 to 0 dl 1556468901 ref 1 fl Complete:/0/0 rc -19/-19 Apr 28 09:29:48 fir-md1-s1 kernel: Lustre: 27920:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 09:29:49 fir-md1-s1 kernel: LustreError: 20343:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b72644ba700 x1631591949000560/t0(0) o41->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:29:49 fir-md1-s1 kernel: LustreError: 20343:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Apr 28 09:29:50 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 09:29:53 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Apr 28 09:29:57 fir-md1-s1 kernel: LustreError: 23526:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.102.39@o2ib4 arrived at 1556468997 with bad export cookie 3999205254532490104 Apr 28 09:29:57 fir-md1-s1 kernel: LustreError: 23526:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages Apr 28 09:30:05 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:30:05 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 28 09:30:05 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:30:05 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:30:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:30:08 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 09:30:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 28 09:30:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1326 clients reconnect Apr 28 09:30:08 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:30:08 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Apr 28 09:30:28 fir-md1-s1 kernel: LustreError: 29015:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b5bd9abe900 x1631596471348320/t0(0) o601->fir-MDT0000-lwp-OST001d_UUID@10.0.10.106@o2ib7:28/0 lens 336/0 e 0 to 0 dl 1556469058 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:30:28 fir-md1-s1 kernel: LustreError: 29015:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 299 previous similar messages Apr 28 09:31:05 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Apr 28 09:31:05 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (6): c: 0, oc: 0, rc: 8 Apr 28 09:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: disconnecting 1 stale clients Apr 28 09:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 5:42. If you do not want to wait more, please abort the recovery by force. Apr 28 09:31:49 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Apr 28 09:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 1:41, of 1328 clients 1327 recovered and 1 was evicted. 
Apr 28 09:31:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:31:49 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-lwp-MDT0002: operation quota_acquire to node 0@lo failed: rc = -11 Apr 28 09:32:14 fir-md1-s1 kernel: Lustre: 29613:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b62f743b000 x1631727178645328/t0(0) o101->980c53c1-d60f-2717-9259-d8f7cc6e1f79@10.8.13.20@o2ib6:19/0 lens 480/568 e 0 to 0 dl 1556469139 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:32:14 fir-md1-s1 kernel: Lustre: 29613:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 106 previous similar messages Apr 28 09:33:19 fir-md1-s1 kernel: LustreError: 29005:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469109, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b62737fde80/0x378007fd8941da12 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 110 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 29005 timeout: 0 lvb_type: 0 Apr 28 09:33:19 fir-md1-s1 kernel: LustreError: 29005:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 121 previous similar messages Apr 28 09:34:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.102.25@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b523bc545c0/0x378007fd894165b9 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 110 type: IBT flags: 0x60000400000020 nid: 10.9.102.25@o2ib4 remote: 0x7fa05e6b251b7fb6 expref: 23 pid: 29205 timeout: 469092 lvb_type: 0 Apr 28 09:34:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 09:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
ec3a6b48-c57a-17c3-f292-1109fbbb4e4d (at 10.9.107.32@o2ib4) Apr 28 09:34:24 fir-md1-s1 kernel: Lustre: Skipped 6085 previous similar messages Apr 28 09:35:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:35:04 fir-md1-s1 kernel: LustreError: Skipped 5196 previous similar messages Apr 28 09:35:09 fir-md1-s1 kernel: Pid: 29066, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:09 fir-md1-s1 kernel: Call Trace: Apr 28 09:35:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:09 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:35:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469309.29066 Apr 28 09:35:10 
fir-md1-s1 kernel: Pid: 29565, comm: mdt01_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:10 fir-md1-s1 kernel: Call Trace: Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:35:10 fir-md1-s1 kernel: Pid: 29567, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:10 fir-md1-s1 kernel: Call Trace: Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 
09:35:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:35:10 fir-md1-s1 kernel: Pid: 29568, comm: mdt01_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:10 fir-md1-s1 kernel: Call Trace: Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 
kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:35:10 fir-md1-s1 kernel: Pid: 29571, comm: mdt01_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:10 fir-md1-s1 kernel: Call Trace: Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:36:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 315cf750-5ce7-61a0-093d-91bfc52b74be (at 10.8.17.10@o2ib6) reconnecting Apr 28 09:36:28 fir-md1-s1 kernel: Lustre: Skipped 710 previous similar messages Apr 28 09:36:54 fir-md1-s1 
kernel: LNet: Service thread pid 29433 completed after 305.07s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:36:54 fir-md1-s1 kernel: LNet: Skipped 123 previous similar messages Apr 28 09:36:59 fir-md1-s1 kernel: Lustre: 29565:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (309:1s); client may timeout. req@ffff8b49e8271200 x1631589134992512/t0(0) o101->3dcb322d-e380-40cb-7045-6103248b3328@10.8.2.16@o2ib6:19/0 lens 480/536 e 0 to 0 dl 1556469418 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 09:36:59 fir-md1-s1 kernel: LustreError: 29442:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4319078000 ns: mdt-fir-MDT0002_UUID lock: ffff8b42b32460c0/0x378007fd8941db70 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 106 type: IBT flags: 0x50200400000020 nid: 10.8.21.14@o2ib6 remote: 0x73f1e31c4de4d48f expref: 4 pid: 29442 timeout: 0 lvb_type: 0 Apr 28 09:36:59 fir-md1-s1 kernel: LustreError: 29442:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 57 previous similar messages Apr 28 09:36:59 fir-md1-s1 kernel: Lustre: 29565:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 09:37:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469464.29008 Apr 28 09:37:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469465.29403 Apr 28 09:38:22 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0000 Apr 28 09:38:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:38:22 fir-md1-s1 kernel: LustreError: 29720:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b421ddb4c00 ns: mdt-fir-MDT0002_UUID lock: ffff8b5321459440/0x378007fd8941f04d lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 126 type: IBT flags: 0x50200400000020 nid: 10.8.13.20@o2ib6 remote: 
0xdf15508116c7c7f4 expref: 1833 pid: 29720 timeout: 0 lvb_type: 0 Apr 28 09:38:22 fir-md1-s1 kernel: Lustre: 29641:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:239s); client may timeout. req@ffff8b3db9652400 x1631545533762384/t0(0) o101->59d5c5f3-7800-62b5-895a-6920fabd87eb@10.9.102.25@o2ib4:19/0 lens 568/2296 e 0 to 0 dl 1556469263 ref 1 fl Complete:/0/0 rc -19/-19 Apr 28 09:38:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.13.20@o2ib6 (stopping) Apr 28 09:38:22 fir-md1-s1 kernel: Lustre: Skipped 818 previous similar messages Apr 28 09:38:23 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.0.10.52@o2ib7 failed: rc = -107 Apr 28 09:38:23 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages Apr 28 09:38:23 fir-md1-s1 kernel: LustreError: 30326:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19 Apr 28 09:38:23 fir-md1-s1 kernel: LustreError: 30326:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19 Apr 28 09:38:24 fir-md1-s1 kernel: LustreError: 20302:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b3cff48a100 x1631592113867216/t0(0) o41->fir-MDT0002-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:38:24 fir-md1-s1 kernel: LustreError: 20302:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 28 09:38:27 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 09:38:29 fir-md1-s1 kernel: LustreError: 53868:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.6@o2ib6 arrived at 1556469509 with bad export cookie 3999205254615334733 Apr 28 09:38:53 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:38:53 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, 
maximum tree depth=5 Apr 28 09:38:53 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:38:53 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 2:30, or until 1326 clients reconnect Apr 28 09:38:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:38:56 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:38:56 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Apr 28 09:38:56 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 09:38:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 09:39:00 fir-md1-s1 kernel: LustreError: 30979:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b4275ea2100 x1631596463483104/t0(0) o601->fir-MDT0000-lwp-OST0018_UUID@10.0.10.105@o2ib7:0/0 lens 336/0 e 0 to 0 dl 1556469570 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:39:00 fir-md1-s1 kernel: LustreError: 30580:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b40a4ed2100 x1631596463483088/t0(0) o601->fir-MDT0000-lwp-OST0018_UUID@10.0.10.105@o2ib7:0/0 lens 336/0 e 0 
to 0 dl 1556469570 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:39:00 fir-md1-s1 kernel: LustreError: 30580:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 19 previous similar messages Apr 28 09:39:00 fir-md1-s1 kernel: LustreError: 30979:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 1 previous similar message Apr 28 09:39:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 0:53, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:39:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:39:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 3:24, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:39:50 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages Apr 28 09:39:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 0:54, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:42:45 fir-md1-s1 kernel: Lustre: 30578:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b71e6fbb300 x1631535330632832/t0(0) o101->b5280270-3b22-224e-0daa-bad5776be543@10.9.103.24@o2ib4:20/0 lens 568/0 e 0 to 0 dl 1556469770 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:42:45 fir-md1-s1 kernel: Lustre: 31447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5fc8203600 x1631546626164992/t0(0) o101->b2d6ba71-31e6-985c-c04f-54e302ddc48e@10.9.102.3@o2ib4:20/0 lens 480/568 e 0 to 0 dl 1556469770 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:42:45 fir-md1-s1 kernel: Lustre: 31447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 122 previous similar messages Apr 28 09:43:11 fir-md1-s1 kernel: LNet: Service thread pid 31206 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:43:11 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 09:43:11 fir-md1-s1 kernel: Pid: 31206, comm: mdt01_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s1 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:43:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469791.31206 Apr 28 09:43:11 fir-md1-s1 kernel: Pid: 31217, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s1 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 
28 09:43:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:43:11 fir-md1-s1 kernel: Pid: 31225, comm: mdt01_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s1 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 
kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:43:11 fir-md1-s1 kernel: Pid: 31255, comm: mdt01_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s1 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s1 kernel: [] 
0xffffffffffffffff Apr 28 09:43:11 fir-md1-s1 kernel: Pid: 31256, comm: mdt01_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s1 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:43:11 fir-md1-s1 kernel: LNet: Service thread pid 31257 was inactive for 200.99s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:43:11 fir-md1-s1 kernel: LNet: Skipped 99 previous similar messages Apr 28 09:43:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469796.31730 Apr 28 09:43:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469822.30911 Apr 28 09:43:50 fir-md1-s1 kernel: LustreError: 31458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469740, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5f667945c0/0x378007fd96f26d73 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 117 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31458 timeout: 0 lvb_type: 0 Apr 28 09:43:50 fir-md1-s1 kernel: LustreError: 31458:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 68 previous similar messages Apr 28 09:44:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to fec67fa2-7566-c452-41ab-6f040647c599 (at 10.9.102.3@o2ib4) Apr 28 09:44:24 fir-md1-s1 kernel: Lustre: Skipped 3485 previous similar messages Apr 28 09:44:50 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.103.24@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b5279f645c0/0x378007fd937758cb lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 117 type: IBT flags: 0x60200400000020 nid: 10.9.103.24@o2ib4 remote: 0xc9ac1acb20371828 expref: 10 pid: 31215 timeout: 469724 lvb_type: 0 Apr 28 09:44:50 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 09:45:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.13.10@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:45:26 fir-md1-s1 kernel: LustreError: Skipped 559 previous similar messages Apr 28 09:45:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469940.31636 Apr 28 09:45:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556469946.31448 Apr 28 09:46:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9a15b23e-e39a-6029-5c05-ad2362b1e59e (at 10.9.109.12@o2ib4) reconnecting Apr 28 09:46:28 fir-md1-s1 kernel: Lustre: Skipped 733 previous similar messages Apr 28 09:47:25 fir-md1-s1 kernel: LNet: Service thread pid 31190 completed after 454.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:47:25 fir-md1-s1 kernel: LNet: Skipped 60 previous similar messages Apr 28 09:48:15 fir-md1-s1 kernel: Pid: 30951, comm: mdt03_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:48:15 fir-md1-s1 kernel: Call Trace: Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:48:15 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:48:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:48:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470095.30951 Apr 28 09:48:15 fir-md1-s1 kernel: Pid: 31464, comm: mdt02_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:48:15 fir-md1-s1 kernel: Call Trace: Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:48:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:48:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:48:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:48:16 fir-md1-s1 kernel: Pid: 30917, comm: mdt02_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:48:16 fir-md1-s1 kernel: Call Trace: Apr 28 09:48:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: 
[] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:48:16 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:48:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:48:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:48:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:48:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470096.30917 Apr 28 09:49:25 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0000 Apr 28 09:49:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 31220:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b60683b9000 ns: mdt-fir-MDT0002_UUID lock: ffff8b523ae81200/0x378007fd93776933 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 118 type: IBT flags: 0x50200400000020 nid: 10.9.107.64@o2ib4 remote: 0x5231ea5fd588162b expref: 8 pid: 31220 timeout: 0 lvb_type: 0 Apr 28 09:49:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Not available for connect from 10.9.106.46@o2ib4 (stopping) Apr 28 09:49:26 fir-md1-s1 kernel: Lustre: Skipped 317 previous similar messages Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 
31220:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 52 previous similar messages Apr 28 09:49:26 fir-md1-s1 kernel: Lustre: 31275:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:422s); client may timeout. req@ffff8b62a33cd400 x1631542998444768/t0(0) o101->acb643ef-75ad-6f92-b388-57634462f54f@10.8.28.6@o2ib6:20/0 lens 568/2296 e 0 to 0 dl 1556469744 ref 1 fl Complete:/0/0 rc -19/-19 Apr 28 09:49:26 fir-md1-s1 kernel: Lustre: 31275:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.0.10.52@o2ib7 failed: rc = -107 Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 32376:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19 Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 32376:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19 Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 20321:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4d94613000 x1631592116965712/t0(0) o41->fir-MDT0002-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:49:26 fir-md1-s1 kernel: LustreError: 20321:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 5 previous similar messages Apr 28 09:49:29 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 09:49:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:33 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Apr 28 09:49:38 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:49:38 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 28 09:49:38 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:49:38 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:49:40 fir-md1-s1 kernel: LustreError: 23554:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.10.23@o2ib6 arrived at 1556470180 with bad export cookie 3999205254786542413 Apr 28 09:49:40 fir-md1-s1 kernel: LustreError: 23554:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 09:49:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:49:51 fir-md1-s1 kernel: LustreError: 32977:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b72353d4200 x1631596522371664/t0(0) o601->fir-MDT0000-lwp-OST002f_UUID@10.0.10.108@o2ib7:21/0 lens 336/0 e 0 to 0 dl 1556470221 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:49:51 fir-md1-s1 kernel: LustreError: 32977:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 44 previous similar messages Apr 28 09:50:37 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Apr 28 09:50:37 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 
10.0.10.52@o2ib7 (5): c: 0, oc: 0, rc: 8 Apr 28 09:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 1:41, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 4:26, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:51:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:51:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 1:56, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:54:32 fir-md1-s1 kernel: Lustre: 33270:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5309829800 x1631559030271408/t0(0) o101->02dfd968-e7b1-52cc-0db8-aa0d10c0832c@10.9.102.19@o2ib4:7/0 lens 584/3264 e 0 to 0 dl 1556470477 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:54:32 fir-md1-s1 kernel: Lustre: 33270:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 69 previous similar messages Apr 28 09:54:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0e359090-33c6-a6a0-317c-1b138ecd9b87 (at 10.9.105.16@o2ib4) Apr 28 09:54:38 fir-md1-s1 kernel: Lustre: Skipped 3440 previous similar messages Apr 28 09:54:58 fir-md1-s1 kernel: LNet: Service thread pid 33161 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:54:58 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 28 09:54:58 fir-md1-s1 kernel: Pid: 33161, comm: mdt02_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s1 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:54:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470498.33161 Apr 28 09:54:58 fir-md1-s1 kernel: Pid: 32625, comm: mdt02_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s1 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 
28 09:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:54:58 fir-md1-s1 kernel: Pid: 33157, comm: mdt02_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s1 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 
kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:54:58 fir-md1-s1 kernel: Pid: 33218, comm: mdt02_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s1 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:58 fir-md1-s1 kernel: [] 
0xffffffffffffffff Apr 28 09:54:58 fir-md1-s1 kernel: Pid: 32975, comm: mdt03_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s1 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 09:54:58 fir-md1-s1 kernel: LNet: Service thread pid 33159 was inactive for 201.01s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:54:58 fir-md1-s1 kernel: LNet: Skipped 56 previous similar messages Apr 28 09:55:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470503.33426 Apr 28 09:55:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470529.33414 Apr 28 09:55:37 fir-md1-s1 kernel: LustreError: 33112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470447, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b4d016f18c0/0x378007fdaa8e3785 lrc: 3/0,1 mode: --/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 117 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 33112 timeout: 0 lvb_type: 0 Apr 28 09:55:37 fir-md1-s1 kernel: LustreError: 33112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 69 previous similar messages Apr 28 09:56:37 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.72@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b72e37caac0/0x378007fda60b872c lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 117 type: IBT flags: 0x60200400000020 nid: 10.9.101.72@o2ib4 remote: 0x9ca54af26ab721f4 expref: 10 pid: 33063 timeout: 470431 lvb_type: 0 Apr 28 09:56:37 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 09:56:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d8fe43d-85f9-8061-e5fc-2e0ec8fbd940 (at 10.8.7.11@o2ib6) reconnecting Apr 28 09:56:42 fir-md1-s1 kernel: Lustre: Skipped 666 previous similar messages Apr 28 09:56:43 fir-md1-s1 kernel: LustreError: 33265:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b533a52bc00 ns: mdt-fir-MDT0002_UUID lock: ffff8b382110f2c0/0x378007fda60bd020 lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x20/0x0 rrc: 141 type: IBT flags: 
0x50200000000000 nid: 10.9.102.19@o2ib4 remote: 0x2acf83c3f6b8759f expref: 2 pid: 33265 timeout: 0 lvb_type: 0 Apr 28 09:56:43 fir-md1-s1 kernel: LustreError: 33265:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 45 previous similar messages Apr 28 09:56:43 fir-md1-s1 kernel: Lustre: 33265:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:152s); client may timeout. req@ffff8b33261e0900 x1631559030206048/t0(0) o101->02dfd968-e7b1-52cc-0db8-aa0d10c0832c@10.9.102.19@o2ib4:7/0 lens 568/2296 e 0 to 0 dl 1556470451 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:57:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.12.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:57:13 fir-md1-s1 kernel: LustreError: Skipped 3865 previous similar messages Apr 28 09:57:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470648.33266 Apr 28 09:58:24 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Apr 28 09:58:24 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (105): c: 7, oc: 0, rc: 8 Apr 28 09:59:12 fir-md1-s1 kernel: LNet: Service thread pid 33061 completed after 304.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:59:12 fir-md1-s1 kernel: LNet: Skipped 111 previous similar messages Apr 28 10:00:02 fir-md1-s1 kernel: Pid: 33250, comm: mdt03_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:00:02 fir-md1-s1 kernel: Call Trace: Apr 28 10:00:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:00:02 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:00:02 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:00:02 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:00:02 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:00:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470802.33250 Apr 28 10:00:02 fir-md1-s1 kernel: Pid: 33215, comm: mdt03_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:00:02 fir-md1-s1 kernel: Call Trace: Apr 28 10:00:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:00:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:00:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:00:03 fir-md1-s1 kernel: Pid: 33418, comm: mdt00_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:00:03 fir-md1-s1 kernel: Call Trace: Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:00:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:00:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:00:03 fir-md1-s1 kernel: Pid: 33221, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:00:03 fir-md1-s1 kernel: Call Trace: Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:00:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:00:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:00:03 
fir-md1-s1 kernel: Pid: 33114, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:00:03 fir-md1-s1 kernel: Call Trace: Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:00:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:00:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:00:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:00:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470803.33223 Apr 28 10:00:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470808.33217 Apr 28 10:02:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556470958.33472 Apr 28 10:04:38 fir-md1-s1 kernel: Lustre: 33320:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5f9f26ef00 x1631731810281664/t0(0) o101->5db7ce18-3e24-dca8-3c1c-cbb3c3f8c6de@10.8.1.14@o2ib6:13/0 lens 568/0 e 0 to 0 dl 
1556471083 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:04:38 fir-md1-s1 kernel: Lustre: 33320:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 61 previous similar messages Apr 28 10:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to be30aa96-53c3-af33-606a-9b13a14ea108 (at 10.8.1.14@o2ib6) Apr 28 10:04:44 fir-md1-s1 kernel: Lustre: Skipped 847 previous similar messages Apr 28 10:04:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471098.33004 Apr 28 10:05:13 fir-md1-s1 kernel: LNet: Service thread pid 33152 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 10:05:13 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 10:05:13 fir-md1-s1 kernel: Pid: 33152, comm: mdt00_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:05:13 fir-md1-s1 kernel: Call Trace: Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:05:13 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:05:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:05:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471113.33152 Apr 28 10:05:13 fir-md1-s1 kernel: Pid: 33659, comm: mdt03_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:05:13 fir-md1-s1 kernel: Call Trace: Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:05:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:05:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:05:13 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:05:43 fir-md1-s1 kernel: LustreError: 33692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471053, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b6191ae45c0/0x378007fdbe9cc89c lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 
0x20/0x0 rrc: 148 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 33692 timeout: 0 lvb_type: 0 Apr 28 10:05:43 fir-md1-s1 kernel: LustreError: 33692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 59 previous similar messages Apr 28 10:06:42 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.1.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b520336c140/0x378007fdaa8ea586 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 148 type: IBT flags: 0x60200400000020 nid: 10.8.1.14@o2ib6 remote: 0x6727fcfddb1dd554 expref: 158 pid: 33266 timeout: 471036 lvb_type: 0 Apr 28 10:06:42 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Apr 28 10:06:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d8fe43d-85f9-8061-e5fc-2e0ec8fbd940 (at 10.8.7.11@o2ib6) reconnecting Apr 28 10:06:48 fir-md1-s1 kernel: Lustre: Skipped 831 previous similar messages Apr 28 10:07:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 980c53c1-d60f-2717-9259-d8f7cc6e1f79 (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b53e967f400, cur 1556471228 expire 1556471078 last 1556471001 Apr 28 10:07:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 10:07:33 fir-md1-s1 kernel: Pid: 33692, comm: mdt02_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:07:33 fir-md1-s1 kernel: Call Trace: Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:07:33 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:07:33 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:07:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471253.33692 Apr 28 10:07:33 fir-md1-s1 kernel: Pid: 33162, comm: mdt00_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:07:33 fir-md1-s1 kernel: Call Trace: Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: 
[] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:07:33 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:07:34 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:07:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:07:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:07:34 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:07:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:07:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:07:34 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:07:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471254.33162 Apr 28 10:07:45 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0002 Apr 28 10:07:45 fir-md1-s1 kernel: LustreError: 33404:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4c8a1f5000 ns: mdt-fir-MDT0002_UUID lock: ffff8b5098ac4800/0x378007fdaf24d5fe lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 136 type: IBT flags: 0x50200400000020 nid: 10.8.8.7@o2ib6 remote: 0x33085ce11472a356 expref: 5 pid: 33404 timeout: 0 lvb_type: 0 Apr 28 10:07:45 fir-md1-s1 kernel: Lustre: 33381:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:664s); client may timeout. 
req@ffff8b42bec08900 x1631535210813792/t0(0) o101->17f43594-ae2e-f7ad-12ba-21540c4255a2@10.9.101.72@o2ib4:7/0 lens 568/2296 e 0 to 0 dl 1556470601 ref 1 fl Complete:/0/0 rc -19/-19 Apr 28 10:07:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.8.7@o2ib6 (stopping) Apr 28 10:07:45 fir-md1-s1 kernel: Lustre: Skipped 769 previous similar messages Apr 28 10:07:45 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.0.10.52@o2ib7 failed: rc = -107 Apr 28 10:07:45 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Apr 28 10:07:45 fir-md1-s1 kernel: LustreError: 34240:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19 Apr 28 10:07:46 fir-md1-s1 kernel: LustreError: 34240:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19 Apr 28 10:07:49 fir-md1-s1 kernel: LustreError: 20307:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b37c566d400 x1631592120475984/t0(0) o41->fir-MDT0002-osp-MDT0000@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 10:07:49 fir-md1-s1 kernel: LustreError: 20307:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Apr 28 10:07:50 fir-md1-s1 kernel: LustreError: 20316:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b3890ed9500 x1631592120476688/t0(0) o41->fir-MDT0003-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 10:07:50 fir-md1-s1 kernel: LustreError: 20313:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b5212550900 x1631592120476672/t0(0) o41->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 10:07:50 fir-md1-s1 kernel: LustreError: 20313:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages Apr 28 10:07:51 
fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.1.32@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:07:51 fir-md1-s1 kernel: LustreError: Skipped 31 previous similar messages Apr 28 10:07:51 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 10:07:53 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Apr 28 10:07:58 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 10:07:58 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 28 10:07:58 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:07:58 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1326 clients reconnect Apr 28 10:08:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:08:06 fir-md1-s1 kernel: LustreError: 34499:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b5649af0600 x1631596472078864/t0(0) o601->fir-MDT0000-lwp-OST001b_UUID@10.0.10.106@o2ib7:6/0 lens 336/0 e 0 to 0 dl 1556471316 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:08:06 fir-md1-s1 kernel: LustreError: 34499:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 23 previous 
similar messages Apr 28 10:08:57 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Apr 28 10:08:57 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (5): c: 0, oc: 0, rc: 8 Apr 28 10:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 1:41, of 1327 clients 1327 recovered and 0 were evicted. Apr 28 10:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Denying connection for new client 33aa3088-5e27-e4ab-6112-0fe513b018fa(at 10.8.13.20@o2ib6), waiting for 1327 known clients (1073 recovered, 253 in progress, and 0 evicted) already passed deadline 4:20 Apr 28 10:09:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 10:09:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 4:25, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 10:09:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Apr 28 10:09:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 1:56, of 1327 clients 1327 recovered and 0 were evicted. 
Apr 28 10:13:17 fir-md1-s1 kernel: Pid: 34493, comm: mdt03_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:13:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471597.34493 Apr 28 10:13:17 fir-md1-s1 kernel: Pid: 35270, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:13:17 fir-md1-s1 kernel: Pid: 35303, comm: mdt03_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 
10:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:13:17 fir-md1-s1 kernel: Pid: 35336, comm: mdt02_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:13:17 fir-md1-s1 kernel: Pid: 34891, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 
10:13:17 fir-md1-s1 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s1 kernel: LNet: Service thread pid 35053 was inactive for 200.59s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 10:13:18 fir-md1-s1 kernel: LNet: Skipped 97 previous similar messages Apr 28 10:13:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471602.34884 Apr 28 10:13:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471603.34918 Apr 28 10:13:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471604.35038 Apr 28 10:15:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 9d0e62c0-e368-6db8-c860-d1e71d1366bc (at 10.8.17.11@o2ib6) Apr 28 10:15:07 fir-md1-s1 kernel: Lustre: Skipped 3673 previous similar messages Apr 28 10:15:08 fir-md1-s1 kernel: LNet: Service thread pid 35303 completed after 310.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:15:08 fir-md1-s1 kernel: LNet: Skipped 62 previous similar messages Apr 28 10:15:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33aa3088-5e27-e4ab-6112-0fe513b018fa (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b43380dec00, cur 1556471740 expire 1556471590 last 1556471513 Apr 28 10:15:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 10:15:42 fir-md1-s1 kernel: Lustre: 35336:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b62a0dd6600 x1631738119794720/t0(0) o101->25512127-e6de-b60b-cf78-f84b6ec57480@10.8.21.14@o2ib6:17/0 lens 568/0 e 0 to 0 dl 1556471747 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:15:42 fir-md1-s1 kernel: Lustre: 35336:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 130 previous similar messages Apr 28 10:15:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33aa3088-5e27-e4ab-6112-0fe513b018fa (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b58fbe11c00, cur 1556471746 expire 1556471596 last 1556471519 Apr 28 10:15:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471753.34894 Apr 28 10:15:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471759.35289 Apr 28 10:16:47 fir-md1-s1 kernel: LustreError: 35049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471717, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5012ae7980/0x378007fdc6c24f96 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 143 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 35049 timeout: 0 lvb_type: 0 Apr 28 10:16:47 fir-md1-s1 kernel: LustreError: 35049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 67 previous similar messages Apr 28 10:16:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f6954b56-2fa3-efc8-91e3-ac0b90b6f4ac (at 10.9.109.7@o2ib4) reconnecting Apr 28 10:16:50 fir-md1-s1 kernel: Lustre: Skipped 879 previous similar messages Apr 28 10:17:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.7.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b51f73e72c0/0x378007fdc5f51cf6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 143 type: IBT flags: 0x60200400000020 nid: 10.8.7.17@o2ib6 remote: 0x1b8755abd081ffc4 expref: 10 pid: 34487 timeout: 471701 lvb_type: 0 Apr 28 10:17:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 10:17:47 fir-md1-s1 kernel: Lustre: 34884:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (464:1s); client may timeout. 
req@ffff8b4bb86fc800 x1631558509993360/t0(0) o101->d3662f1b-7a4f-44b7-deca-a784cfed408b@10.9.108.62@o2ib4:2/0 lens 480/536 e 0 to 0 dl 1556471866 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 10:17:47 fir-md1-s1 kernel: Lustre: 34884:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Apr 28 10:17:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.27.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:17:57 fir-md1-s1 kernel: LustreError: Skipped 3201 previous similar messages Apr 28 10:18:38 fir-md1-s1 kernel: LNet: Service thread pid 35270 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 10:18:38 fir-md1-s1 kernel: LNet: Skipped 8 previous similar messages Apr 28 10:18:38 fir-md1-s1 kernel: Pid: 35270, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 
28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:18:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556471918.35270 Apr 28 10:18:38 fir-md1-s1 kernel: Pid: 35496, comm: mdt01_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:18:38 fir-md1-s1 kernel: Pid: 34876, comm: mdt01_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 
PST 2018 Apr 28 10:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:18:38 fir-md1-s1 kernel: Pid: 35049, comm: mdt01_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:18:38 
fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:18:38 fir-md1-s1 kernel: Pid: 35493, comm: mdt01_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:38 fir-md1-s1 kernel: Call Trace: Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:38 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:21:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472073.35331 Apr 28 10:21:46 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0000 Apr 28 10:21:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: 35273:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b62dfa03400 ns: mdt-fir-MDT0002_UUID lock: ffff8b7222ff9d40/0x378007fdc5f5608e lrc: 3/0,0 mode: PR/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x1b/0x0 rrc: 131 type: IBT flags: 0x50200400000020 nid: 10.9.107.32@o2ib4 remote: 0x555773f6555a8411 expref: 5 pid: 35273 timeout: 0 lvb_type: 0 Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: 35273:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 52 previous similar messages Apr 28 10:21:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.107.32@o2ib4 (stopping) Apr 28 10:21:46 fir-md1-s1 kernel: Lustre: Skipped 493 previous similar messages Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.0.10.52@o2ib7 failed: rc = -107 Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: Skipped 6 previous similar messages Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: 36629:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19 Apr 28 10:21:46 fir-md1-s1 kernel: LustreError: 36629:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19 Apr 28 10:21:48 fir-md1-s1 kernel: LustreError: 20328:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b6252b3ad00 x1631592122747088/t0(0) o41->fir-MDT0001-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 10:21:48 fir-md1-s1 kernel: LustreError: 20328:0:(client.c:1175:ptlrpc_import_delay_req()) 
Skipped 1 previous similar message Apr 28 10:21:48 fir-md1-s1 kernel: LustreError: 20333:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b5a4ba8f200 x1631592122747856/t0(0) o41->fir-MDT0000-osp-MDT0002@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 10:21:48 fir-md1-s1 kernel: LustreError: 20333:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 28 10:21:50 fir-md1-s1 kernel: LustreError: 21024:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.61@o2ib4 arrived at 1556472110 with bad export cookie 3999205255633529414 Apr 28 10:21:50 fir-md1-s1 kernel: LustreError: 21024:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 2 previous similar messages Apr 28 10:21:51 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Apr 28 10:21:53 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Apr 28 10:22:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 10:22:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 28 10:22:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:22:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 10:22:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:22:05 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 28 10:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 28 10:22:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:22:06 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 10:22:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1328 clients reconnect Apr 28 10:22:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:22:06 fir-md1-s1 kernel: LustreError: Skipped 5 previous similar messages Apr 28 10:22:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 10:22:09 fir-md1-s1 kernel: LustreError: 53867:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.106.27@o2ib4 arrived at 1556472129 with bad export cookie 3999205255633525158 Apr 28 10:22:09 fir-md1-s1 kernel: LustreError: 53867:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 14 previous similar messages Apr 28 10:22:21 fir-md1-s1 kernel: LustreError: 36894:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b3f8032e900 x1631596601094976/t0(0) o601->fir-MDT0000-lwp-OST0007_UUID@10.0.10.102@o2ib7:21/0 lens 336/0 e 0 to 0 dl 1556472171 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:22:21 fir-md1-s1 kernel: LustreError: 36894:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 18 previous similar messages Apr 28 10:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 3:27, It is most likely 
due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 10:23:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 0:57, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 10:23:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Apr 28 10:25:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2cde83f6-2de4-32c9-c63f-55f47cbe66e9 (at 10.9.105.6@o2ib4) Apr 28 10:25:07 fir-md1-s1 kernel: Lustre: Skipped 3420 previous similar messages Apr 28 10:26:03 fir-md1-s1 kernel: Lustre: 37539:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5c9e672700 x1631549069565904/t0(0) o101->6137bba0-34c0-9107-d068-27095ef10964@10.8.22.23@o2ib6:8/0 lens 584/3264 e 0 to 0 dl 1556472368 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 10:26:03 fir-md1-s1 kernel: Lustre: 37539:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 63 previous similar messages Apr 28 10:26:23 fir-md1-s1 kernel: Pid: 37508, comm: mdt02_046 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:23 fir-md1-s1 kernel: Call Trace: Apr 28 10:26:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:23 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:23 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:23 fir-md1-s1 kernel: [] 
tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472384.37508 Apr 28 10:26:24 fir-md1-s1 kernel: Pid: 37439, comm: mdt02_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s1 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 
10:26:24 fir-md1-s1 kernel: Pid: 37488, comm: mdt02_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s1 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s1 kernel: Pid: 37491, comm: mdt02_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s1 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 
28 10:26:24 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s1 kernel: Pid: 37485, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s1 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s1 kernel: LNet: Service thread pid 37306 
was inactive for 200.47s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:26:24 fir-md1-s1 kernel: LNet: Skipped 60 previous similar messages Apr 28 10:26:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472391.37728 Apr 28 10:27:03 fir-md1-s1 kernel: LustreError: 37527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472333, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5d92628240/0x378007fdd309dee5 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 107 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 37527 timeout: 0 lvb_type: 0 Apr 28 10:27:03 fir-md1-s1 kernel: LustreError: 37527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 61 previous similar messages Apr 28 10:27:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5422af31-e043-e74f-91b8-ea281f34d204 (at 10.8.26.16@o2ib6) reconnecting Apr 28 10:27:11 fir-md1-s1 kernel: Lustre: Skipped 655 previous similar messages Apr 28 10:28:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.21.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b62c0775580/0x378007fdd2418456 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 107 type: IBT flags: 0x60200400000020 nid: 10.8.21.14@o2ib6 remote: 0x73f1e31c4de5fe78 expref: 18 pid: 37479 timeout: 472323 lvb_type: 0 Apr 28 10:28:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 10:28:09 fir-md1-s1 kernel: LNet: Service thread pid 37485 completed after 305.56s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 10:28:09 fir-md1-s1 kernel: LNet: Skipped 70 previous similar messages Apr 28 10:28:15 fir-md1-s1 kernel: Lustre: 37723:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:158s); client may timeout. req@ffff8b426f876600 x1631549069534624/t0(0) o101->6137bba0-34c0-9107-d068-27095ef10964@10.8.22.23@o2ib6:3/0 lens 568/2296 e 0 to 0 dl 1556472337 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 10:28:15 fir-md1-s1 kernel: Lustre: 37723:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 10:28:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:28:34 fir-md1-s1 kernel: LustreError: Skipped 658 previous similar messages Apr 28 10:31:36 fir-md1-s1 kernel: LNet: Service thread pid 37513 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:31:36 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Apr 28 10:31:36 fir-md1-s1 kernel: Pid: 37513, comm: mdt02_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:36 fir-md1-s1 kernel: Call Trace: Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:31:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472696.37513 Apr 28 10:31:36 fir-md1-s1 kernel: Pid: 37728, comm: mdt02_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:36 fir-md1-s1 kernel: Call Trace: Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:31:36 fir-md1-s1 kernel: Pid: 36886, comm: mdt01_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:36 fir-md1-s1 kernel: Call Trace: Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] 
ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:31:36 fir-md1-s1 kernel: Pid: 37527, comm: mdt02_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:36 fir-md1-s1 kernel: Call Trace: Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 
28 10:31:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:31:36 fir-md1-s1 kernel: Pid: 37487, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:36 fir-md1-s1 kernel: Call Trace: Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 28 10:32:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472756.37545 Apr 28 10:33:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472804.37429 Apr 28 10:34:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556472851.37436