Apr 26 11:08:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 11:11:09 fir-md1-s2 kernel: Lustre: 99509:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:24:26 fir-md1-s2 kernel: Lustre: 100437:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:24:26 fir-md1-s2 kernel: Lustre: 100437:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages
Apr 26 11:31:17 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:31:17 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages
Apr 26 11:34:59 fir-md1-s2 kernel: Lustre: 99509:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:34:59 fir-md1-s2 kernel: Lustre: 99509:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages
Apr 26 11:35:58 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:35:58 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
Apr 26 11:37:16 fir-md1-s2 kernel: Lustre: 100214:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:37:16 fir-md1-s2 kernel: Lustre: 100214:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages
Apr 26 11:38:14 fir-md1-s2 kernel: Lustre: 100214:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:38:14 fir-md1-s2 kernel: Lustre: 100214:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages
Apr 26 11:38:46 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:38:46 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages
Apr 26 11:48:45 fir-md1-s2 kernel: Lustre: 100357:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:48:45 fir-md1-s2 kernel: Lustre: 100357:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
Apr 26 11:51:18 fir-md1-s2 kernel: Lustre: 100213:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 11:51:18 fir-md1-s2 kernel: Lustre: 100213:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages
Apr 26 12:08:06 fir-md1-s2 kernel: Lustre: 100141:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 12:08:06 fir-md1-s2 kernel: Lustre: 100141:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
Apr 26 12:10:37 fir-md1-s2 kernel: Lustre: 99547:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 12:14:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c41887d8-667a-fcc3-3801-53e405eea2a0 (at 10.8.30.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98798a6a7400, cur 1556306050 expire 1556305900 last 1556305823
Apr 26 12:14:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 12:14:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to c41887d8-667a-fcc3-3801-53e405eea2a0 (at 10.8.30.34@o2ib6)
Apr 26 12:14:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 12:14:54 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Apr 26 12:14:54 fir-md1-s2 kernel: Lustre: 100466:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages
Apr 26 12:22:58 fir-md1-s2 kernel: LNetError: 98921:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 26 12:23:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 78ab2c22-394d-bdd4-0b8e-3553d6a47e28 (at 10.8.17.2@o2ib6) reconnecting
Apr 26 12:23:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 78ab2c22-394d-bdd4-0b8e-3553d6a47e28 (at 10.8.17.2@o2ib6)
Apr 26 12:26:28 fir-md1-s2 kernel: LNetError: 98919:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Apr 26 12:26:28 fir-md1-s2 kernel: LNetError: 98919:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (7): c: 4, oc: 0, rc: 8
Apr 26 12:26:28 fir-md1-s2 kernel: Lustre: 100425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306782/real 1556306788] req@ffff988b214b2700 x1631885367955696/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306789 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Apr 26 12:26:28 fir-md1-s2 kernel: Lustre: 100425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 306 previous similar messages
Apr 26 12:26:29 fir-md1-s2 kernel: Lustre: 100425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306789/real 1556306789] req@ffff988b214b2700 x1631885367955696/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306796 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Apr 26 12:26:29 fir-md1-s2 kernel: Lustre: 100425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7756 previous similar messages
Apr 26 12:26:30 fir-md1-s2 kernel: Lustre: 99522:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306790/real 1556306790] req@ffff9867e2fea400 x1631885368035728/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306797 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Apr 26 12:26:30 fir-md1-s2 kernel: Lustre: 99522:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15549 previous similar messages
Apr 26 12:26:32 fir-md1-s2 kernel: Lustre: 99522:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306792/real 1556306792] req@ffff9867e2fea400 x1631885368035728/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306799 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Apr 26 12:26:32 fir-md1-s2 kernel: Lustre: 99522:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 31593 previous similar messages
Apr 26 12:26:36 fir-md1-s2 kernel: Lustre: 99158:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306796/real 1556306796] req@ffff984f987d1800 x1631885368101776/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306803 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Apr 26 12:26:36 fir-md1-s2 kernel: Lustre: 99158:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 81946 previous similar messages
Apr 26 12:26:37 fir-md1-s2 kernel: Lustre: 100490:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff988a82273600 x1631588137131600/t0(0) o101->90d81c86-5db8-d29f-71be-9c3030e109bc@10.9.102.49@o2ib4:12/0 lens 480/568 e 1 to 0 dl 1556306802 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:26:38 fir-md1-s2 kernel: Lustre: 100187:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98502ce68000 x1631532741432784/t0(0) o101->3f0e14d1-f8f7-87d0-dc33-650c4734f9c2@10.9.104.10@o2ib4:13/0 lens 376/1600 e 1 to 0 dl 1556306803 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:26:43 fir-md1-s2 kernel: Lustre: 100300:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff986a8f0b0c00 x1631596147305200/t0(0) o101->d3e22dd2-d25d-28e8-5f86-5d27043eaa8d@10.8.7.18@o2ib6:18/0 lens 480/568 e 1 to 0 dl 1556306808 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:26:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 90d81c86-5db8-d29f-71be-9c3030e109bc (at 10.9.102.49@o2ib4) reconnecting
Apr 26 12:26:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 90d81c86-5db8-d29f-71be-9c3030e109bc (at 10.9.102.49@o2ib4)
Apr 26 12:26:44 fir-md1-s2 kernel: Lustre: 100448:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306804/real 1556306804] req@ffff9859bf774b00 x1631885368144032/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306811 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Apr 26 12:26:44 fir-md1-s2 kernel: Lustre: 100448:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 254150 previous similar messages
Apr 26 12:26:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d3e22dd2-d25d-28e8-5f86-5d27043eaa8d (at 10.8.7.18@o2ib6) reconnecting
Apr 26 12:26:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 12:26:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.7.18@o2ib6)
Apr 26 12:26:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 12:26:53 fir-md1-s2 kernel: LustreError: 100425:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff988b214b2700 x1631885367955696 status 0 rc -110), evict it ns: mdt-fir-MDT0003_UUID lock: ffff985aa8702640/0x4f3cef65e16dc8a2 lrc: 4/0,0 mode: PR/PR res: [0x28001b6c4:0x9a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b2935fbb1 expref: 1984118 pid: 100415 timeout: 306236 lvb_type: 0
Apr 26 12:26:53 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
Apr 26 12:26:53 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 31s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff985aa8702640/0x4f3cef65e16dc8a2 lrc: 3/0,0 mode: PR/PR res: [0x28001b6c4:0x9a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b2935fbb1 expref: 1984119 pid: 100415 timeout: 0 lvb_type: 0
Apr 26 12:26:53 fir-md1-s2 kernel: LustreError: 100172:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff985954f2f800 x1631885368408896/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:26:54 fir-md1-s2 kernel: LustreError: 99402:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff987a813a1500 x1631885368425280/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:26:54 fir-md1-s2 kernel: LustreError: 99402:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
Apr 26 12:27:00 fir-md1-s2 kernel: LustreError: 100020:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9872cc240f00 x1631885368487232/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:27:00 fir-md1-s2 kernel: LustreError: 100020:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
Apr 26 12:27:04 fir-md1-s2 kernel: LustreError: 100228:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff988826f96f00 x1631885368553984/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:27:04 fir-md1-s2 kernel: LustreError: 100228:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
Apr 26 12:27:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 90d81c86-5db8-d29f-71be-9c3030e109bc (at 10.9.102.49@o2ib4) reconnecting
Apr 26 12:27:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 90d81c86-5db8-d29f-71be-9c3030e109bc (at 10.9.102.49@o2ib4)
Apr 26 12:27:11 fir-md1-s2 kernel: LustreError: 100198:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff986ab1834e00 x1631885368648752/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:27:11 fir-md1-s2 kernel: LustreError: 100198:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
Apr 26 12:27:18 fir-md1-s2 kernel: Lustre: 100403:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff988a4ea20c00 x1631535279942864/t270596782450(0) o36->b5280270-3b22-224e-0daa-bad5776be543@10.9.103.24@o2ib4:23/0 lens 488/3152 e 0 to 0 dl 1556306843 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:27:21 fir-md1-s2 kernel: LustreError: 100466:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff985ae06dd400 x1631885368778128/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:27:23 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff985ad2f557c0/0x4f3cef65c8d97a8f lrc: 3/0,0 mode: PR/PR res: [0x28001a57e:0x231b:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b24220451 expref: 1645176 pid: 100213 timeout: 306237 lvb_type: 0
Apr 26 12:27:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b5280270-3b22-224e-0daa-bad5776be543 (at 10.9.103.24@o2ib4) reconnecting
Apr 26 12:27:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.103.24@o2ib4)
Apr 26 12:27:26 fir-md1-s2 kernel: Lustre: 99921:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987a9f8c1e00 x1631293233408080/t0(0) o101->766c6e9e-6589-78e8-fb69-8836dc850825@10.8.28.2@o2ib6:0/0 lens 480/568 e 0 to 0 dl 1556306850 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:27:26 fir-md1-s2 kernel: Lustre: 99921:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Apr 26 12:27:29 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff9868c5207500/0x4f3cef65e0462566 lrc: 3/0,0 mode: PR/PR res: [0x2800166fc:0xd6:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b28ec57e6 expref: 1612249 pid: 99158 timeout: 306243 lvb_type: 0
Apr 26 12:27:29 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages
Apr 26 12:27:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting
Apr 26 12:27:32 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Apr 26 12:27:34 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff98545d68ec00/0x4f3cef65e225471e lrc: 3/0,0 mode: PR/PR res: [0x28001b767:0x25:0x0].0x0 bits 0x1b/0x0 rrc: 14 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b295c3548 expref: 1586794 pid: 99158 timeout: 306248 lvb_type: 0
Apr 26 12:27:36 fir-md1-s2 kernel: Lustre: 100113:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9869d7353600 x1631642407568800/t0(0) o101->8290d58b-0905-6161-be47-84efd8d09138@10.9.108.18@o2ib4:11/0 lens 376/1600 e 0 to 0 dl 1556306861 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:27:36 fir-md1-s2 kernel: Lustre: 100113:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages
Apr 26 12:27:37 fir-md1-s2 kernel: LustreError: 100259:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff987362e2f200 x1631885368992128/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:27:37 fir-md1-s2 kernel: LustreError: 100259:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 5 previous similar messages
Apr 26 12:27:41 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff9868a4afd340/0x4f3cef65e2b3e1d7 lrc: 3/0,0 mode: PR/PR res: [0x28001b608:0x32:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b2979c513 expref: 1553661 pid: 99158 timeout: 306255 lvb_type: 0
Apr 26 12:27:41 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
Apr 26 12:27:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.18@o2ib4)
Apr 26 12:27:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Apr 26 12:27:50 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff985b126fc5c0/0x4f3cef65e27daeb8 lrc: 3/0,0 mode: PR/PR res: [0x28001b54b:0x30e6:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b296de4b2 expref: 1514410 pid: 100415 timeout: 306264 lvb_type: 0
Apr 26 12:27:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d3ad20a5-81a2-915a-58a3-1542c85784cf (at 10.9.107.53@o2ib4) reconnecting
Apr 26 12:27:52 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Apr 26 12:27:53 fir-md1-s2 kernel: Lustre: 100172:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985ac5afc800 x1631641796829440/t0(0) o36->e0e6d63f-0238-284f-ef41-faf2bb976ece@10.9.108.52@o2ib4:28/0 lens 496/448 e 0 to 0 dl 1556306878 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:27:53 fir-md1-s2 kernel: Lustre: 100172:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages
Apr 26 12:28:06 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff98699f35c5c0/0x4f3cef65e1986edb lrc: 3/0,0 mode: PR/PR res: [0x28001a6a3:0x1c:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b293de440 expref: 1451654 pid: 99547 timeout: 306280 lvb_type: 0
Apr 26 12:28:06 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages
Apr 26 12:28:12 fir-md1-s2 kernel: LustreError: 99386:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff986ab1834b00 x1631885369446256/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:28:12 fir-md1-s2 kernel: LustreError: 99386:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 12 previous similar messages
Apr 26 12:28:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to c4a32940-a9be-512a-496b-f65411562f7a (at 10.9.106.43@o2ib4)
Apr 26 12:28:15 fir-md1-s2 kernel: Lustre: Skipped 16 previous similar messages
Apr 26 12:28:23 fir-md1-s2 kernel: LustreError: 100425:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306813, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988863715c40/0x4f3cef65e51250b0 lrc: 3/0,1 mode: --/PW res: [0x28001b6c4:0x9a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100425 timeout: 0 lvb_type: 0
Apr 26 12:28:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556306903.100425
Apr 26 12:28:25 fir-md1-s2 kernel: LustreError: 100186:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306814, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986a007eb3c0/0x4f3cef65e52a5160 lrc: 3/0,1 mode: --/EX res: [0x28001ad52:0x203:0x0].0x0 bits 0x3/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100186 timeout: 0 lvb_type: 0
Apr 26 12:28:25 fir-md1-s2 kernel: LustreError: 100186:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Apr 26 12:28:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f4957443-90e7-a7b5-e3db-c45f8726f1c2 (at 10.9.102.13@o2ib4) reconnecting
Apr 26 12:28:26 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages
Apr 26 12:28:27 fir-md1-s2 kernel: Lustre: 100107:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986abd912400 x1631534643110416/t0(0) o101->4fd3697b-8ac3-d03c-d547-c2a2aae5b292@10.8.28.8@o2ib6:2/0 lens 1784/3288 e 0 to 0 dl 1556306912 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:28:27 fir-md1-s2 kernel: Lustre: 100107:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages
Apr 26 12:28:30 fir-md1-s2 kernel: LustreError: 100020:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306820, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9878eda45100/0x4f3cef65e52df428 lrc: 3/0,1 mode: --/PW res: [0x2800166fc:0xd6:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100020 timeout: 0 lvb_type: 0
Apr 26 12:28:33 fir-md1-s2 kernel: LustreError: 100439:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306823, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985ab92c5e80/0x4f3cef65e530fd6e lrc: 3/1,0 mode: --/PR res: [0x28001a57e:0x4bf8:0x0].0x0 bits 0x13/0x0 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100439 timeout: 0 lvb_type: 0
Apr 26 12:28:33 fir-md1-s2 kernel: LustreError: 100439:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Apr 26 12:28:41 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff98649bb3b3c0/0x4f3cef65e2b232a1 lrc: 3/0,0 mode: PR/PR res: [0x28001b6b8:0x4a:0x0].0x0 bits 0x12/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b297976fb expref: 1335151 pid: 100466 timeout: 306315 lvb_type: 0
Apr 26 12:28:41 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 12 previous similar messages
Apr 26 12:28:41 fir-md1-s2 kernel: LustreError: 100198:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306831, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986aa371d7c0/0x4f3cef65e537204d lrc: 3/0,1 mode: --/EX res: [0x28001b608:0x32:0x0].0x0 bits 0x8/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100198 timeout: 0 lvb_type: 0
Apr 26 12:28:41 fir-md1-s2 kernel: LustreError: 100198:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Apr 26 12:28:51 fir-md1-s2 kernel: LustreError: 100466:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306841, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985112fe2880/0x4f3cef65e53dbc24 lrc: 3/0,1 mode: --/CW res: [0x28001b54b:0x30e6:0x0].0x0 bits 0x2/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100466 timeout: 0 lvb_type: 0
Apr 26 12:29:07 fir-md1-s2 kernel: LustreError: 100259:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306857, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9872e7f08fc0/0x4f3cef65e5495590 lrc: 3/0,1 mode: --/PW res: [0x28001a6a3:0x1c:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100259 timeout: 0 lvb_type: 0
Apr 26 12:29:07 fir-md1-s2 kernel: LustreError: 100259:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Apr 26 12:29:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.72@o2ib4)
Apr 26 12:29:20 fir-md1-s2 kernel: Lustre: Skipped 43 previous similar messages
Apr 26 12:29:24 fir-md1-s2 kernel: LustreError: 100507:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff98882f7fa100 x1631885370066432/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:29:24 fir-md1-s2 kernel: LustreError: 100507:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 13 previous similar messages
Apr 26 12:29:31 fir-md1-s2 kernel: Lustre: 99921:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-9), not sending early reply req@ffff987a75d6dd00 x1631526800400224/t0(0) o101->0b49eccd-cda4-7bac-8560-4f28415786a3@10.9.0.62@o2ib4:6/0 lens 576/3264 e 0 to 0 dl 1556306976 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:29:31 fir-md1-s2 kernel: Lustre: 99921:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages
Apr 26 12:29:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 90d81c86-5db8-d29f-71be-9c3030e109bc (at 10.9.102.49@o2ib4) reconnecting
Apr 26 12:29:32 fir-md1-s2 kernel: Lustre: Skipped 49 previous similar messages
Apr 26 12:29:42 fir-md1-s2 kernel: LustreError: 99386:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306892, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9869862bbcc0/0x4f3cef65e562724c lrc: 3/0,1 mode: --/EX res: [0x28001b6b8:0x4a:0x0].0x0 bits 0x3/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 99386 timeout: 0 lvb_type: 0
Apr 26 12:29:42 fir-md1-s2 kernel: LustreError: 99386:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages
Apr 26 12:29:43 fir-md1-s2 kernel: LNet: Service thread pid 100425 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:29:43 fir-md1-s2 kernel: Pid: 100425, comm: mdt03_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:29:43 fir-md1-s2 kernel: Call Trace:
Apr 26 12:29:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:29:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:29:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:29:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:29:43 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:29:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556306983.100425
Apr 26 12:29:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client bc889374-b0ed-2371-0c2c-d84fc0dd852e (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98798a6a3800, cur 1556306993 expire 1556306843 last 1556306766
Apr 26 12:29:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 12:29:53 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff98545ee0f980/0x4f3cef65e2d63195 lrc: 3/0,0 mode: PR/PR res: [0x28001a6d5:0x3de:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b2981fe29 expref: 1144786 pid: 100415 timeout: 306387 lvb_type: 0
Apr 26 12:29:53 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 12 previous similar messages
Apr 26 12:30:13 fir-md1-s2 kernel: LNet: Service thread pid 100166 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:30:13 fir-md1-s2 kernel: Pid: 100166, comm: mdt03_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:30:13 fir-md1-s2 kernel: Call Trace:
Apr 26 12:30:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:30:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt]
Apr 26 12:30:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:30:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:30:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:30:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:30:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:30:13 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:30:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307013.100166
Apr 26 12:30:15 fir-md1-s2 kernel: LNet: Service thread pid 100186 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:30:15 fir-md1-s2 kernel: Pid: 100186, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:30:15 fir-md1-s2 kernel: Call Trace:
Apr 26 12:30:15 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:30:15 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt]
Apr 26 12:30:15 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:30:15 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:30:15 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:30:15 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:30:15 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:30:15 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:30:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307015.100186
Apr 26 12:30:20 fir-md1-s2 kernel: LNet: Service thread pid 100020 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:30:20 fir-md1-s2 kernel: Pid: 100020, comm: mdt02_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:30:20 fir-md1-s2 kernel: Call Trace:
Apr 26 12:30:20 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:30:20 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:30:20 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:30:20 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:30:20 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:30:20 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307020.100020
Apr 26 12:30:22 fir-md1-s2 kernel: Pid: 99402, comm: mdt02_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:30:22 fir-md1-s2 kernel: Call Trace:
Apr 26 12:30:22 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:30:22 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:30:22 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt]
Apr 26 12:30:22 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 26 12:30:22 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:30:22 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:30:22 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:30:22 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:30:22 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:30:22 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307022.99402
Apr 26 12:30:23 fir-md1-s2 kernel: LNet: Service thread pid 100439 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:30:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307023.100439
Apr 26 12:30:25 fir-md1-s2 kernel: LNet: Service thread pid 100228 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:30:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307025.100228
Apr 26 12:30:27 fir-md1-s2 kernel: LNet: Service thread pid 100425 completed after 244.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:30:32 fir-md1-s2 kernel: LNet: Service thread pid 100198 was inactive for 200.50s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:30:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307032.100198
Apr 26 12:30:41 fir-md1-s2 kernel: LNet: Service thread pid 100466 was inactive for 200.20s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:30:41 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307041.100466
Apr 26 12:30:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307044.99920
Apr 26 12:30:54 fir-md1-s2 kernel: LustreError: 100507:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306964, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988992ab4a40/0x4f3cef65e582ebf2 lrc: 3/0,1 mode: --/EX res: [0x28001a6d5:0x3de:0x0].0x0 bits 0x8/0x0 rrc: 7 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100507 timeout: 0 lvb_type: 0
Apr 26 12:30:54 fir-md1-s2 kernel: LustreError: 100507:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages
Apr 26 12:30:57 fir-md1-s2 kernel: LNet: Service thread pid 100259 was inactive for 200.54s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:30:57 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:30:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307057.100259
Apr 26 12:31:00 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307060.99895
Apr 26 12:31:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307062.99974
Apr 26 12:31:04 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307064.100200
Apr 26 12:31:07 fir-md1-s2 kernel: LNet: Service thread pid 100129 was inactive for 200.02s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:31:07 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages
Apr 26 12:31:07 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307067.100129
Apr 26 12:31:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307076.99165
Apr 26 12:31:22 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307082.99383
Apr 26 12:31:23 fir-md1-s2 kernel: LNet: Service thread pid 100308 was inactive for 200.69s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:31:23 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages
Apr 26 12:31:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307083.100308
Apr 26 12:31:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307084.100238
Apr 26 12:31:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 8b33fbf1-f2ea-97c7-949f-7519ee33fba7 (at 10.8.2.26@o2ib6)
Apr 26 12:31:28 fir-md1-s2 kernel: Lustre: Skipped 123 previous similar messages
Apr 26 12:31:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307092.99386
Apr 26 12:31:35 fir-md1-s2 kernel: LustreError: 100425:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9889dbedda00 x1631885371083120/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:31:35 fir-md1-s2 kernel: LustreError: 100425:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 30 previous similar messages
Apr 26 12:31:36 fir-md1-s2 kernel: LNet: Service thread pid 100259 completed after 239.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:31:37 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307097.100175
Apr 26 12:31:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting
Apr 26 12:31:41 fir-md1-s2 kernel: Lustre: Skipped 132 previous similar messages
Apr 26 12:31:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307104.99522
Apr 26 12:31:45 fir-md1-s2 kernel: LNet: Service thread pid 100166 completed after 291.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:31:49 fir-md1-s2 kernel: Lustre: 99967:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9889dbedd400 x1631588137168768/t270596893122(0) o36->90d81c86-5db8-d29f-71be-9c3030e109bc@10.9.102.49@o2ib4:24/0 lens 488/3152 e 0 to 0 dl 1556307114 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:31:49 fir-md1-s2 kernel: Lustre: 99967:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages
Apr 26 12:31:56 fir-md1-s2 kernel: LNet: Service thread pid 100238 completed after 232.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:32:04 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff985a03bedc40/0x4f3cef65e05c2e7f lrc: 3/0,0 mode: PR/PR res: [0x28001b779:0x1e:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b28f225fa expref: 879584 pid: 100172 timeout: 306518 lvb_type: 0
Apr 26 12:32:04 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 27 previous similar messages
Apr 26 12:32:06 fir-md1-s2 kernel: LNet: Service thread pid 100086 was inactive for 200.74s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:32:06 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages
Apr 26 12:32:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307126.100086
Apr 26 12:32:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307132.99167
Apr 26 12:32:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307133.100081
Apr 26 12:32:22 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307142.100320
Apr 26 12:32:31 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307151.99547
Apr 26 12:32:39 fir-md1-s2 kernel: LNet: Service thread pid 100466 completed after 318.09s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:32:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307164.100507
Apr 26 12:32:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307167.100214
Apr 26 12:32:50 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307170.100172
Apr 26 12:32:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307173.99199
Apr 26 12:33:03 fir-md1-s2 kernel: LustreError: 100089:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556307093, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986a42609440/0x4f3cef65e5b5d049 lrc: 3/1,0 mode: --/PR res: [0x28001a57e:0x4bee:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100089 timeout: 0 lvb_type: 0
Apr 26 12:33:03 fir-md1-s2 kernel: LustreError: 100089:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 18 previous similar messages
Apr 26 12:33:10 fir-md1-s2 kernel: LNet: Service thread pid 100086 completed after 264.47s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:33:10 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:33:23 fir-md1-s2 kernel: LNet: Service thread pid 100261 was inactive for 200.59s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:33:23 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages
Apr 26 12:33:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307203.100261
Apr 26 12:33:37 fir-md1-s2 kernel: LNet: Service thread pid 100198 completed after 385.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:33:37 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:33:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307234.99549
Apr 26 12:34:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307248.99892
Apr 26 12:34:10 fir-md1-s2 kernel: LNet: Service thread pid 100228 completed after 425.49s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:34:10 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:34:22 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307262.99509
Apr 26 12:34:31 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307271.100141
Apr 26 12:34:45 fir-md1-s2 kernel: LNet: Service thread pid 100353 was inactive for 200.21s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:34:45 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:34:45 fir-md1-s2 kernel: Pid: 100353, comm: mdt03_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:34:45 fir-md1-s2 kernel: Call Trace:
Apr 26 12:34:45 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:34:45 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt]
Apr 26 12:34:45 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:34:45 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:34:45 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:34:45 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:34:45 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:34:45 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:34:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307285.100353
Apr 26 12:34:46 fir-md1-s2 kernel: Pid: 99395, comm: mdt01_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:34:46 fir-md1-s2 kernel: Call Trace:
Apr 26 12:34:46 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:34:46 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:34:46 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:34:46 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:34:46 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:34:51 fir-md1-s2 kernel: Pid: 100212, comm: mdt03_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:34:51 fir-md1-s2 kernel: Call Trace:
Apr 26 12:34:51 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:34:51 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint_unlink+0x704/0x1430 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt]
Apr 26 12:34:51 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:34:51 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:34:51 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:34:51 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:34:51 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:34:51 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:34:51 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307291.100212
Apr 26 12:34:53 fir-md1-s2 kernel: LNet: Service thread pid 100089 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:34:53 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages
Apr 26 12:34:53 fir-md1-s2 kernel: Pid: 100089, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:34:53 fir-md1-s2 kernel: Call Trace:
Apr 26 12:34:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:34:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:34:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x90a/0x1c30 [mdt]
Apr 26 12:34:53 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 26 12:34:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:34:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:34:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:34:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:34:53 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:34:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307293.100089
Apr 26 12:34:55 fir-md1-s2 kernel: Pid: 100425, comm: mdt03_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:34:55 fir-md1-s2 kernel: Call Trace:
Apr 26 12:34:55 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:34:55 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:34:55 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:34:55 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:34:55 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:34:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307295.100425
Apr 26 12:35:00 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307300.100483
Apr 26 12:35:21 fir-md1-s2 kernel: LNet: Service thread pid 99165 completed after 444.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:35:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307330.100415
Apr 26 12:35:43 fir-md1-s2 kernel: LNet: Service thread pid 100193 was inactive for 200.13s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:35:43 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages
Apr 26 12:35:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307343.100193
Apr 26 12:35:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.59@o2ib4)
Apr 26 12:35:44 fir-md1-s2 kernel: Lustre: Skipped 322 previous similar messages
Apr 26 12:35:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 5af85e95-71ec-5689-9879-f126f8845b44 (at 10.8.27.1@o2ib6) reconnecting
Apr 26 12:35:57 fir-md1-s2 kernel: Lustre: Skipped 319 previous similar messages
Apr 26 12:35:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307357.100248
Apr 26 12:36:05 fir-md1-s2 kernel: LustreError: 100228:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9889c87b9200 x1631885376015424/t0(0) o104->fir-MDT0003@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:36:05 fir-md1-s2 kernel: LustreError: 100228:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 45 previous similar messages
Apr 26 12:36:05 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307365.100073
Apr 26 12:36:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307375.100300
Apr 26 12:36:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307390.99168
Apr 26 12:36:31 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307391.100473
Apr 26 12:36:35 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0003_UUID lock: ffff985032f5da00/0x4f3cef65e276a9ef lrc: 3/0,0 mode: PR/PR res: [0x28001b678:0x3c:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x60000400010020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b296c76f9 expref: 472223 pid: 100396 timeout: 306789 lvb_type: 0
Apr 26 12:36:35 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 34 previous similar messages
Apr 26 12:36:35 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307395.100187
Apr 26 12:36:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307398.99265
Apr 26 12:36:54 fir-md1-s2 kernel: Lustre: 99549:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9868997acb00 x1631535279996032/t270597000539(0) o36->b5280270-3b22-224e-0daa-bad5776be543@10.9.103.24@o2ib4:29/0 lens 488/3152 e 0 to 0 dl 1556307419 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:36:54 fir-md1-s2 kernel: Lustre: 99549:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 36 previous similar messages
Apr 26 12:37:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307422.100259
Apr 26 12:37:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307428.100362
Apr 26 12:37:10 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307430.99472
Apr 26 12:37:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307438.100396
Apr 26 12:37:21 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307441.100143
Apr 26 12:37:29 fir-md1-s2 kernel: LNet: Service thread pid 100259 completed after 227.52s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:37:29 fir-md1-s2 kernel: LNet: Skipped 10 previous similar messages
Apr 26 12:37:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307453.99546
Apr 26 12:37:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307473.100106
Apr 26 12:37:59 fir-md1-s2 kernel: LustreError: 100112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556307389, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986b209ce9c0/0x4f3cef65e780a6d4 lrc: 3/0,1 mode: --/PW res: [0x28001a57e:0x2320:0x0].0x0 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100112 timeout: 0 lvb_type: 0
Apr 26 12:37:59 fir-md1-s2 kernel: LustreError: 100112:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 27 previous similar messages
Apr 26 12:38:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307509.100103
Apr 26 12:38:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307524.100026
Apr 26 12:40:00 fir-md1-s2 kernel: Lustre: 100507:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1s); client may timeout. req@ffff988b1ca34b00 x1631558211119824/t270597073983(0) o101->ed4bb535-6b9d-701d-993b-133faa2d1314@10.9.105.25@o2ib4:25/0 lens 376/1568 e 0 to 0 dl 1556307599 ref 1 fl Complete:/0/0 rc 0/0
Apr 26 12:40:13 fir-md1-s2 kernel: LNet: Service thread pid 100353 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 26 12:40:13 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:40:13 fir-md1-s2 kernel: Pid: 100353, comm: mdt03_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:40:13 fir-md1-s2 kernel: Call Trace:
Apr 26 12:40:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:40:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:40:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:40:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:40:13 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:40:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307613.100353
Apr 26 12:40:39 fir-md1-s2 kernel: Pid: 99530, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 12:40:39 fir-md1-s2 kernel: Call Trace:
Apr 26 12:40:39 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 12:40:39 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 12:40:39 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 12:40:39 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 12:40:39 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 12:40:40 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 12:40:40 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 12:40:40 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 12:40:40 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 26 12:40:40 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 12:40:40 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 26 12:40:40 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556307640.99530
Apr 26 12:40:47 fir-md1-s2 kernel: Lustre: 100454:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff984b0f119500 x1631686176801424/t270597078705(0) o36->e069e613-f413-14c2-adc9-8bb2c0565535@10.8.20.30@o2ib6:16/0 lens 488/424 e 0 to 0 dl 1556307646 ref 1 fl Complete:/0/0 rc 0/0
Apr 26 12:41:48 fir-md1-s2 kernel: LNet: Service thread pid 100187 completed after 512.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:41:48 fir-md1-s2 kernel: LNet: Skipped 23 previous similar messages
Apr 26 12:41:53 fir-md1-s2 kernel: Lustre: 100141:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:1s); client may timeout. req@ffff98680d7ddd00 x1631588137219376/t0(0) o101->90d81c86-5db8-d29f-71be-9c3030e109bc@10.9.102.49@o2ib4:21/0 lens 480/536 e 0 to 0 dl 1556307712 ref 1 fl Complete:/0/0 rc 0/0
Apr 26 12:59:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f9520735-3587-9f92-9b49-3202fab97620 (at 10.8.30.15@o2ib6)
Apr 26 12:59:17 fir-md1-s2 kernel: Lustre: Skipped 429 previous similar messages
Apr 26 12:59:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f9520735-3587-9f92-9b49-3202fab97620 (at 10.8.30.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985b3e639000, cur 1556308758 expire 1556308608 last 1556308531
Apr 26 13:46:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to a9fc322a-89e5-f767-17bb-13992a0ba81e (at 10.8.30.8@o2ib6)
Apr 26 13:46:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 13:46:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a9fc322a-89e5-f767-17bb-13992a0ba81e (at 10.8.30.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9876aae32400, cur 1556311593 expire 1556311443 last 1556311366
Apr 26 13:46:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 14:02:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6)
Apr 26 14:02:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 14:02:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cf8f85a2-1ca4-7dfc-de87-ed9c96e9cc9f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984f7e262800, cur 1556312565 expire 1556312415 last 1556312338
Apr 26 14:02:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 14:04:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8b977ef5-63cd-e7d9-1ad7-6171ef01e9ba (at 10.8.26.33@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. exp ffff985b3a7d5400, cur 1556312641 expire 1556312491 last 1556312424
Apr 26 14:04:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 14:04:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 8b977ef5-63cd-e7d9-1ad7-6171ef01e9ba (at 10.8.26.33@o2ib6)
Apr 26 14:04:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 26 14:07:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 220c5a2f-2274-34f0-c226-fa1f26d20160 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff987113a4d000, cur 1556312862 expire 1556312712 last 1556312635 Apr 26 14:07:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:09:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 220c5a2f-2274-34f0-c226-fa1f26d20160 (at 10.8.1.29@o2ib6) Apr 26 14:09:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:15:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) Apr 26 14:15:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:17:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f57f7eba-2fdb-1280-c406-601e64f86cd2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98693da8f000, cur 1556313432 expire 1556313282 last 1556313205 Apr 26 14:17:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:18:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 8b977ef5-63cd-e7d9-1ad7-6171ef01e9ba (at 10.8.26.33@o2ib6) Apr 26 14:18:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:22:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) Apr 26 14:22:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:22:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4dc5ff48-cd20-3360-f645-5e8beded0d02 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987303ea9c00, cur 1556313735 expire 1556313585 last 1556313508 Apr 26 14:22:15 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 26 14:28:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6828267f-f93f-4abc-d5f2-c6e71ea2ea13 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9879d16bb800, cur 1556314086 expire 1556313936 last 1556313859 Apr 26 14:28:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:28:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) Apr 26 14:28:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:32:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 14:32:22 fir-md1-s2 kernel: Lustre: Skipped 411 previous similar messages Apr 26 14:32:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 14:38:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e00ca9c2-e667-86b7-4915-a7d0278e830d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9865b8f4ac00, cur 1556314684 expire 1556314534 last 1556314457 Apr 26 14:38:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:38:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) Apr 26 14:38:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:43:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) Apr 26 14:43:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:44:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1069ecfd-adf5-2c6c-b844-4e46541c9c4b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff988a22eb3400, cur 1556315078 expire 1556314928 last 1556314851 Apr 26 14:44:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:51:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1a12465e-b7f1-a42e-07d7-07a9255244e3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff988984224000, cur 1556315463 expire 1556315313 last 1556315236 Apr 26 14:51:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 14:59:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 14:59:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 14:59:26 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 26 14:59:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.107.68@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 26 14:59:55 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Apr 26 15:11:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 15:11:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 15:35:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 15:35:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 15:41:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a0fe0889-0c25-9b23-e21f-0fe6633e4388 (at 10.9.101.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987adfbe2800, cur 1556318490 expire 1556318340 last 1556318263 Apr 26 15:41:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 15:44:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 15:44:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 15:48:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9c79909a-77e2-a34a-b7c3-31dc625f9836 (at 10.9.103.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985b3e63e000, cur 1556318919 expire 1556318769 last 1556318692 Apr 26 15:48:39 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages Apr 26 15:54:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 38282ee7-5b2e-0d0f-e08b-07dbfb6d401e (at 10.9.103.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff987843b33800, cur 1556319242 expire 1556319092 last 1556319015 Apr 26 15:54:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 16:03:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.13@o2ib4) Apr 26 16:03:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 16:03:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 36ea7d8d-7e4a-21ad-88b0-42bf5ab07452 (at 10.9.108.5@o2ib4) Apr 26 16:03:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Apr 26 16:08:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 11c0ad46-dc6c-2ef5-3cdb-fbfdb747c28e (at 10.8.14.5@o2ib6) Apr 26 16:08:26 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 26 16:09:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.1@o2ib4) Apr 26 16:09:44 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 26 16:24:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 89a99511-90b9-ed05-4958-be4f82c827b0 (at 10.8.11.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987abe6fe400, cur 1556321083 expire 1556320933 last 1556320856 Apr 26 16:24:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 16:29:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:29:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 16:29:29 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Apr 26 16:47:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:47:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 16:53:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:53:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 16:53:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 16:53:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 16:56:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 89a99511-90b9-ed05-4958-be4f82c827b0 (at 10.8.11.3@o2ib6) Apr 26 16:56:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 16:56:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 17:05:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) reconnecting Apr 26 17:05:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 17:05:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 17:29:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) Apr 26 17:29:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 17:29:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d9411336-de37-573c-3770-30a33eb3671d (at 10.9.107.68@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff98746d3e3400, cur 1556324991 expire 1556324841 last 1556324764 Apr 26 17:29:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 17:52:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.103.40@o2ib4) Apr 26 17:52:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 17:54:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 9c79909a-77e2-a34a-b7c3-31dc625f9836 (at 10.9.103.43@o2ib4) Apr 26 17:54:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 18:56:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6d67e932-1276-a13a-e23d-9293929aac12 (at 10.8.20.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9866c9799800, cur 1556330174 expire 1556330024 last 1556329947 Apr 26 18:56:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 18:56:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 6d67e932-1276-a13a-e23d-9293929aac12 (at 10.8.20.11@o2ib6) Apr 26 18:56:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 19:32:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ddf9edec-c20e-03ac-2f90-70269ff89ca8 (at 10.8.9.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9879036ffc00, cur 1556332366 expire 1556332216 last 1556332139 Apr 26 19:32:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:08:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b1c719e0-e359-3672-f68c-50568515d76a (at 10.8.20.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9877fb703000, cur 1556341705 expire 1556341555 last 1556341478 Apr 26 22:08:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:08:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b1c719e0-e359-3672-f68c-50568515d76a (at 10.8.20.13@o2ib6) Apr 26 22:08:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:12:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client b9c6f341-1402-7ab1-11c3-68af210aa388 (at 10.8.30.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986b3e88f400, cur 1556341948 expire 1556341798 last 1556341721 Apr 26 22:12:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 26 22:12:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b9c6f341-1402-7ab1-11c3-68af210aa388 (at 10.8.30.7@o2ib6) Apr 26 22:12:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 01:02:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a19416cf-bdf5-3e6e-7a8a-06ebf953ccde (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986b1e6c2000, cur 1556352135 expire 1556351985 last 1556351908 Apr 27 01:02:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 01:30:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff987ad44bf800, cur 1556353846 expire 1556353696 last 1556353619 Apr 27 01:30:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 01:30:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 01:30:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 02:17:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 02:17:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 02:17:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 42023289-bd73-d552-9154-8822cfffd412 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987296617c00, cur 1556356667 expire 1556356517 last 1556356440 Apr 27 02:17:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 03:40:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0e15507c-acb2-e83c-13f2-69ce5b69d859 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9850fcadc400, cur 1556361636 expire 1556361486 last 1556361409 Apr 27 03:40:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 03:40:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0e15507c-acb2-e83c-13f2-69ce5b69d859 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98749aeb1c00, cur 1556361638 expire 1556361488 last 1556361411 Apr 27 03:41:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 03:41:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 04:07:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 94710490-ff88-a015-6011-4fd8162eb622 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986372216c00, cur 1556363279 expire 1556363129 last 1556363052 Apr 27 04:08:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 04:08:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 04:56:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client efd9b01d-7646-af53-52bb-21baf8260d64 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98793fbb1800, cur 1556366184 expire 1556366034 last 1556365957 Apr 27 04:56:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 04:56:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efd9b01d-7646-af53-52bb-21baf8260d64 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9867043cf400, cur 1556366196 expire 1556366046 last 1556365969 Apr 27 04:56:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.4@o2ib6) Apr 27 04:56:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 05:00:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 27b59868-2412-e12b-495b-ef469cbee2d7 (at 10.8.22.4@o2ib6) Apr 27 05:00:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 05:01:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 27b59868-2412-e12b-495b-ef469cbee2d7 (at 10.8.22.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9867043cd800, cur 1556366465 expire 1556366315 last 1556366238 Apr 27 05:19:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5b36977e-2123-3c92-aece-1cc3fbfc3aea (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986a3c293c00, cur 1556367566 expire 1556367416 last 1556367339 Apr 27 05:19:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 05:52:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 05:52:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 05:53:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 964455a3-fd69-54d9-fb4c-40ccde325a87 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98651b2ca000, cur 1556369581 expire 1556369431 last 1556369354 Apr 27 05:53:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 08:25:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f8efffa2-3213-b7c8-ccdc-9a5668e213bd (at 10.8.25.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff988b09200800, cur 1556378720 expire 1556378570 last 1556378493 Apr 27 08:25:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 08:25:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.25.13@o2ib6) Apr 27 08:25:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 12:01:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 042451f7-afee-3ccc-06ab-cb159c555b53 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9850c439a400, cur 1556391680 expire 1556391530 last 1556391453 Apr 27 12:01:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 12:01:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 042451f7-afee-3ccc-06ab-cb159c555b53 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985166e2b400, cur 1556391685 expire 1556391535 last 1556391458 Apr 27 12:01:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4383278-bb6b-8789-575f-020406af24f4 (at 10.8.21.21@o2ib6) Apr 27 13:02:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 804ab419-a813-7f25-6cb1-35e14a3ff3a4 (at 10.8.1.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987ad44be000, cur 1556395365 expire 1556395215 last 1556395138 Apr 27 13:04:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 804ab419-a813-7f25-6cb1-35e14a3ff3a4 (at 10.8.1.11@o2ib6) Apr 27 13:04:34 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 27 13:45:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0f9dff68-8375-0be2-d3b1-88e8149ac753 (at 10.8.10.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987295e2f400, cur 1556397903 expire 1556397753 last 1556397676 Apr 27 13:45:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 13:47:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 0f9dff68-8375-0be2-d3b1-88e8149ac753 (at 10.8.10.5@o2ib6) Apr 27 13:47:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 15:34:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d3da365a-e12b-dd95-2805-14a34f48c77c (at 10.9.102.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff986b3e889800, cur 1556404441 expire 1556404291 last 1556404214 Apr 27 15:34:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 23:04:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7bcdde3c-236d-6bbd-afc2-cf3c9fa536ce (at 10.8.17.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff988b330f2400, cur 1556431447 expire 1556431297 last 1556431220 Apr 27 23:04:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 27 23:52:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2b43d6d0-617d-0cf0-75e3-8727983baa85 (at 10.8.14.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9870baebc000, cur 1556434364 expire 1556434214 last 1556434137 Apr 27 23:52:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 00:12:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 990001b2-7e7d-402a-1dc8-85b3e7069f05 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987aa48ef000, cur 1556435523 expire 1556435373 last 1556435296 Apr 28 00:12:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 00:14:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) Apr 28 00:14:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 01:53:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 61551ba3-276b-add5-766b-362f4d060385 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98791d683c00, cur 1556441601 expire 1556441451 last 1556441374 Apr 28 01:53:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 02:54:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.22.30@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 03:40:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.7.33@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 03:41:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.7.33@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:08:26 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.7.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:29:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.8.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:33:03 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:33:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:34:43 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 04:35:34 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:36:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 04:38:35 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.28.5@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 06:51:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 57561d34-eaf6-b386-87f8-353ac279f4f2 (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985737e97c00, cur 1556459479 expire 1556459329 last 1556459252 Apr 28 06:51:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 07:39:07 fir-md1-s2 kernel: Lustre: 100273:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985f30611e00 x1631880433116912/t0(0) o101->f191d009-08d1-2ed7-450f-b5fd9785f522@10.8.24.20@o2ib6:12/0 lens 568/0 e 1 to 0 dl 1556462352 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:39:07 fir-md1-s2 kernel: Lustre: 100273:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages Apr 28 07:39:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b2acd6c0-c0f5-61d3-4a68-78d78ff1740e (at 10.8.27.13@o2ib6) reconnecting Apr 28 07:39:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to ddef0525-fd05-baf0-eec8-55af7a82431b (at 10.8.24.4@o2ib6) Apr 28 07:39:13 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 28 07:39:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.30.31@o2ib6) Apr 28 07:39:13 fir-md1-s2 kernel: Lustre: Skipped 20 previous similar messages Apr 28 07:39:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client cf596cc9-7297-c8cd-7acc-7023bcdbec89 (at 10.8.18.23@o2ib6) reconnecting Apr 28 07:39:15 fir-md1-s2 kernel: Lustre: Skipped 23 previous similar messages Apr 28 07:39:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to cf596cc9-7297-c8cd-7acc-7023bcdbec89 (at 10.8.18.23@o2ib6) Apr 28 07:39:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Apr 28 07:39:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a96d1534-516d-d4f9-9639-2e53eca7911a (at 10.8.12.36@o2ib6) reconnecting Apr 28 07:39:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to a96d1534-516d-d4f9-9639-2e53eca7911a (at 10.8.12.36@o2ib6) Apr 28 07:39:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 5483b434-bcd6-aaf1-da57-b35b3900df09 (at 10.8.17.6@o2ib6) reconnecting Apr 28 07:39:28 fir-md1-s2 kernel: Lustre: Skipped 15 previous similar messages Apr 28 07:39:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.17.6@o2ib6) Apr 28 07:39:28 fir-md1-s2 kernel: Lustre: Skipped 15 previous similar messages Apr 28 07:39:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a3b6485f-4f02-6f2d-4f9c-1f19f04443c8 (at 10.8.12.25@o2ib6) reconnecting Apr 28 07:39:37 fir-md1-s2 kernel: Lustre: Skipped 41 previous similar messages Apr 28 07:39:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to a3b6485f-4f02-6f2d-4f9c-1f19f04443c8 (at 10.8.12.25@o2ib6) Apr 28 07:39:37 fir-md1-s2 kernel: Lustre: Skipped 41 previous similar messages Apr 28 07:39:54 
fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9c58438d-335a-1a4a-8b6e-0ac0b859df8d (at 10.8.12.23@o2ib6) reconnecting Apr 28 07:39:54 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Apr 28 07:39:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.12.23@o2ib6) Apr 28 07:39:54 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Apr 28 07:40:22 fir-md1-s2 kernel: LustreError: 100449:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556462332, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98508ba81b00/0x4f3cef67c9f609f1 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 109 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100449 timeout: 0 lvb_type: 0 Apr 28 07:40:22 fir-md1-s2 kernel: LustreError: 100449:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages Apr 28 07:40:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d2ab40ab-8888-3abb-75f9-9c32b2196967 (at 10.8.26.26@o2ib6) reconnecting Apr 28 07:40:26 fir-md1-s2 kernel: Lustre: Skipped 95 previous similar messages Apr 28 07:40:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d2ab40ab-8888-3abb-75f9-9c32b2196967 (at 10.8.26.26@o2ib6) Apr 28 07:40:26 fir-md1-s2 kernel: Lustre: Skipped 95 previous similar messages Apr 28 07:41:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.22.24@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987a91b56c00/0x4f3cef67c9f609b2 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 109 type: IBT flags: 0x60200400000020 nid: 10.8.22.24@o2ib6 remote: 0xf16ef805b8c9be81 expref: 28 pid: 100285 timeout: 461876 lvb_type: 0 Apr 28 07:41:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 35 previous similar messages Apr 28 07:41:22 fir-md1-s2 kernel: LustreError: 100426:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff98732b31f400 ns: mdt-fir-MDT0003_UUID lock: ffff9865cd7b9440/0x4f3cef67c9f60aa0 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x20/0x0 rrc: 109 type: IBT flags: 0x50200000000000 nid: 10.8.22.24@o2ib6 remote: 0xf16ef805b8c9be88 expref: 5 pid: 100426 timeout: 0 lvb_type: 0 Apr 28 07:41:47 fir-md1-s2 kernel: Lustre: 100274:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985f27e6c800 x1631545968692704/t0(0) o101->e508f6f3-acda-49f1-6911-42786d06f3ec@10.8.17.21@o2ib6:22/0 lens 568/0 e 0 to 0 dl 1556462512 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:41:47 fir-md1-s2 kernel: Lustre: 100274:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 51 previous similar messages Apr 28 07:41:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2cc0bc1b-7a1f-9dab-b36c-c6206a02385d (at 10.8.20.20@o2ib6) reconnecting Apr 28 07:41:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.22.23@o2ib6) Apr 28 07:41:53 fir-md1-s2 kernel: Lustre: Skipped 161 previous similar messages Apr 28 07:41:53 fir-md1-s2 kernel: Lustre: Skipped 178 previous similar messages Apr 28 07:42:52 fir-md1-s2 kernel: LustreError: 100285:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556462482, 90s ago); not entering recovery in server code, just going back to sleep ns: 
mdt-fir-MDT0003_UUID lock: ffff9878032f33c0/0x4f3cef67ca29220b lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 222 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100285 timeout: 0 lvb_type: 0 Apr 28 07:42:52 fir-md1-s2 kernel: LustreError: 100285:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 115 previous similar messages Apr 28 07:43:52 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.17.21@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff985a22cab3c0/0x4f3cef67ca29488b lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 205 type: IBT flags: 0x60200400000020 nid: 10.8.17.21@o2ib6 remote: 0x8fb41e17adcd0ed1 expref: 22 pid: 100388 timeout: 462026 lvb_type: 0 Apr 28 07:44:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 271fd2d2-5016-c84d-a8f4-1b2be557d398 (at 10.8.10.24@o2ib6) reconnecting Apr 28 07:44:02 fir-md1-s2 kernel: Lustre: Skipped 471 previous similar messages Apr 28 07:44:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 271fd2d2-5016-c84d-a8f4-1b2be557d398 (at 10.8.10.24@o2ib6) Apr 28 07:44:02 fir-md1-s2 kernel: Lustre: Skipped 491 previous similar messages Apr 28 07:44:09 fir-md1-s2 kernel: Lustre: 99546:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98769a656900 x1631558874132432/t0(0) o101->aebbd095-b50a-7733-a754-e1423fab741f@10.8.18.10@o2ib6:14/0 lens 576/3264 e 0 to 0 dl 1556462654 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:44:09 fir-md1-s2 kernel: Lustre: 99546:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 209 previous similar messages Apr 28 07:44:43 fir-md1-s2 kernel: LNet: Service thread pid 100309 was inactive for 200.69s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 07:44:43 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Apr 28 07:44:43 fir-md1-s2 kernel: Pid: 100309, comm: mdt02_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:44:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:44:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:44:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:44:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462683.100309 Apr 28 07:44:43 fir-md1-s2 kernel: Pid: 99387, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:44:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:44:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:44:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:44:43 fir-md1-s2 kernel: Pid: 99406, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:44:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:44:43 
fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:44:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:44:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:44:43 fir-md1-s2 kernel: Pid: 100260, comm: mdt01_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:44:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:44:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:44:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:44:43 fir-md1-s2 kernel: Pid: 100221, comm: mdt01_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:44:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:44:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:44:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:44:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:44:43 fir-md1-s2 kernel: LNet: Service thread pid 100091 was inactive for 201.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:44:43 fir-md1-s2 kernel: LNet: Skipped 17 previous similar messages Apr 28 07:44:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462684.100322 Apr 28 07:44:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462685.100200 Apr 28 07:44:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462686.100191 Apr 28 07:44:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462697.100189 Apr 28 07:45:14 fir-md1-s2 kernel: LustreError: 100383:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556462624, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98687768fbc0/0x4f3cef67ca570673 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 244 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100383 timeout: 0 lvb_type: 0 Apr 28 07:45:14 fir-md1-s2 kernel: LustreError: 100383:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 156 previous similar messages Apr 28 07:45:36 fir-md1-s2 kernel: LNet: Service thread pid 100192 was inactive for 200.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 07:45:36 fir-md1-s2 kernel: LNet: Skipped 81 previous similar messages Apr 28 07:45:36 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462736.100192 Apr 28 07:45:37 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462737.100388 Apr 28 07:45:39 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462739.100433 Apr 28 07:45:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462747.99560 Apr 28 07:45:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462749.100242 Apr 28 07:45:50 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462750.99992 Apr 28 07:45:51 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462751.100250 Apr 28 07:45:52 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462752.100088 Apr 28 07:45:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462753.99399 Apr 28 07:45:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462754.100234 Apr 28 07:45:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462755.100397 Apr 28 07:45:56 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462756.100420 Apr 28 07:45:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462757.100315 Apr 28 07:46:00 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462760.100281 Apr 28 07:46:01 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462761.100251 Apr 28 07:46:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462762.100205 Apr 28 07:46:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462763.100357 Apr 28 07:46:04 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462764.99226 Apr 28 07:46:05 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462765.100432 Apr 28 07:46:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462766.99401 Apr 28 07:46:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462769.100001 Apr 28 07:46:10 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462770.99392 Apr 28 07:46:11 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462771.100389 Apr 28 07:46:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462772.100193 Apr 28 07:46:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462773.99390 Apr 28 07:46:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462774.100110 Apr 28 07:46:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462775.100401 Apr 28 07:46:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462776.99379 Apr 28 07:46:20 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462780.100263 Apr 28 07:46:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.14@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff98531bb80480/0x4f3cef67ca29274b lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 248 type: IBT flags: 0x60200400000020 nid: 10.8.12.14@o2ib6 remote: 0x4ea5bcbe6a4c5c2 expref: 28 pid: 100438 timeout: 462176 lvb_type: 0 Apr 28 07:46:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 28 07:46:22 fir-md1-s2 kernel: LNet: Service thread pid 100384 completed after 
299.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:46:22 fir-md1-s2 kernel: Lustre: 100424:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:1s); client may timeout. req@ffff9850c4f60300 x1631542399611168/t0(0) o101->e70978d5-4079-f244-8878-d79e9ac6d1e1@10.8.28.5@o2ib6:16/0 lens 576/1168 e 0 to 0 dl 1556462781 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 07:46:22 fir-md1-s2 kernel: LustreError: 100322:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9872c0afac00 ns: mdt-fir-MDT0003_UUID lock: ffff9878c9a68b40/0x4f3cef67ca29ac05 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 242 type: IBT flags: 0x50200400000020 nid: 10.8.12.20@o2ib6 remote: 0x5db4eed49926c023 expref: 6 pid: 100322 timeout: 0 lvb_type: 0 Apr 28 07:46:22 fir-md1-s2 kernel: LustreError: 100322:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 07:46:22 fir-md1-s2 kernel: LNet: Skipped 127 previous similar messages Apr 28 07:47:12 fir-md1-s2 kernel: LNet: Service thread pid 100106 was inactive for 200.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:47:12 fir-md1-s2 kernel: LNet: Skipped 73 previous similar messages Apr 28 07:47:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462832.100106 Apr 28 07:47:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462837.100289 Apr 28 07:47:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462862.100135 Apr 28 07:48:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 59ed9195-3053-54a1-9f0d-fc01c085e1aa (at 10.9.105.39@o2ib4) reconnecting Apr 28 07:48:18 fir-md1-s2 kernel: Lustre: Skipped 1940 previous similar messages Apr 28 07:48:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.105.39@o2ib4) Apr 28 07:48:18 fir-md1-s2 kernel: Lustre: Skipped 1943 previous similar messages Apr 28 07:48:26 fir-md1-s2 kernel: Lustre: 100250:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff987316fda400 x1631542538549712/t0(0) o101->7ed43e69-e1c8-0e51-0bfe-f44bf39fd025@10.9.105.48@o2ib4:1/0 lens 616/0 e 1 to 0 dl 1556462911 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 07:48:26 fir-md1-s2 kernel: Lustre: 100250:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1910 previous similar messages Apr 28 07:49:22 fir-md1-s2 kernel: LNet: Service thread pid 100232 completed after 479.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:49:22 fir-md1-s2 kernel: Lustre: 100248:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:150s); client may timeout. 
req@ffff987703bd6600 x1631678466648304/t0(0) o101->271fd2d2-5016-c84d-a8f4-1b2be557d398@10.8.10.24@o2ib6:22/0 lens 480/0 e 0 to 0 dl 1556462812 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:49:22 fir-md1-s2 kernel: Lustre: 100248:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 07:49:22 fir-md1-s2 kernel: LustreError: 100248:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.17.9@o2ib6: deadline 30:150s ago req@ffff985f5b6ec200 x1631542869759696/t0(0) o101->fe368c15-0041-26b7-6d7c-54456281630d@10.8.17.9@o2ib6:22/0 lens 568/0 e 0 to 0 dl 1556462812 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:49:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 87d4e5a2-28f4-1320-78c3-d40b179c856f (at 10.8.12.11@o2ib6) refused connection, still busy with 9 references Apr 28 07:49:22 fir-md1-s2 kernel: LNet: Skipped 8 previous similar messages Apr 28 07:49:42 fir-md1-s2 kernel: LNet: Service thread pid 100439 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 07:49:42 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 07:49:42 fir-md1-s2 kernel: Pid: 100439, comm: mdt00_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:49:42 fir-md1-s2 kernel: Call Trace: Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:49:42 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:49:42 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:49:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462982.100439 Apr 28 07:49:42 fir-md1-s2 kernel: Pid: 99386, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:49:42 fir-md1-s2 kernel: Call Trace: Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] 
mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:49:42 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:49:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:49:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:49:43 fir-md1-s2 kernel: Pid: 100112, comm: mdt01_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:49:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:49:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:49:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:49:43 fir-md1-s2 kernel: Pid: 100026, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:49:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:49:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 
07:49:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:49:43 fir-md1-s2 kernel: Pid: 100093, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:49:43 fir-md1-s2 kernel: Call Trace: Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:49:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:49:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:49:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:49:43 fir-md1-s2 kernel: LNet: Service thread pid 100262 was inactive for 200.80s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:49:43 fir-md1-s2 kernel: LNet: Skipped 18 previous similar messages Apr 28 07:49:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556462984.100187 Apr 28 07:49:56 fir-md1-s2 kernel: LustreError: 99547:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556462906, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9853fdad21c0/0x4f3cef67ca895626 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 561 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99547 timeout: 0 lvb_type: 0 Apr 28 07:49:56 fir-md1-s2 kernel: LustreError: 99547:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 213 previous similar messages Apr 28 07:50:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463012.99556 Apr 28 07:50:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463013.99561 Apr 28 07:50:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463044.100402 Apr 28 07:51:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463075.100307 Apr 28 07:51:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463076.100407 Apr 28 07:51:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463077.99905 Apr 28 07:51:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463107.99547 Apr 28 07:51:52 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.27.19@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9878dab1d580/0x4f3cef67ca294cc1 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 222 type: IBT flags: 0x60200400000020 nid: 10.8.27.19@o2ib6 remote: 0x5caa65b474237c6f 
expref: 23 pid: 99400 timeout: 462386 lvb_type: 0 Apr 28 07:51:52 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Apr 28 07:51:52 fir-md1-s2 kernel: LNet: Service thread pid 99553 completed after 329.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:51:52 fir-md1-s2 kernel: Lustre: 99387:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff985f98a79200 x1631546275065072/t0(0) o101->40047af1-727c-af36-6cf4-0ce2eaf8f0e0@10.8.7.28@o2ib6:22/0 lens 568/0 e 0 to 0 dl 1556462992 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:51:52 fir-md1-s2 kernel: Lustre: 99387:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2184 previous similar messages Apr 28 07:51:52 fir-md1-s2 kernel: LustreError: 100225:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.26.26@o2ib6: deadline 30:110s ago req@ffff985f26f88900 x1631782093269520/t0(0) o400->@:2/0 lens 224/0 e 0 to 0 dl 1556463002 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:51:52 fir-md1-s2 kernel: LustreError: 100225:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Apr 28 07:51:52 fir-md1-s2 kernel: LNet: Skipped 23 previous similar messages Apr 28 07:51:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463118.100398 Apr 28 07:52:22 fir-md1-s2 kernel: LustreError: 99906:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9872c0afac00 ns: mdt-fir-MDT0003_UUID lock: ffff985273f23180/0x4f3cef67ca294e81 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 218 type: IBT flags: 0x50200400000020 nid: 10.8.12.20@o2ib6 remote: 0x5db4eed49926bfc8 expref: 2 pid: 99906 timeout: 0 lvb_type: 0 Apr 28 07:52:22 fir-md1-s2 kernel: LustreError: 99906:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 28 07:52:22 fir-md1-s2 kernel: Lustre: 99906:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:506s); client may timeout. 
req@ffff98757b618000 x1631738257531600/t0(0) o101->2148b651-1ee6-12b9-c46c-72caa706afa6@10.8.12.20@o2ib6:22/0 lens 480/536 e 0 to 0 dl 1556462636 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 07:52:22 fir-md1-s2 kernel: Lustre: 99906:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1912 previous similar messages Apr 28 07:52:29 fir-md1-s2 kernel: LustreError: 100743:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.20.29@o2ib6 arrived at 1556463149 with bad export cookie 5709701647484587238 Apr 28 07:52:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463162.100248 Apr 28 07:52:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463164.100413 Apr 28 07:52:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463168.100417 Apr 28 07:52:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463169.100165 Apr 28 07:52:50 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463170.100416 Apr 28 07:52:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463178.99564 Apr 28 07:53:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463186.100368 Apr 28 07:53:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463189.100166 Apr 28 07:53:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463193.100293 Apr 28 07:53:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463194.100229 Apr 28 07:53:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463195.99554 Apr 28 07:53:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463206.100028 Apr 28 07:53:36 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463216.100410 Apr 28 07:53:51 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463231.100331 Apr 28 07:53:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463237.100055 Apr 28 07:54:07 fir-md1-s2 kernel: LNet: Service thread pid 100425 was inactive for 200.12s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 07:54:07 fir-md1-s2 kernel: LNet: Skipped 225 previous similar messages Apr 28 07:54:07 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463247.100425 Apr 28 07:54:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463248.99385 Apr 28 07:54:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463257.100489 Apr 28 07:54:22 fir-md1-s2 kernel: Lustre: 100245:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:118s); client may timeout. 
req@ffff985f27728300 x1631545705935584/t0(0) o101->514546c0-f541-1aff-a686-3b517b2c3225@10.9.105.10@o2ib4:24/0 lens 576/0 e 0 to 0 dl 1556463144 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:54:22 fir-md1-s2 kernel: LustreError: 99226:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.20.29@o2ib6: deadline 30:99s ago req@ffff9874fa3ebf00 x1631691050634160/t0(0) o101->d1dc0969-3578-2cb2-51d3-c95b01afa9d8@10.8.20.29@o2ib6:13/0 lens 376/0 e 0 to 0 dl 1556463163 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 07:54:22 fir-md1-s2 kernel: LustreError: 99226:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Apr 28 07:54:22 fir-md1-s2 kernel: LNetError: 98923:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 28 07:54:22 fir-md1-s2 kernel: Lustre: 100245:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1659 previous similar messages Apr 28 07:54:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463266.100338 Apr 28 07:54:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463278.100436 Apr 28 07:54:48 fir-md1-s2 kernel: LNet: Service thread pid 100341 was inactive for 200.42s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 07:54:48 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 07:54:48 fir-md1-s2 kernel: Pid: 100341, comm: mdt02_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:54:48 fir-md1-s2 kernel: Call Trace: Apr 28 07:54:48 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:54:48 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:54:48 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:54:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:54:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:54:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463289.100341 Apr 28 07:54:49 fir-md1-s2 kernel: Pid: 100212, comm: mdt03_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:54:49 fir-md1-s2 kernel: Call Trace: Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:54:49 
fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:54:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:54:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:54:49 fir-md1-s2 kernel: Pid: 99562, comm: mdt00_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:54:49 fir-md1-s2 kernel: Call Trace: Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:54:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:54:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:54:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:54:53 fir-md1-s2 kernel: Pid: 100201, comm: mdt02_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:54:53 fir-md1-s2 kernel: Call Trace: Apr 28 07:54:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:54:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:54:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:54:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:54:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:54:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463293.100201 Apr 28 07:54:59 fir-md1-s2 kernel: Pid: 100500, comm: mdt03_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:54:59 fir-md1-s2 kernel: Call Trace: Apr 28 07:54:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:54:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463299.100500 Apr 28 07:55:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463312.100306 Apr 28 07:55:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463314.99316 Apr 28 07:55:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463324.100361 Apr 28 07:55:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463333.99520 Apr 28 07:55:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463342.99906 Apr 28 07:55:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463344.99989 Apr 28 07:55:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463345.100333 Apr 28 07:56:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463376.100372 Apr 28 07:56:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463404.100422 Apr 28 07:56:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463406.100446 Apr 28 07:56:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 82f700ce-d75c-3551-750f-ac23ecc311b7 (at 10.9.105.15@o2ib4) reconnecting Apr 28 07:56:50 fir-md1-s2 kernel: Lustre: Skipped 7221 previous similar messages Apr 28 07:56:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 82f700ce-d75c-3551-750f-ac23ecc311b7 (at 10.9.105.15@o2ib4) Apr 28 07:56:50 fir-md1-s2 kernel: Lustre: Skipped 7225 previous similar messages Apr 28 07:56:52 fir-md1-s2 kernel: LNet: Service thread pid 100238 completed after 629.71s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 07:56:52 fir-md1-s2 kernel: Lustre: 100130:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:118s); client may timeout. req@ffff987707eae600 x1631677368375312/t0(0) o101->393bd61e-5b20-710b-902e-143334afc70c@10.9.115.3@o2ib4:24/0 lens 1768/0 e 0 to 0 dl 1556463294 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:56:52 fir-md1-s2 kernel: Lustre: 100130:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 07:56:52 fir-md1-s2 kernel: LustreError: 99163:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.22.24@o2ib6: deadline 30:110s ago req@ffff98673ff8b900 x1631857069451392/t0(0) o400->@:2/0 lens 224/0 e 0 to 0 dl 1556463302 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:56:52 fir-md1-s2 kernel: LustreError: 99163:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Apr 28 07:56:52 fir-md1-s2 kernel: LustreError: 100299:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987a3153f000 ns: mdt-fir-MDT0003_UUID lock: ffff986fa2f0c800/0x4f3cef67ca29503a lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 218 type: IBT flags: 0x50200400000020 nid: 10.8.24.2@o2ib6 remote: 0xdaa583d9f2035400 expref: 10 pid: 100299 timeout: 0 lvb_type: 0 Apr 28 07:56:52 fir-md1-s2 kernel: LNet: Skipped 78 previous similar messages Apr 28 07:57:00 fir-md1-s2 kernel: Lustre: 100136:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987a2ba7ef00 x1631738118508832/t0(0) o101->25512127-e6de-b60b-cf78-f84b6ec57480@10.8.21.14@o2ib6:5/0 lens 576/3264 e 0 to 0 dl 1556463425 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 07:57:00 fir-md1-s2 kernel: Lustre: 100136:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8138 previous similar messages Apr 28 07:57:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463444.99510 Apr 28 07:57:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463462.100226 Apr 28 07:57:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463464.100474 Apr 28 07:57:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463465.99550 Apr 28 07:57:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463466.100362 Apr 28 07:57:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463478.100288 Apr 28 07:58:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463495.100371 Apr 28 07:59:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463557.99213 Apr 28 07:59:22 fir-md1-s2 kernel: Lustre: 100456:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. 
req@ffff984e47b63000 x1631652662645584/t0(0) o101->92ffa420-d747-a973-baf2-68cec64e7e81@10.9.113.14@o2ib4:22/0 lens 376/0 e 0 to 0 dl 1556463442 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:59:22 fir-md1-s2 kernel: LustreError: 100263:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.22.24@o2ib6: deadline 100:33s ago req@ffff98600eb2b600 x1631857069470736/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556463529 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:59:22 fir-md1-s2 kernel: LustreError: 100263:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 15 previous similar messages Apr 28 07:59:22 fir-md1-s2 kernel: Lustre: 100456:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2432 previous similar messages Apr 28 07:59:48 fir-md1-s2 kernel: LNet: Service thread pid 100343 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 07:59:48 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 07:59:48 fir-md1-s2 kernel: Pid: 100343, comm: mdt03_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:59:49 fir-md1-s2 kernel: Call Trace: Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:59:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:59:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:59:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463589.100343 Apr 28 07:59:49 fir-md1-s2 kernel: Pid: 100290, comm: mdt03_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 07:59:49 fir-md1-s2 kernel: Call Trace: Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 07:59:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 07:59:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 07:59:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 07:59:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 07:59:52 fir-md1-s2 kernel: LustreError: 100104:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.7.16@o2ib6: deadline 30:135s ago req@ffff986b2118bc00 x1631546228364320/t0(0) o400->@:7/0 lens 224/0 e 0 to 0 dl 1556463457 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 07:59:52 fir-md1-s2 kernel: LustreError: 100375:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987843b34000 ns: mdt-fir-MDT0003_UUID lock: ffff985a22cabf00/0x4f3cef67ca2952ef lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 199 type: IBT flags: 0x50200400000020 nid: 10.8.12.14@o2ib6 remote: 0x4ea5bcbe6a4c5de expref: 2 pid: 100375 timeout: 0 lvb_type: 0 Apr 28 07:59:52 fir-md1-s2 kernel: LustreError: 100375:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 28 07:59:52 fir-md1-s2 kernel: LustreError: 100104:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Apr 28 08:00:13 fir-md1-s2 kernel: Pid: 100197, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:00:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:00:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:00:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:00:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463613.100197 Apr 28 08:00:13 fir-md1-s2 kernel: Pid: 100276, comm: mdt01_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:00:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:00:13 fir-md1-s2 
kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:00:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:00:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:00:13 fir-md1-s2 kernel: Pid: 100040, comm: mdt01_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:00:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:00:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:00:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:00:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:00:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dbce07df-2794-8343-e738-4aed71ec3151 (at 10.8.22.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff987b0bae3400, cur 1556463639 expire 1556463489 last 1556463412 Apr 28 08:00:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 08:00:52 fir-md1-s2 kernel: LustreError: 100263:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556463562, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9878f26b4140/0x4f3cef67ca9635ac lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x13/0x8 rrc: 197 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100263 timeout: 0 lvb_type: 0 Apr 28 08:00:52 fir-md1-s2 kernel: LustreError: 100263:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 217 previous similar messages Apr 28 08:01:52 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.1.24@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987aff20b600/0x4f3cef67ca87fce1 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 633 type: IBT flags: 0x60200400000020 nid: 10.8.1.24@o2ib6 remote: 0x2827843bb05ad96c expref: 24 pid: 100263 timeout: 463106 lvb_type: 0 Apr 28 08:01:52 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 28 08:01:52 fir-md1-s2 kernel: Lustre: 99992:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (929:1s); client may timeout. req@ffff985f5d3e4800 x1631621151531696/t0(0) o101->e9e3c1ff-a8b3-4acb-aa43-c8a68462511f@10.8.7.20@o2ib6:22/0 lens 480/536 e 0 to 0 dl 1556463711 ref 1 fl Complete:/0/0 rc 0/0 Apr 28 08:01:52 fir-md1-s2 kernel: LustreError: 100001:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987252683000 ns: mdt-fir-MDT0003_UUID lock: ffff986203775580/0x4f3cef67ca87fd0b lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 627 type: IBT flags: 0x50200400000020 nid: 10.8.12.11@o2ib6 remote: 0x7598e7355361f785 expref: 5 pid: 100001 timeout: 0 lvb_type: 0 Apr 28 08:01:52 fir-md1-s2 kernel: LustreError: 100433:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.18.19@o2ib6: deadline 100:20s ago req@ffff98579ca79800 x1631559234734864/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556463692 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:01:52 fir-md1-s2 kernel: Lustre: 99992:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3349 previous similar messages Apr 28 08:02:42 fir-md1-s2 kernel: LNet: Service thread pid 100263 was inactive for 200.18s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:02:42 fir-md1-s2 kernel: LNet: Skipped 152 previous similar messages Apr 28 08:02:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463762.100263 Apr 28 08:02:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463767.100456 Apr 28 08:03:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463792.100095 Apr 28 08:04:52 fir-md1-s2 kernel: Lustre: 100314:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:77s); client may timeout. 
req@ffff9875d8a86900 x1631657257993936/t0(0) o49->a33810e4-5bd0-c297-1a4d-1cbd7ad1d09a@10.9.101.53@o2ib4:4/0 lens 464/0 e 0 to 0 dl 1556463815 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:04:52 fir-md1-s2 kernel: LustreError: 100136:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.105.31@o2ib4: deadline 30:1s ago req@ffff9872e57a4500 x1631535393353008/t0(0) o101->be3281b8-7392-1e6f-de86-ca7280b5d2c8@10.9.105.31@o2ib4:21/0 lens 576/0 e 0 to 0 dl 1556463891 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 08:04:52 fir-md1-s2 kernel: Lustre: 100314:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 595 previous similar messages Apr 28 08:05:13 fir-md1-s2 kernel: LNet: Service thread pid 99381 was inactive for 200.65s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:05:13 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:05:13 fir-md1-s2 kernel: Pid: 99381, comm: mdt01_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:05:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:05:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:05:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:05:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463913.99381 Apr 28 08:05:13 fir-md1-s2 kernel: Pid: 100119, comm: mdt01_058 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:05:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 
08:05:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:05:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:05:13 fir-md1-s2 kernel: Pid: 100312, comm: mdt01_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:05:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:05:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:05:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:05:13 fir-md1-s2 kernel: Pid: 99160, comm: mdt01_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:05:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:05:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:05:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:05:13 fir-md1-s2 kernel: Pid: 100047, comm: mdt01_041 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:05:13 fir-md1-s2 kernel: Call Trace: Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:05:13 
fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:05:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:05:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:05:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:05:22 fir-md1-s2 kernel: LustreError: 100315:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987a3153f000 ns: mdt-fir-MDT0003_UUID lock: ffff985a378ff080/0x4f3cef67ca87fefc lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 621 type: IBT flags: 0x50200400000020 nid: 10.8.24.2@o2ib6 remote: 0xdaa583d9f20354d9 expref: 7 pid: 100315 timeout: 0 lvb_type: 0 Apr 28 08:05:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ae62f7a4-768c-db41-e2f9-002069bc0e09 (at 10.8.7.23@o2ib6) in 167 seconds. I think it's dead, and I am evicting it. exp ffff987481a6ec00, cur 1556463939 expire 1556463789 last 1556463772 Apr 28 08:05:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463942.100301 Apr 28 08:06:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556463972.100320 Apr 28 08:06:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b296d324-7b59-db76-2f3e-4e7d337fe55a (at 10.9.103.9@o2ib4) reconnecting Apr 28 08:06:50 fir-md1-s2 kernel: Lustre: Skipped 9342 previous similar messages Apr 28 08:06:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.103.9@o2ib4) Apr 28 08:06:50 fir-md1-s2 kernel: Lustre: Skipped 9353 previous similar messages Apr 28 08:07:00 fir-md1-s2 kernel: Lustre: 100186:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff986b2afe8000 x1631649968229920/t0(0) o101->a3f3d2bc-d481-b26b-e6da-3afa952c1e68@10.9.108.9@o2ib4:5/0 lens 1768/0 e 1 to 0 dl 1556464025 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 08:07:00 fir-md1-s2 kernel: Lustre: 100186:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11421 previous similar messages Apr 28 08:07:22 fir-md1-s2 kernel: LNet: Service thread pid 99965 completed after 1559.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:07:22 fir-md1-s2 kernel: Lustre: 100186:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:90s); client may timeout. 
req@ffff98699e883000 x1631346119944048/t0(0) o101->81206100-f2ef-5b10-2ad6-4678a9c95a5d@10.8.11.16@o2ib6:22/0 lens 592/0 e 0 to 0 dl 1556463952 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:07:22 fir-md1-s2 kernel: Lustre: 100186:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2265 previous similar messages Apr 28 08:07:22 fir-md1-s2 kernel: LustreError: 100186:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.8.11@o2ib6: deadline 100:17s ago req@ffff985f2663f500 x1631546804520096/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556464025 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:07:22 fir-md1-s2 kernel: LustreError: 100186:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 12 previous similar messages Apr 28 08:07:22 fir-md1-s2 kernel: LNet: Skipped 53 previous similar messages Apr 28 08:07:52 fir-md1-s2 kernel: LustreError: 100091:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987a3153f000 ns: mdt-fir-MDT0003_UUID lock: ffff986a03e17bc0/0x4f3cef67ca8800d1 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 625 type: IBT flags: 0x50200400000020 nid: 10.8.24.2@o2ib6 remote: 0xdaa583d9f20354d2 expref: 5 pid: 100091 timeout: 0 lvb_type: 0 Apr 28 08:07:52 fir-md1-s2 kernel: LustreError: 100091:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 08:08:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464092.99165 Apr 28 08:08:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464122.100449 Apr 28 08:09:52 fir-md1-s2 kernel: LustreError: 100187:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9865c6b2c000 ns: mdt-fir-MDT0003_UUID lock: ffff984da074e0c0/0x4f3cef67ca8812b3 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 618 type: IBT flags: 0x50200400000020 nid: 10.8.7.23@o2ib6 remote: 0xd9b3b5fe2e3852a0 expref: 4 pid: 100187 timeout: 0 lvb_type: 0 Apr 28 08:09:52 fir-md1-s2 kernel: LustreError: 100186:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.13.10@o2ib6: deadline 30:74s ago req@ffff98677c331b00 x1631315351712464/t0(0) o400->@:8/0 lens 224/0 e 0 to 0 dl 1556464118 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:09:52 fir-md1-s2 kernel: LustreError: 100186:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Apr 28 08:09:52 fir-md1-s2 kernel: LustreError: 100187:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 08:10:42 fir-md1-s2 kernel: LNet: Service thread pid 100273 was inactive for 200.41s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:10:42 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:10:42 fir-md1-s2 kernel: Pid: 100273, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:10:42 fir-md1-s2 kernel: Call Trace: Apr 28 08:10:42 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:10:42 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:10:42 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:10:42 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:10:42 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:10:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464242.100273 Apr 28 08:11:12 fir-md1-s2 kernel: Pid: 99965, comm: mdt01_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:11:12 fir-md1-s2 kernel: Call Trace: Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:11:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:11:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:11:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464272.99965 Apr 28 08:11:12 fir-md1-s2 kernel: Pid: 100011, comm: mdt01_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:11:12 fir-md1-s2 kernel: Call Trace: Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 
28 08:11:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:11:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:11:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:11:12 fir-md1-s2 kernel: Pid: 100190, comm: mdt01_062 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:11:12 fir-md1-s2 kernel: Call Trace: Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:11:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:11:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:11:12 fir-md1-s2 kernel: Pid: 100389, comm: mdt00_068 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:11:12 fir-md1-s2 kernel: Call Trace: Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:11:12 fir-md1-s2 
kernel: [] kthread+0xd1/0xe0 Apr 28 08:11:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:11:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:11:22 fir-md1-s2 kernel: LustreError: 100220:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556464192, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986689b30900/0x4f3cef67ca98c33e lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x20/0x0 rrc: 197 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100220 timeout: 0 lvb_type: 0 Apr 28 08:11:22 fir-md1-s2 kernel: LustreError: 100220:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 68 previous similar messages Apr 28 08:11:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464284.100235 Apr 28 08:11:35 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464295.100253 Apr 28 08:11:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 691e4f7c-24cc-f758-5354-96c1b01f1439 (at 10.8.7.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985f26b5e000, cur 1556464299 expire 1556464149 last 1556464072 Apr 28 08:11:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 08:11:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464303.100020 Apr 28 08:12:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.10.24@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9861b5f0c800/0x4f3cef67ca8a0867 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 637 type: IBT flags: 0x60200400000020 nid: 10.8.10.24@o2ib6 remote: 0xb6626827ba003fe7 expref: 24 pid: 100261 timeout: 463616 lvb_type: 0 Apr 28 08:12:22 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 28 08:12:22 fir-md1-s2 kernel: Lustre: 99552:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff987b0bafd400 x1631544219368240/t0(0) o101->d306ff79-fb9e-2f98-a900-35120cbd847f@10.9.101.7@o2ib4:22/0 lens 376/0 e 0 to 0 dl 1556464222 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:12:22 fir-md1-s2 kernel: Lustre: 99552:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9188 previous similar messages Apr 28 08:12:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bb5cea78-f3f7-ea45-edef-79767ebf5366 (at 10.8.13.10@o2ib6) in 183 seconds. I think it's dead, and I am evicting it. exp ffff9869afb4ac00, cur 1556464375 expire 1556464225 last 1556464192 Apr 28 08:13:12 fir-md1-s2 kernel: LNet: Service thread pid 100292 was inactive for 200.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 08:13:12 fir-md1-s2 kernel: LNet: Skipped 61 previous similar messages Apr 28 08:13:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464392.100292 Apr 28 08:13:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464396.100346 Apr 28 08:13:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464408.100398 Apr 28 08:13:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464409.100460 Apr 28 08:13:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464424.100386 Apr 28 08:14:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464454.100128 Apr 28 08:14:52 fir-md1-s2 kernel: LustreError: 100376:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff984e4a78d800 ns: mdt-fir-MDT0003_UUID lock: ffff98886739ee40/0x4f3cef67ca8a760d lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 633 type: IBT flags: 0x50200400000020 nid: 10.8.22.24@o2ib6 remote: 0xf16ef805b8c9c717 expref: 4 pid: 100376 timeout: 0 lvb_type: 0 Apr 28 08:14:52 fir-md1-s2 kernel: LustreError: 100300:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.10.24@o2ib6: deadline 100:50s ago req@ffff985f5fbf1200 x1631678466804672/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556464442 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:14:52 fir-md1-s2 kernel: LustreError: 100300:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 22 previous similar messages Apr 28 08:14:52 fir-md1-s2 kernel: LustreError: 100376:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 11 previous similar messages Apr 28 08:15:42 fir-md1-s2 kernel: LNet: Service thread pid 100186 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:15:42 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:15:42 fir-md1-s2 kernel: Pid: 100186, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:15:42 fir-md1-s2 kernel: Call Trace: Apr 28 08:15:42 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:15:42 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:15:42 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:15:42 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:15:42 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:15:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464542.100186 Apr 28 08:15:43 fir-md1-s2 kernel: Pid: 99530, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:15:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:15:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:15:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:15:43 fir-md1-s2 kernel: Pid: 100261, comm: mdt01_095 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:15:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 
08:15:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:15:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:15:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:15:43 fir-md1-s2 kernel: Pid: 99552, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:15:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:15:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:15:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:15:43 fir-md1-s2 kernel: Pid: 100364, comm: mdt00_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:15:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:15:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:15:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:15:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:15:52 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464552.100438 Apr 28 08:16:04 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464564.100387 Apr 28 08:16:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a501b92b-e7b6-1a0d-e95a-8363a690f102 (at 10.8.11.28@o2ib6) reconnecting Apr 28 08:16:51 fir-md1-s2 kernel: Lustre: Skipped 10393 previous similar messages Apr 28 08:16:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to a501b92b-e7b6-1a0d-e95a-8363a690f102 (at 10.8.11.28@o2ib6) Apr 28 08:16:51 fir-md1-s2 kernel: Lustre: Skipped 10404 previous similar messages Apr 28 08:17:01 fir-md1-s2 kernel: Lustre: 100300:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98683f3add00 x1631636183594704/t0(0) o101->97b55247-502a-4b0e-6bc7-38b7d7a6fbce@10.9.101.31@o2ib4:6/0 lens 1768/0 e 0 to 0 dl 1556464626 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 08:17:01 fir-md1-s2 kernel: Lustre: 100300:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13752 previous similar messages Apr 28 08:17:22 fir-md1-s2 kernel: LNet: Service thread pid 100310 completed after 1379.95s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:17:22 fir-md1-s2 kernel: LNet: Skipped 235 previous similar messages Apr 28 08:18:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464692.99920 Apr 28 08:18:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464694.100224 Apr 28 08:18:19 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464699.100000 Apr 28 08:18:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464706.100332 Apr 28 08:18:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464718.100080 Apr 28 08:18:40 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464720.100377 Apr 28 08:18:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464723.100007 Apr 28 08:18:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464729.99551 Apr 28 08:19:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464749.100328 Apr 28 08:19:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464753.100174 Apr 28 08:19:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464755.100441 Apr 28 08:19:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464784.100419 Apr 28 08:19:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464786.100416 Apr 28 08:19:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464787.100309 Apr 28 08:20:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464815.100172 Apr 28 08:20:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464816.100287 Apr 28 08:20:42 fir-md1-s2 kernel: LNet: Service thread pid 100199 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 08:20:42 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 08:20:42 fir-md1-s2 kernel: Pid: 100199, comm: mdt02_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:20:42 fir-md1-s2 kernel: Call Trace: Apr 28 08:20:42 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:20:42 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:20:42 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:20:42 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:20:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:20:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464843.100199 Apr 28 08:20:43 fir-md1-s2 kernel: Pid: 100291, comm: mdt02_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:20:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:20:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:20:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:20:43 fir-md1-s2 kernel: Pid: 99393, comm: mdt00_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:20:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 
28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:20:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:20:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:20:43 fir-md1-s2 kernel: Pid: 100310, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:20:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:20:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:20:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:20:43 fir-md1-s2 kernel: Pid: 100304, comm: mdt02_080 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:20:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:20:43 
fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:20:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:20:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:20:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:20:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464846.100244 Apr 28 08:20:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464847.99403 Apr 28 08:21:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464877.99564 Apr 28 08:21:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464903.99161 Apr 28 08:21:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464905.100019 Apr 28 08:21:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464908.100256 Apr 28 08:21:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464909.100330 Apr 28 08:21:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464915.99164 Apr 28 08:22:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464935.99166 Apr 28 08:22:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556464936.99400 Apr 28 08:22:23 fir-md1-s2 kernel: LustreError: 100143:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556464853, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff984e21a860c0/0x4f3cef67ca9e0513 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 750 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100143 timeout: 0 lvb_type: 0 Apr 28 08:22:23 fir-md1-s2 kernel: LustreError: 100143:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 361 previous similar messages Apr 28 08:23:23 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.8.33@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9851a2797980/0x4f3cef67ca9bdeb3 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 162 type: IBT flags: 0x60200400000020 nid: 10.8.8.33@o2ib6 remote: 0xfd1c7f49bad93546 expref: 36 pid: 100174 timeout: 464277 lvb_type: 0 Apr 28 08:23:23 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 28 08:23:23 fir-md1-s2 kernel: Lustre: 100187:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff9851d36bda00 x1631574208813472/t0(0) o101->661f0cfa-e148-dc98-69cd-517192e597e7@10.8.7.3@o2ib6:23/0 lens 480/0 e 0 to 0 dl 1556464883 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:23:23 fir-md1-s2 kernel: Lustre: 100187:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9164 previous similar messages Apr 28 08:24:13 fir-md1-s2 kernel: LNet: Service thread pid 100306 was inactive for 200.40s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 08:24:13 fir-md1-s2 kernel: LNet: Skipped 347 previous similar messages Apr 28 08:24:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465053.100306 Apr 28 08:24:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465055.100273 Apr 28 08:25:53 fir-md1-s2 kernel: LustreError: 99548:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.18.30@o2ib6: deadline 100:195s ago req@ffff98677925e900 x1631575323048736/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556464958 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:25:53 fir-md1-s2 kernel: LustreError: 99548:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 38 previous similar messages Apr 28 08:26:17 fir-md1-s2 kernel: Pid: 100474, comm: mdt03_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:26:17 fir-md1-s2 kernel: Call Trace: Apr 28 08:26:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:26:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:26:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:26:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:26:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:26:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465177.100474 Apr 28 08:26:43 fir-md1-s2 kernel: Pid: 100404, comm: mdt00_078 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:26:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 
28 08:26:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:26:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:26:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:26:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465203.100404 Apr 28 08:26:43 fir-md1-s2 kernel: Pid: 100237, comm: mdt02_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:26:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:26:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:26:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:26:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:26:44 fir-md1-s2 kernel: Pid: 100028, comm: mdt02_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:26:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:26:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:26:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:26:44 fir-md1-s2 kernel: Pid: 100263, comm: mdt02_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:26:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:26:44 
fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:26:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:26:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:26:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:26:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2c2c4e2a-8072-33c3-de28-eb5583c5c142 (at 10.9.105.52@o2ib4) reconnecting Apr 28 08:26:51 fir-md1-s2 kernel: Lustre: Skipped 10638 previous similar messages Apr 28 08:26:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 2c2c4e2a-8072-33c3-de28-eb5583c5c142 (at 10.9.105.52@o2ib4) Apr 28 08:26:51 fir-md1-s2 kernel: Lustre: Skipped 10644 previous similar messages Apr 28 08:26:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465213.100333 Apr 28 08:27:03 fir-md1-s2 kernel: Lustre: 100261:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9867a2e53000 x1631532276859744/t0(0) o101->53167f6a-842f-386f-191a-b53b24c60d9f@10.9.105.47@o2ib4:8/0 lens 576/0 e 0 to 0 dl 1556465228 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 08:27:03 fir-md1-s2 kernel: Lustre: 100261:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12220 previous similar messages Apr 28 08:27:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ddef0525-fd05-baf0-eec8-55af7a82431b (at 10.8.24.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985f42f5a400, cur 1556465230 expire 1556465080 last 1556465003 Apr 28 08:28:23 fir-md1-s2 kernel: LNet: Service thread pid 100401 completed after 810.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 08:28:23 fir-md1-s2 kernel: LustreError: 100417:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987ab926fc00 ns: mdt-fir-MDT0003_UUID lock: ffff98509bb4a640/0x4f3cef67ca9cba3e lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 148 type: IBT flags: 0x50200400000020 nid: 10.8.8.33@o2ib6 remote: 0xfd1c7f49bad9354d expref: 4 pid: 100417 timeout: 0 lvb_type: 0 Apr 28 08:28:23 fir-md1-s2 kernel: LustreError: 100417:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 24 previous similar messages Apr 28 08:28:23 fir-md1-s2 kernel: LNet: Skipped 278 previous similar messages Apr 28 08:29:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465353.100214 Apr 28 08:29:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465368.99552 Apr 28 08:29:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465369.99165 Apr 28 08:31:17 fir-md1-s2 kernel: LNet: Service thread pid 100490 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:31:17 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 08:31:17 fir-md1-s2 kernel: Pid: 100490, comm: mdt03_086 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:31:17 fir-md1-s2 kernel: Call Trace: Apr 28 08:31:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:31:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:31:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:31:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:31:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:31:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465477.100490 Apr 28 08:31:43 fir-md1-s2 kernel: Pid: 100081, comm: mdt02_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:31:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:31:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] 
mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:31:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:31:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:31:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:31:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:31:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465503.100081 Apr 28 08:31:43 fir-md1-s2 kernel: Pid: 99385, comm: mdt00_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:31:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:31:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:31:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:31:44 fir-md1-s2 kernel: Pid: 100196, comm: mdt02_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:31:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 
08:31:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:31:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:31:44 fir-md1-s2 kernel: Pid: 100405, comm: mdt02_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:31:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:31:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:31:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:31:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:31:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465504.100338 Apr 28 08:31:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465505.100075 Apr 28 08:31:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465508.100289 Apr 28 08:32:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465529.100172 Apr 28 08:32:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465535.100473 Apr 28 08:32:23 fir-md1-s2 kernel: LustreError: 100000:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556465453, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a785bcc80/0x4f3cef67caa0b2c6 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 761 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100000 timeout: 0 lvb_type: 0 Apr 28 08:32:23 fir-md1-s2 kernel: LustreError: 100000:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 139 previous similar messages Apr 28 08:32:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465565.99967 Apr 28 08:33:23 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.17.8@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9878fc751b00/0x4f3cef67ca9ccb63 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 161 type: IBT flags: 0x60200400000020 nid: 10.8.17.8@o2ib6 remote: 0x77b696d11296474b expref: 22 pid: 100373 timeout: 464997 lvb_type: 0 Apr 28 08:33:23 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 08:33:23 fir-md1-s2 kernel: Lustre: 99399:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:115s); client may timeout. 
req@ffff984e437f8f00 x1631715556415888/t0(0) o101->1b18b8bb-1d7d-bf4a-2015-7b0bb8c8364a@10.8.20.34@o2ib6:28/0 lens 576/0 e 0 to 0 dl 1556465488 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 08:33:23 fir-md1-s2 kernel: Lustre: 99399:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12024 previous similar messages Apr 28 08:34:13 fir-md1-s2 kernel: LNet: Service thread pid 100364 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:34:13 fir-md1-s2 kernel: LNet: Skipped 129 previous similar messages Apr 28 08:34:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465653.100364 Apr 28 08:34:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465658.100366 Apr 28 08:34:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465684.100507 Apr 28 08:34:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465689.100340 Apr 28 08:35:53 fir-md1-s2 kernel: LustreError: 100336:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.17.8@o2ib6: deadline 100:45s ago req@ffff9876ebabb300 x1631559008358256/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556465708 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:35:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Export ffff98770a674800 already connecting from 10.8.7.18@o2ib6 Apr 28 08:35:53 fir-md1-s2 kernel: LustreError: 100336:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 54 previous similar messages Apr 28 08:36:17 fir-md1-s2 kernel: Pid: 100343, comm: mdt03_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:17 fir-md1-s2 kernel: Call Trace: Apr 28 08:36:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:36:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465777.100343 Apr 28 08:36:43 fir-md1-s2 kernel: Pid: 100112, comm: mdt01_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:36:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:36:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465804.100112 Apr 28 08:36:44 fir-md1-s2 kernel: Pid: 99551, comm: mdt00_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:36:44 fir-md1-s2 kernel: Pid: 99522, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] 
kthread+0xd1/0xe0 Apr 28 08:36:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:36:44 fir-md1-s2 kernel: Pid: 100313, comm: mdt01_111 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:36:44 fir-md1-s2 kernel: Call Trace: Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:36:44 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:36:44 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:36:44 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:36:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b4a7dae6-d345-8e9f-e334-9f38da541ca7 (at 10.8.24.33@o2ib6) reconnecting Apr 28 08:36:51 fir-md1-s2 kernel: Lustre: Skipped 10810 previous similar messages Apr 28 08:36:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b4a7dae6-d345-8e9f-e334-9f38da541ca7 (at 10.8.24.33@o2ib6) Apr 28 08:36:51 fir-md1-s2 kernel: Lustre: Skipped 10820 previous similar messages Apr 28 08:37:03 fir-md1-s2 kernel: Lustre: 100369:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987af7046900 x1631703199315312/t0(0) o101->561f7379-c022-fe96-32b7-a7c9451ec067@10.8.23.18@o2ib6:8/0 lens 584/0 e 0 to 0 dl 1556465828 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 08:37:03 fir-md1-s2 kernel: Lustre: 100369:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13122 previous similar messages Apr 28 08:37:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465835.100453 Apr 28 08:38:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465896.100363 Apr 28 08:38:21 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465901.99167 Apr 28 08:38:23 fir-md1-s2 kernel: LNet: Service thread pid 100093 completed after 1199.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 08:38:23 fir-md1-s2 kernel: LustreError: 99553:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987adfbe7400 ns: mdt-fir-MDT0003_UUID lock: ffff9863da2d2880/0x4f3cef67caa0004d lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 157 type: IBT flags: 0x50200400000020 nid: 10.8.12.8@o2ib6 remote: 0x62e31ddcb75f1e95 expref: 2 pid: 99553 timeout: 0 lvb_type: 0 Apr 28 08:38:23 fir-md1-s2 kernel: LustreError: 99553:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 6 previous similar messages Apr 28 08:38:23 fir-md1-s2 kernel: LNet: Skipped 185 previous similar messages Apr 28 08:39:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465953.99921 Apr 28 08:39:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465955.100214 Apr 28 08:39:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465957.100191 Apr 28 08:39:19 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465959.99163 Apr 28 08:39:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465968.100259 Apr 28 08:39:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465983.100304 Apr 28 08:39:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556465984.100489 Apr 28 08:41:17 fir-md1-s2 kernel: LNet: Service thread pid 100446 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:41:17 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 08:41:17 fir-md1-s2 kernel: Pid: 100446, comm: mdt03_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:17 fir-md1-s2 kernel: Call Trace: Apr 28 08:41:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:41:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:41:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466077.100446 Apr 28 08:41:43 fir-md1-s2 kernel: Pid: 100289, comm: mdt00_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:41:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466103.100289 Apr 28 08:41:43 fir-md1-s2 kernel: Pid: 99549, comm: mdt01_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:41:43 fir-md1-s2 kernel: Pid: 99386, comm: mdt01_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 
28 08:41:43 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:41:43 fir-md1-s2 kernel: Pid: 100241, comm: mdt01_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:41:43 fir-md1-s2 kernel: Call Trace: Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 08:41:43 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:41:43 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:41:43 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:41:43 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:41:52 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466112.99520 Apr 28 08:41:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466118.100173 Apr 28 08:42:15 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466135.100080 Apr 28 08:42:23 fir-md1-s2 kernel: LustreError: 100332:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556466053, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98793ce48d80/0x4f3cef67caa3f5cc lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 835 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100332 timeout: 0 lvb_type: 0 Apr 28 08:42:23 fir-md1-s2 kernel: LustreError: 100332:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 231 previous similar messages Apr 28 08:42:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466148.99564 Apr 28 08:42:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466165.100293 Apr 28 08:43:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.13.20@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9871a12eb600/0x4f3cef67caa4c19e lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 126 type: IBT flags: 0x60200400000020 nid: 10.8.13.20@o2ib6 remote: 0xdf15508116c78a21 expref: 33 pid: 99164 timeout: 465487 lvb_type: 0 Apr 28 08:43:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 
28 08:43:33 fir-md1-s2 kernel: Lustre: 100437:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:58s); client may timeout. req@ffff984d7b363300 x1631560446949648/t0(0) o36->f14e0c40-3917-4ef5-c69b-2021a0de4682@10.9.106.58@o2ib4:5/0 lens 552/0 e 0 to 0 dl 1556466155 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:43:33 fir-md1-s2 kernel: Lustre: 100437:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9390 previous similar messages Apr 28 08:44:13 fir-md1-s2 kernel: LNet: Service thread pid 100332 was inactive for 200.55s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:44:13 fir-md1-s2 kernel: LNet: Skipped 149 previous similar messages Apr 28 08:44:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466253.100332 Apr 28 08:44:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466256.100321 Apr 28 08:44:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466264.100243 Apr 28 08:44:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466265.100373 Apr 28 08:44:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466269.99159 Apr 28 08:44:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466273.100434 Apr 28 08:44:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466278.100253 Apr 28 08:44:39 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466279.100237 Apr 28 08:44:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466283.100349 Apr 28 08:44:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466289.100466 Apr 28 08:44:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466294.100392 Apr 28 08:44:56 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466296.100265 Apr 28 08:45:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466325.100070 Apr 28 08:46:03 fir-md1-s2 kernel: LustreError: 100001:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.1.9@o2ib6: deadline 100:41s ago req@ffff985f97202450 x1631293692032592/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556466322 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:46:03 fir-md1-s2 kernel: LustreError: 100001:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 38 previous similar messages Apr 28 08:46:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cd075b36-33db-5052-abc8-0d1d7f478890 (at 10.8.30.8@o2ib6) reconnecting Apr 28 08:46:52 fir-md1-s2 kernel: Lustre: Skipped 10865 previous similar messages Apr 28 08:46:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to a9fc322a-89e5-f767-17bb-13992a0ba81e (at 10.8.30.8@o2ib6) Apr 28 08:46:52 fir-md1-s2 kernel: Lustre: Skipped 10874 previous similar messages Apr 28 08:46:53 fir-md1-s2 kernel: Pid: 100375, comm: mdt00_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:46:53 
fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:46:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466413.100375 Apr 28 08:46:53 fir-md1-s2 kernel: Pid: 100258, comm: mdt01_093 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:46:53 fir-md1-s2 kernel: Pid: 100190, comm: mdt01_062 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:46:53 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:46:54 fir-md1-s2 kernel: Pid: 99265, comm: mdt01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:46:54 fir-md1-s2 kernel: Pid: 100295, comm: mdt02_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:46:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:46:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:46:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:46:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:47:04 fir-md1-s2 kernel: Lustre: 100237:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9872783bfb00 x1631546034679216/t0(0) o101->d9dead47-18af-0bad-b841-6c3aac9d942a@10.8.28.7@o2ib6:9/0 lens 600/0 e 0 to 0 dl 1556466429 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 08:47:04 fir-md1-s2 kernel: Lustre: 100237:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11548 previous similar messages Apr 28 08:48:33 fir-md1-s2 kernel: LNet: Service 
thread pid 100040 completed after 299.94s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 08:48:33 fir-md1-s2 kernel: LustreError: 99406:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987295e28c00 ns: mdt-fir-MDT0003_UUID lock: ffff9860dd3f6540/0x4f3cef67ca9e1c20 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x20/0x0 rrc: 806 type: IBT flags: 0x50200000000000 nid: 10.8.17.16@o2ib6 remote: 0x57db035364054ffe expref: 2 pid: 99406 timeout: 0 lvb_type: 0 Apr 28 08:48:33 fir-md1-s2 kernel: LustreError: 99406:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 28 08:48:33 fir-md1-s2 kernel: LNet: Skipped 89 previous similar messages Apr 28 08:49:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466563.99391 Apr 28 08:49:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466564.100299 Apr 28 08:49:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466573.100339 Apr 28 08:51:35 fir-md1-s2 kernel: LustreError: 111412:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.7@o2ib6 arrived at 1556466695 with bad export cookie 5709701647484594350 Apr 28 08:51:53 fir-md1-s2 kernel: LNet: Service thread pid 100273 was inactive for 200.33s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 08:51:53 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 08:51:53 fir-md1-s2 kernel: Pid: 100273, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:51:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466713.100273 Apr 28 08:51:53 fir-md1-s2 kernel: Pid: 99920, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:53 fir-md1-s2 
kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:51:54 fir-md1-s2 kernel: Pid: 100274, comm: mdt01_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:51:54 fir-md1-s2 kernel: Pid: 100395, comm: mdt00_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 
Apr 28 08:51:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:51:54 fir-md1-s2 kernel: Pid: 100456, comm: mdt00_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:51:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:51:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:51:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:51:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:51:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466717.100019 Apr 28 08:51:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466718.100437 Apr 28 08:52:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466723.100288 Apr 28 08:52:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466729.100356 Apr 28 08:52:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466732.100357 Apr 28 08:52:33 fir-md1-s2 kernel: LustreError: 100349:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556466663, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988a92743600/0x4f3cef67caa71f80 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 728 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 100349 timeout: 0 lvb_type: 0 Apr 28 08:52:33 fir-md1-s2 kernel: LustreError: 100349:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 144 previous similar messages Apr 28 08:53:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 120s: evicting client at 10.8.1.18@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff985a0bbb2d00/0x4f3cef67ca9f2e38 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 727 type: IBT flags: 0x60200400000020 nid: 10.8.1.18@o2ib6 remote: 0xcd2073a4eaab2394 expref: 13 pid: 100206 timeout: 466117 lvb_type: 0 Apr 28 08:53:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Apr 28 08:53:33 fir-md1-s2 kernel: Lustre: 100381:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:94s); client may timeout. 
req@ffff98756c37b900 x1631677301713664/t0(0) o101->1a944508-a353-6d52-a7d4-1133aba4850b@10.9.101.40@o2ib4:29/0 lens 592/0 e 0 to 0 dl 1556466719 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 08:53:33 fir-md1-s2 kernel: Lustre: 100381:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 9287 previous similar messages Apr 28 08:54:23 fir-md1-s2 kernel: LNet: Service thread pid 100115 was inactive for 200.36s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 08:54:23 fir-md1-s2 kernel: LNet: Skipped 121 previous similar messages Apr 28 08:54:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466863.100115 Apr 28 08:54:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466864.100312 Apr 28 08:54:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466865.99905 Apr 28 08:54:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556466893.99552 Apr 28 08:55:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c3bc6053-568c-66f8-54ce-3905f9250318 (at 10.8.10.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986a3c012c00, cur 1556466920 expire 1556466770 last 1556466693 Apr 28 08:56:03 fir-md1-s2 kernel: LustreError: 100334:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.18@o2ib6: deadline 100:46s ago req@ffff9873072ce900 x1631496989386832/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556466917 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 08:56:03 fir-md1-s2 kernel: LustreError: 100334:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 31 previous similar messages Apr 28 08:56:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 563efd27-4a35-ce88-f034-285335953b51 (at 10.8.1.12@o2ib6) reconnecting Apr 28 08:56:53 fir-md1-s2 kernel: Lustre: Skipped 10552 previous similar messages Apr 28 08:56:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 563efd27-4a35-ce88-f034-285335953b51 (at 10.8.1.12@o2ib6) Apr 28 08:56:53 fir-md1-s2 kernel: Lustre: Skipped 10564 previous similar messages Apr 28 08:56:53 fir-md1-s2 kernel: Pid: 99395, comm: mdt01_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:56:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:56:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:56:53 fir-md1-s2 kernel: [] 
0xffffffffffffffff Apr 28 08:56:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467013.99395 Apr 28 08:56:53 fir-md1-s2 kernel: Pid: 100387, comm: mdt00_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:56:53 fir-md1-s2 kernel: Call Trace: Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:56:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:56:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:56:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:56:54 fir-md1-s2 kernel: Pid: 100341, comm: mdt02_097 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:56:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:56:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:56:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:56:54 fir-md1-s2 kernel: Pid: 99380, comm: mdt01_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:56:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] 
mdt_object_lock+0x20/0x30 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:56:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:56:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:56:54 fir-md1-s2 kernel: Pid: 100490, comm: mdt03_086 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 08:56:54 fir-md1-s2 kernel: Call Trace: Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 08:56:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 08:56:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 08:56:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 08:56:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467017.100292 Apr 28 08:57:04 fir-md1-s2 kernel: Lustre: 100277:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987907a43000 x1631672944389536/t0(0) o101->1bd832b1-9250-3684-b26a-6a1cc941ff1c@10.9.101.20@o2ib4:9/0 lens 1768/0 e 0 to 0 dl 1556467029 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 08:57:04 fir-md1-s2 kernel: Lustre: 100277:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12151 previous similar messages Apr 28 08:58:33 fir-md1-s2 kernel: LNet: Service thread pid 100305 completed after 1509.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 08:58:33 fir-md1-s2 kernel: LNet: Skipped 163 previous similar messages Apr 28 08:59:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467163.100310 Apr 28 08:59:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467164.100198 Apr 28 08:59:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467166.100000 Apr 28 08:59:27 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467167.100338 Apr 28 08:59:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467168.100224 Apr 28 08:59:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467186.100395 Apr 28 08:59:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 40ccf322-59bc-e954-acec-c29d71008e88 (at 10.8.12.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987707ff5800, cur 1556467190 expire 1556467040 last 1556466963 Apr 28 08:59:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 08:59:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467193.100206 Apr 28 09:00:04 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467204.99385 Apr 28 09:02:23 fir-md1-s2 kernel: LNet: Service thread pid 100297 was inactive for 200.09s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 09:02:23 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 09:02:23 fir-md1-s2 kernel: Pid: 100297, comm: mdt02_076 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:02:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:02:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:02:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:02:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467343.100297 Apr 28 09:02:23 fir-md1-s2 kernel: Pid: 99398, comm: mdt00_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:02:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 
09:02:23 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:02:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:02:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:02:23 fir-md1-s2 kernel: Pid: 100307, comm: mdt02_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:02:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:02:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:02:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:02:23 fir-md1-s2 kernel: Pid: 100456, comm: mdt00_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:02:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:02:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:02:23 
fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:02:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:02:24 fir-md1-s2 kernel: Pid: 100463, comm: mdt00_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:02:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:02:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:02:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:02:24 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:02:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:02:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:02:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:02:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:02:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:02:33 fir-md1-s2 kernel: LustreError: 100232:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556467263, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9869aeef8240/0x4f3cef67caaaae7c lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 712 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100232 timeout: 0 lvb_type: 0 Apr 28 09:02:33 fir-md1-s2 kernel: LustreError: 100232:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 166 previous similar messages Apr 28 09:02:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467374.100339 Apr 28 09:02:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467375.99547 Apr 28 09:03:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.1.34@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff98880576b180/0x4f3cef67caa12f14 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 714 type: IBT flags: 0x60200400000020 nid: 10.8.1.34@o2ib6 remote: 0xd559f0cd96564cf0 expref: 11 pid: 100091 timeout: 466807 lvb_type: 0 Apr 28 09:03:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 7 previous similar messages Apr 28 09:03:33 fir-md1-s2 kernel: Lustre: 99987:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff98692e64ce00 x1631738967868176/t0(0) o101->e3a95d5e-2945-1bb1-dd2c-d936b00a965b@10.8.10.4@o2ib6:3/0 lens 376/0 e 0 to 0 dl 1556467293 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:03:33 fir-md1-s2 kernel: Lustre: 99987:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8987 previous similar messages Apr 28 09:04:23 fir-md1-s2 kernel: LNet: Service thread pid 100232 was inactive for 200.42s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:04:23 fir-md1-s2 kernel: LNet: Skipped 137 previous similar messages Apr 28 09:04:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467463.100232 Apr 28 09:04:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467464.99387 Apr 28 09:04:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467469.100385 Apr 28 09:04:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467482.100366 Apr 28 09:04:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467499.100390 Apr 28 09:05:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467554.100132 Apr 28 09:06:03 fir-md1-s2 kernel: LustreError: 100331:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.1.34@o2ib6: deadline 100:45s ago req@ffff98774928d400 x1631546224349824/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556467518 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:06:03 fir-md1-s2 kernel: LustreError: 100331:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 13 previous similar messages Apr 28 09:06:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 98e3664a-d5ef-eae6-fd14-b2cecf487ed2 (at 10.8.12.10@o2ib6) reconnecting Apr 28 09:06:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 9f8b6381-edaa-1164-03e5-a3250d7c2548 (at 10.9.101.56@o2ib4) Apr 28 09:06:53 fir-md1-s2 kernel: Lustre: Skipped 11820 previous similar messages Apr 28 09:06:53 fir-md1-s2 kernel: Lustre: Skipped 11816 previous similar messages Apr 28 09:06:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467613.100356 Apr 28 09:07:04 fir-md1-s2 kernel: Lustre: 100192:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff986a8f9c3900 x1631647168131312/t0(0) o101->ddea348b-e5a4-5330-325a-755d459e8dda@10.9.107.57@o2ib4:9/0 lens 584/0 e 1 to 0 dl 1556467629 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:07:04 fir-md1-s2 kernel: Lustre: 100192:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12785 previous similar messages Apr 28 09:07:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 761fd9f8-5106-f638-0275-47efaff85a15 (at 10.8.1.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff986710265400, cur 1556467640 expire 1556467490 last 1556467413 Apr 28 09:07:28 fir-md1-s2 kernel: LustreError: 99878:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.1.22@o2ib6 arrived at 1556467648 with bad export cookie 5709701647484591641 Apr 28 09:09:23 fir-md1-s2 kernel: Pid: 99987, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:09:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:09:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:09:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:09:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467763.99987 Apr 28 09:09:23 fir-md1-s2 kernel: Pid: 100227, comm: mdt01_078 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:09:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:09:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:09:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:09:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:09:24 fir-md1-s2 kernel: Pid: 99529, comm: mdt02_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:09:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:09:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:09:24 
fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:09:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:09:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:09:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:09:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:09:33 fir-md1-s2 kernel: LNet: Service thread pid 100393 completed after 2019.95s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:09:33 fir-md1-s2 kernel: LustreError: 100315:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff98791ea9c800 ns: mdt-fir-MDT0003_UUID lock: ffff9865d66e72c0/0x4f3cef67caa201bc lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 705 type: IBT flags: 0x50200400000020 nid: 10.8.18.28@o2ib6 remote: 0x873148a4099a5af8 expref: 2 pid: 100315 timeout: 0 lvb_type: 0 Apr 28 09:09:33 fir-md1-s2 kernel: LustreError: 100315:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 10 previous similar messages Apr 28 09:09:33 fir-md1-s2 kernel: LNet: Skipped 52 previous similar messages Apr 28 09:09:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1135836c-5fb6-92af-ade3-8ef6cf526018 (at 10.8.27.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98686f2df800, cur 1556467790 expire 1556467640 last 1556467563 Apr 28 09:09:54 fir-md1-s2 kernel: Pid: 100449, comm: mdt00_105 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:09:54 fir-md1-s2 kernel: Call Trace: Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:09:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:09:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:09:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467794.100449 Apr 28 09:09:54 fir-md1-s2 kernel: Pid: 100331, comm: mdt02_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:09:54 fir-md1-s2 kernel: Call Trace: Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:09:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:09:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:09:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:10:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467823.99167 Apr 28 09:11:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467885.100507 Apr 28 09:11:51 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467911.100290 Apr 28 09:11:56 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467916.100340 Apr 28 09:12:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467943.99389 Apr 28 09:12:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556467973.100268 Apr 28 09:12:54 fir-md1-s2 kernel: LustreError: 
dumping log to /tmp/lustre-log.1556467974.100313 Apr 28 09:13:33 fir-md1-s2 kernel: LustreError: 100296:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556467923, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9863aa7760c0/0x4f3cef67caad3c9a lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 720 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 100296 timeout: 0 lvb_type: 0 Apr 28 09:13:33 fir-md1-s2 kernel: LustreError: 100296:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 84 previous similar messages Apr 28 09:14:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.20@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986aaa25aac0/0x4f3cef67caaccf95 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 108 type: IBT flags: 0x60200400000020 nid: 10.8.12.20@o2ib6 remote: 0x5db4eed49926c57f expref: 18 pid: 100245 timeout: 467467 lvb_type: 0 Apr 28 09:14:33 fir-md1-s2 kernel: LustreError: 99143:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 9 previous similar messages Apr 28 09:14:33 fir-md1-s2 kernel: Lustre: 99379:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:146s); client may timeout. req@ffff986a25d39e00 x1631734870581616/t0(0) o101->b2ea4b62-1b0e-1f82-5376-2b6f23c901d4@10.8.26.18@o2ib6:7/0 lens 576/0 e 0 to 0 dl 1556467927 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:14:33 fir-md1-s2 kernel: Lustre: 99379:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12091 previous similar messages Apr 28 09:15:23 fir-md1-s2 kernel: LNet: Service thread pid 99921 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:15:23 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 09:15:23 fir-md1-s2 kernel: Pid: 99921, comm: mdt02_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:15:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:15:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:15:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:15:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468123.99921 Apr 28 09:15:23 fir-md1-s2 kernel: Pid: 99403, comm: mdt02_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:15:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:15:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:15:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:15:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:15:24 fir-md1-s2 kernel: Pid: 100346, comm: mdt00_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:15:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:15:24 
fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:15:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:15:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:15:24 fir-md1-s2 kernel: Pid: 100205, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:15:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:15:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:15:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:15:24 fir-md1-s2 kernel: Pid: 99548, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:15:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:15:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:15:24 
fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:15:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:15:24 fir-md1-s2 kernel: LNet: Service thread pid 100296 was inactive for 201.05s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:15:24 fir-md1-s2 kernel: LNet: Skipped 55 previous similar messages Apr 28 09:15:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d36980b7-2b04-f724-0e6b-cf989e4d7da2 (at 10.8.1.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9874b0bacc00, cur 1556468150 expire 1556468000 last 1556467923 Apr 28 09:15:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:16:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468185.99166 Apr 28 09:16:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468190.99892 Apr 28 09:16:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6a8eb6f9-856d-6387-d01b-f8aa660b75d4 (at 10.9.106.7@o2ib4) reconnecting Apr 28 09:16:53 fir-md1-s2 kernel: Lustre: Skipped 12578 previous similar messages Apr 28 09:16:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 6a8eb6f9-856d-6387-d01b-f8aa660b75d4 (at 10.9.106.7@o2ib4) Apr 28 09:16:53 fir-md1-s2 kernel: Lustre: Skipped 12589 previous similar messages Apr 28 09:16:56 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468216.100363 Apr 28 09:16:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468218.100347 Apr 28 09:17:03 fir-md1-s2 kernel: LustreError: 100409:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.25@o2ib6: deadline 100:59s ago req@ffff985b1895a100 x1631300261287584/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556468164 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:17:03 fir-md1-s2 kernel: LustreError: 100409:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 71 previous similar messages Apr 28 09:17:05 fir-md1-s2 kernel: Lustre: 100377:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9879e0756900 x1631586439777392/t0(0) o101->f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb@10.9.112.4@o2ib4:10/0 lens 592/0 e 0 to 0 dl 1556468230 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:17:05 fir-md1-s2 kernel: Lustre: 100377:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14519 previous similar messages Apr 28 09:17:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468273.99163 Apr 28 09:18:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e6c11c37-f182-6c12-4d0c-27449ff35ca8 (at 10.8.17.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985a90a1c000, cur 1556468300 expire 1556468150 last 1556468073 Apr 28 09:18:20 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:19:33 fir-md1-s2 kernel: LNet: Service thread pid 100007 completed after 599.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:19:33 fir-md1-s2 kernel: LustreError: 99394:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987531355400 ns: mdt-fir-MDT0003_UUID lock: ffff988982337bc0/0x4f3cef67caa275ac lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 718 type: IBT flags: 0x50200400000020 nid: 10.8.30.11@o2ib6 remote: 0x2eb8a7f59e2bca8b expref: 2 pid: 99394 timeout: 0 lvb_type: 0 Apr 28 09:19:33 fir-md1-s2 kernel: LustreError: 99394:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 28 09:19:33 fir-md1-s2 kernel: LNet: Skipped 44 previous similar messages Apr 28 09:20:23 fir-md1-s2 kernel: Pid: 100330, comm: mdt02_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:20:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:20:23 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:20:23 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:20:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468423.100330 Apr 28 09:20:23 fir-md1-s2 kernel: Pid: 100259, comm: mdt02_065 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:20:23 fir-md1-s2 kernel: Call Trace: Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:20:23 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:20:24 fir-md1-s2 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:20:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:20:24 fir-md1-s2 kernel: Pid: 99379, comm: mdt01_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:20:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:20:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:20:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:20:24 fir-md1-s2 kernel: Pid: 100409, comm: mdt00_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:20:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:20:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:20:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:20:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0003 Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.105.48@o2ib4 (stopping) Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: Skipped 34 previous similar messages Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_disconnect to node 0@lo failed: rc = -107 Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 
100743:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.7@o2ib4 arrived at 1556468553 with bad export cookie 5709701649190925361 Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 100289:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987aa48eb400 ns: mdt-fir-MDT0003_UUID lock: ffff984d1f79c5c0/0x4f3cef67caa2f017 lrc: 3/0,0 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 564 type: IBT flags: 0x50306400000020 nid: 10.8.30.24@o2ib6 remote: 0xc9926b7fd7150e68 expref: 6 pid: 100289 timeout: 0 lvb_type: 0 Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 98996:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff988b0deb0000 x1631887136019696/t0(0) o41->fir-MDT0000-osp-MDT0001@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 98996:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 62 previous similar messages Apr 28 09:22:33 fir-md1-s2 kernel: LustreError: 19122:0:(ldlm_resource.c:1146:ldlm_resource_complain()) mdt-fir-MDT0003_UUID: namespace resource [0x28001b768:0x1b5d0:0x0].0x0 (ffff98887c7fa0c0) refcount nonzero (359) after lock cleanup; forcing cleanup. Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.27.19@o2ib6 (stopping) Apr 28 09:22:33 fir-md1-s2 kernel: Lustre: Skipped 757 previous similar messages Apr 28 09:22:34 fir-md1-s2 kernel: LustreError: 100743:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.35@o2ib4 arrived at 1556468554 with bad export cookie 5709701647484594329 Apr 28 09:22:34 fir-md1-s2 kernel: LustreError: 100743:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 28 09:22:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.27.21@o2ib6 (stopping) Apr 28 09:22:34 fir-md1-s2 kernel: Lustre: Skipped 121 previous similar messages Apr 28 09:22:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.7.25@o2ib6 (stopping) Apr 28 09:22:36 fir-md1-s2 kernel: Lustre: Skipped 85 previous similar messages Apr 28 09:22:37 fir-md1-s2 kernel: LustreError: 111456:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.61@o2ib4 arrived at 1556468557 with bad export cookie 5709701647484590696 Apr 28 09:22:37 fir-md1-s2 kernel: LustreError: 111456:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 28 09:22:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.10.18@o2ib6 (stopping) Apr 28 09:22:40 fir-md1-s2 kernel: Lustre: Skipped 253 previous similar messages Apr 28 09:22:43 fir-md1-s2 kernel: LustreError: 102884:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.21.6@o2ib6 arrived at 1556468563 with bad export cookie 5709701647484582170 Apr 28 09:22:43 fir-md1-s2 kernel: LustreError: 102884:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Apr 28 09:22:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.27.6@o2ib6 (stopping) Apr 28 09:22:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.27.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:22:49 fir-md1-s2 kernel: Lustre: Skipped 716 previous similar messages Apr 28 09:22:49 fir-md1-s2 kernel: Lustre: server umount fir-MDT0003 complete Apr 28 09:22:51 fir-md1-s2 kernel: LustreError: 111417:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.32@o2ib4 arrived at 1556468571 with bad export cookie 5709701647484593167 Apr 28 09:22:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.60@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:22:53 fir-md1-s2 kernel: LustreError: Skipped 175 previous similar messages Apr 28 09:23:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.109.3@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:23:01 fir-md1-s2 kernel: LustreError: Skipped 342 previous similar messages Apr 28 09:23:02 fir-md1-s2 kernel: LustreError: 111417:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.54@o2ib4 arrived at 1556468582 with bad export cookie 5709701647484592544 Apr 28 09:23:10 fir-md1-s2 kernel: Lustre: server umount fir-MDT0001 complete Apr 28 09:23:28 fir-md1-s2 kernel: LNetError: 7254:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.203@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:23:29 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.211@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:23:29 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 15 previous similar messages Apr 28 09:23:30 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7 Apr 28 09:24:02 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 28 09:24:02 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib) Apr 28 09:24:02 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7 Apr 28 09:24:02 fir-md1-s2 kernel: LNet: Using FastReg for registration Apr 28 09:24:03 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180] Apr 28 09:25:05 fir-md1-s2 kernel: LNetError: 19584:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Apr 28 09:25:05 fir-md1-s2 kernel: LNetError: 19582:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Apr 28 09:25:05 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:25:05 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Apr 28 09:25:05 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:25:05 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:25:05 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.106.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:25:05 fir-md1-s2 kernel: LustreError: Skipped 3 previous similar messages Apr 28 09:25:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.106.62@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:25:06 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Apr 28 09:25:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.112.12@o2ib4 (not set up) Apr 28 09:25:07 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.19.2@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:25:07 fir-md1-s2 kernel: LustreError: Skipped 27 previous similar messages Apr 28 09:25:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.112.11@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:25:09 fir-md1-s2 kernel: LustreError: Skipped 72 previous similar messages Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on Apr 28 09:25:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect Apr 28 09:25:13 fir-md1-s2 kernel: LustreError: Skipped 162 previous similar messages Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.20.3@o2ib6 (not set up) Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Apr 28 09:25:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1328 clients reconnect Apr 28 09:25:14 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDD0003: changelog on Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: in recovery but waiting for the first client to connect Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.3@o2ib6) Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.18@o2ib6) Apr 28 09:25:14 fir-md1-s2 kernel: Lustre: Skipped 104 previous similar messages Apr 28 09:25:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 054d1548-cdcc-8b5b-1ec4-5ec77e76503f (at 10.8.12.7@o2ib6) Apr 28 09:25:15 fir-md1-s2 kernel: Lustre: Skipped 147 previous similar messages Apr 28 09:25:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 8455dbbf-4366-afd6-29b8-dc2a91bfd5f9 (at 10.8.11.12@o2ib6) Apr 28 09:25:16 fir-md1-s2 kernel: Lustre: Skipped 93 previous similar messages Apr 28 09:25:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2943d7c9-ecf1-ed5a-9d88-3a1d89520529 (at 10.8.31.9@o2ib6) Apr 28 09:25:18 fir-md1-s2 kernel: Lustre: Skipped 84 previous similar messages Apr 28 09:25:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client fa50e0be-bf58-6b43-a8b3-a284779ef524(at 10.8.13.7@o2ib6), waiting for 1327 known clients (364 recovered, 20 in 
progress, and 0 evicted) already passed deadline 2:35 Apr 28 09:25:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to d930e61d-e56c-e6c2-e16a-4d6a026ada3e (at 10.9.107.7@o2ib4) Apr 28 09:25:22 fir-md1-s2 kernel: Lustre: Skipped 1394 previous similar messages Apr 28 09:25:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.101@o2ib7 (at 10.0.10.101@o2ib7) Apr 28 09:25:35 fir-md1-s2 kernel: Lustre: Skipped 861 previous similar messages Apr 28 09:25:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client fa50e0be-bf58-6b43-a8b3-a284779ef524(at 10.8.13.7@o2ib6), waiting for 1327 known clients (1253 recovered, 73 in progress, and 0 evicted) already passed deadline 3:05 Apr 28 09:25:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.104@o2ib7 (at 10.0.10.104@o2ib7) Apr 28 09:25:54 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Apr 28 09:26:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery already passed deadline 1:38, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:26:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:26:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 0:51, of 1328 clients 1328 recovered and 0 were evicted. Apr 28 09:26:29 fir-md1-s2 kernel: Lustre: 19850:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff988b36ba3c00 x1631558717283696/t0(0) o101->8e6278f9-7305-b217-3b1f-cfd02c7696e0@10.9.105.2@o2ib4:4/0 lens 576/3264 e 0 to 0 dl 1556468794 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:26:29 fir-md1-s2 kernel: Lustre: 19850:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1277 previous similar messages Apr 28 09:26:30 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985be9a1e600 x1631386353490304/t0(0) o101->6613ceb1-c5ef-6955-40ef-3ea9b9998320@10.8.12.8@o2ib6:5/0 lens 576/0 e 0 to 0 dl 1556468795 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:26:30 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Apr 28 09:26:31 fir-md1-s2 kernel: Lustre: 19799:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b365b4500 x1631534819047760/t0(0) o101->32b4877a-e8a3-e77c-9d56-903e3045a875@10.8.17.22@o2ib6:6/0 lens 576/0 e 0 to 0 dl 1556468796 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:26:31 fir-md1-s2 kernel: Lustre: 19799:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Apr 28 09:26:34 fir-md1-s2 kernel: Lustre: 19799:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9867f4671b00 x1631782093809904/t0(0) o101->d2ab40ab-8888-3abb-75f9-9c32b2196967@10.8.26.26@o2ib6:9/0 lens 576/0 e 0 to 0 dl 1556468799 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:26:34 fir-md1-s2 kernel: Lustre: 19799:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 28 09:26:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 1029f32e-c536-81b0-6441-16ee4f005637 (at 10.8.22.9@o2ib6) reconnecting Apr 28 09:26:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.22.9@o2ib6) Apr 28 09:26:35 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Apr 28 09:26:35 fir-md1-s2 
kernel: Lustre: fir-MDT0003: Client b2ea4b62-1b0e-1f82-5376-2b6f23c901d4 (at 10.8.26.18@o2ib6) reconnecting Apr 28 09:26:35 fir-md1-s2 kernel: Lustre: Skipped 571 previous similar messages Apr 28 09:26:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 617d800a-afeb-08ed-bb4c-9f77025769ad (at 10.8.25.17@o2ib6) reconnecting Apr 28 09:26:36 fir-md1-s2 kernel: Lustre: Skipped 226 previous similar messages Apr 28 09:26:39 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b38fb9800 x1631564140100128/t0(0) o101->d4b9204a-0e97-30e6-4a34-9148370f1203@10.9.102.64@o2ib4:14/0 lens 576/0 e 0 to 0 dl 1556468804 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:26:39 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Apr 28 09:26:40 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b4a7dae6-d345-8e9f-e334-9f38da541ca7 (at 10.8.24.33@o2ib6) reconnecting Apr 28 09:26:40 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:26:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fa50e0be-bf58-6b43-a8b3-a284779ef524 (at 10.8.13.7@o2ib6) reconnecting Apr 28 09:26:45 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Apr 28 09:26:47 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98678deb6900 x1631542888591536/t0(0) o41->be42b497-ab1b-8d58-3101-014aad577cfc@10.8.27.35@o2ib6:22/0 lens 440/0 e 0 to 0 dl 1556468812 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:26:47 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Apr 28 09:26:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9919f65b-4eda-9eb4-10ee-dff59fb4040e (at 10.9.107.54@o2ib4) reconnecting Apr 28 09:26:54 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Apr 28 09:27:03 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a225dc800 x1631584799027088/t0(0) o101->c4e07e51-5d48-0b06-6fb4-3c726822dfcb@10.9.103.38@o2ib4:8/0 lens 576/0 e 0 to 0 dl 1556468828 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:27:03 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 715 previous similar messages Apr 28 09:27:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fb4eab6b-7253-4fde-536d-07e03dd4756a (at 10.8.21.35@o2ib6) reconnecting Apr 28 09:27:11 fir-md1-s2 kernel: Lustre: Skipped 799 previous similar messages Apr 28 09:27:34 fir-md1-s2 kernel: LustreError: 20654:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468764, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987ac8ec4800/0xdab803113fd57c70 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 563 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20654 timeout: 0 lvb_type: 0 Apr 28 09:27:34 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468854.20383 Apr 28 09:27:34 fir-md1-s2 kernel: LustreError: 20654:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 134 previous similar messages Apr 28 09:27:36 fir-md1-s2 kernel: LustreError: 20413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468766, 90s ago); not 
entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98557ea869c0/0xdab803113fd67f38 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 563 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20413 timeout: 0 lvb_type: 0 Apr 28 09:27:36 fir-md1-s2 kernel: LustreError: 20413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 146 previous similar messages Apr 28 09:27:36 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a09e9f200 x1631782093809904/t0(0) o101->d2ab40ab-8888-3abb-75f9-9c32b2196967@10.8.26.26@o2ib6:11/0 lens 576/0 e 0 to 0 dl 1556468861 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:27:36 fir-md1-s2 kernel: Lustre: 20227:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 728 previous similar messages Apr 28 09:27:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3d3c7601-6f5c-0785-32fb-c3b7bdfc3b1a (at 10.8.18.25@o2ib6) Apr 28 09:27:39 fir-md1-s2 kernel: Lustre: Skipped 2463 previous similar messages Apr 28 09:27:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6137bba0-34c0-9107-d068-27095ef10964 (at 10.8.22.23@o2ib6) reconnecting Apr 28 09:27:43 fir-md1-s2 kernel: Lustre: Skipped 831 previous similar messages Apr 28 09:28:05 fir-md1-s2 kernel: LustreError: 19900:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468795, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9857e13f6e40/0xdab803113fd68478 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 567 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 19900 timeout: 0 lvb_type: 0 Apr 28 09:28:05 fir-md1-s2 kernel: LustreError: 19900:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 28 09:28:34 fir-md1-s2 kernel: LustreError: 19781:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.8.4@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986b11bea400/0xdab803113fd56dc1 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 569 type: IBT flags: 0x60000400000020 nid: 10.8.8.4@o2ib6 remote: 0x379ffb18bb7cc298 expref: 11 pid: 20240 timeout: 468308 lvb_type: 0 Apr 28 09:28:34 fir-md1-s2 kernel: Lustre: 20567:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. 
req@ffff985f3e6b9e00 x1631738146530528/t0(0) o101->cb7e57fe-8059-4e8c-8618-95db5afecaa6@10.8.24.21@o2ib6:4/0 lens 608/0 e 0 to 0 dl 1556468794 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:28:34 fir-md1-s2 kernel: Lustre: 20567:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2849 previous similar messages Apr 28 09:28:36 fir-md1-s2 kernel: LustreError: 20547:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468826, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a8e28a880/0xdab803113fd6ec4b lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 476 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20547 timeout: 0 lvb_type: 0 Apr 28 09:28:36 fir-md1-s2 kernel: LustreError: 20547:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 28 09:28:41 fir-md1-s2 kernel: Lustre: 20429:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986906726000 x1631534853779264/t0(0) o101->f2a69869-02b0-7522-b419-5cd488a596b0@10.8.18.12@o2ib6:16/0 lens 576/0 e 0 to 0 dl 1556468926 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:28:41 fir-md1-s2 kernel: Lustre: 20429:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1407 previous similar messages Apr 28 09:28:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client c3c40050-eaa9-7723-3ebf-2e621d4ba656 (at 10.9.102.55@o2ib4) reconnecting Apr 28 09:28:48 fir-md1-s2 kernel: Lustre: Skipped 1460 previous similar messages Apr 28 09:29:05 fir-md1-s2 kernel: Lustre: 20496:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:27s); client may timeout. req@ffff986abd839800 x1631542498522224/t0(0) o101->d63616e9-cace-0db0-1e58-00c83619e62e@10.8.7.24@o2ib6:8/0 lens 480/0 e 0 to 0 dl 1556468918 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:29:05 fir-md1-s2 kernel: LustreError: 20429:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.2.32@o2ib6: deadline 30:1s ago req@ffff986a3826f200 x1631642091817600/t0(0) o101->69b8a659-e484-e102-dbbc-9c1dda44d528@10.8.2.32@o2ib6:4/0 lens 376/0 e 0 to 0 dl 1556468944 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:29:05 fir-md1-s2 kernel: Lustre: 20496:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 595 previous similar messages Apr 28 09:29:07 fir-md1-s2 kernel: LustreError: 20220:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468857, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a1af86780/0xdab803113fd74124 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 553 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 20220 timeout: 0 lvb_type: 0 Apr 28 09:29:07 fir-md1-s2 kernel: LustreError: 20220:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 28 09:29:24 fir-md1-s2 kernel: LNet: Service thread pid 20231 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:29:24 fir-md1-s2 kernel: Pid: 20231, comm: mdt01_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:29:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:29:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468965.20231 Apr 28 09:29:25 fir-md1-s2 kernel: Pid: 20411, comm: mdt01_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:29:25 fir-md1-s2 kernel: Pid: 20517, comm: mdt01_083 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 
28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:29:25 fir-md1-s2 kernel: Pid: 20500, comm: mdt01_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:29:25 fir-md1-s2 kernel: LNet: Service thread pid 20484 was inactive for 200.80s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:29:25 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 09:29:25 fir-md1-s2 kernel: Pid: 20484, comm: mdt00_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:29:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:29:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:29:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:29:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:29:25 fir-md1-s2 kernel: LNet: Service thread pid 20237 was inactive for 200.95s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:29:26 fir-md1-s2 kernel: LNet: Service thread pid 20413 was inactive for 200.45s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:29:26 fir-md1-s2 kernel: LNet: Skipped 202 previous similar messages Apr 28 09:29:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556468966.20413 Apr 28 09:29:38 fir-md1-s2 kernel: LustreError: 19902:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556468888, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9872933c9b00/0xdab803113fd7cbcd lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 581 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 19902 timeout: 0 lvb_type: 0 Apr 28 09:29:38 fir-md1-s2 kernel: LustreError: 19902:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.17.11@o2ib6) Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: Skipped 3030 previous similar messages Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0003 Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.102.32@o2ib4 (stopping) Apr 28 09:29:47 fir-md1-s2 kernel: Lustre: Skipped 56 previous similar messages Apr 28 09:29:48 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0003-osp-MDT0001: operation mds_disconnect to node 0@lo failed: rc = -107 Apr 28 09:29:48 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 09:29:48 fir-md1-s2 kernel: LustreError: 20440:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9859b1796c00 ns: mdt-fir-MDT0003_UUID lock: ffff9861ed718900/0xdab803113fd5a829 lrc: 3/0,0 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 507 type: IBT flags: 0x50306400000020 nid: 10.8.12.27@o2ib6 remote: 0xc9009d550c033b72 expref: 7 pid: 20440 timeout: 0 lvb_type: 0 Apr 28 09:29:48 fir-md1-s2 kernel: LNet: Service thread pid 20638 completed after 223.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:29:48 fir-md1-s2 kernel: Lustre: 20418:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:39s); client may timeout. req@ffff986180a5e300 x1631546320174384/t0(0) o101->a1402e9b-5e48-acd3-204d-e410e8c1eb0b@10.8.2.28@o2ib6:9/0 lens 576/0 e 0 to 0 dl 1556468949 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:29:48 fir-md1-s2 kernel: LustreError: 19795:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff985af18bb300 x1632075662875296/t0(0) o105->fir-MDT0003@10.8.7.23@o2ib6:15/16 lens 360/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:29:48 fir-md1-s2 kernel: LNet: Skipped 198 previous similar messages Apr 28 09:29:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.7.2@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:29:48 fir-md1-s2 kernel: Lustre: server umount fir-MDT0001 complete Apr 28 09:29:49 fir-md1-s2 kernel: LustreError: 19644:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff987afdd02700 x1632075662875408/t0(0) o41->fir-MDT0001-osp-MDT0003@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:29:49 fir-md1-s2 kernel: LustreError: 19644:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Apr 28 09:29:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.103.21@o2ib4 (stopping) Apr 28 09:29:49 fir-md1-s2 kernel: Lustre: Skipped 851 previous similar messages Apr 28 09:29:50 fir-md1-s2 kernel: Lustre: server umount fir-MDT0003 complete Apr 28 09:29:52 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.203@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:29:52 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.202@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:29:52 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 12 previous similar messages Apr 28 09:29:54 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7 Apr 28 09:30:05 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 28 09:30:05 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib) Apr 28 09:30:05 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7 Apr 28 09:30:05 fir-md1-s2 kernel: LNet: Using FastReg for registration Apr 28 09:30:05 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180] Apr 28 09:31:08 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:31:08 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Apr 28 09:31:08 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:31:08 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:31:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.20.5@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:31:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.23.35@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:31:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.107.13@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:31:13 fir-md1-s2 kernel: LustreError: Skipped 86 previous similar messages Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.108.12@o2ib4 (not set up) Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1328 clients reconnect Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.108.25@o2ib4 (not set up) Apr 28 09:31:13 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Apr 28 09:31:13 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:31:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.25@o2ib4) Apr 28 09:31:14 fir-md1-s2 kernel: Lustre: Skipped 28 previous similar messages Apr 28 09:31:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to ae1c0bb0-8745-5d9b-0fca-9b849d57aa4e (at 10.8.9.10@o2ib6) Apr 28 09:31:15 fir-md1-s2 kernel: Lustre: Skipped 20 previous similar messages Apr 28 09:31:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 9a1469ab-c675-66e9-07cc-7b69a63273a8 (at 10.9.101.2@o2ib4) Apr 28 09:31:16 fir-md1-s2 kernel: Lustre: Skipped 34 previous similar messages Apr 28 09:31:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 75a243e1-9f6a-0fab-ec0e-ce32dad51415 (at 10.9.106.71@o2ib4) Apr 28 09:31:18 fir-md1-s2 kernel: Lustre: Skipped 25 previous similar messages Apr 28 09:31:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 74b6aa20-82e6-c3c9-1fe9-141d2e6b56e2 (at 10.8.26.30@o2ib6) Apr 28 09:31:22 fir-md1-s2 kernel: Lustre: Skipped 1921 previous similar messages Apr 28 09:31:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.101@o2ib7 (at 10.0.10.101@o2ib7) Apr 28 09:31:32 fir-md1-s2 kernel: Lustre: Skipped 608 previous similar messages Apr 28 09:31:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.105@o2ib7 (at 10.0.10.105@o2ib7) Apr 28 09:31:48 fir-md1-s2 kernel: Lustre: Skipped 47 previous similar messages Apr 28 09:32:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery already passed deadline 1:39, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:32:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 0:51, of 1328 clients 1328 recovered and 0 were evicted. 
Apr 28 09:32:29 fir-md1-s2 kernel: Lustre: 22531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987a6a7fb600 x1631542125371392/t0(0) o101->40b3c666-85bb-7cc6-dce2-ca98ff07da91@10.9.109.6@o2ib4:4/0 lens 576/3264 e 0 to 0 dl 1556469154 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:32:29 fir-md1-s2 kernel: Lustre: 22531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1306 previous similar messages Apr 28 09:32:30 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3abfda00 x1631752207756384/t0(0) o101->ab8fb752-7566-0b9b-4be7-749799d2e5da@10.8.24.22@o2ib6:5/0 lens 584/0 e 0 to 0 dl 1556469155 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:32:30 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 85 previous similar messages Apr 28 09:32:31 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98699e933f00 x1631584799062224/t0(0) o101->c4e07e51-5d48-0b06-6fb4-3c726822dfcb@10.9.103.38@o2ib4:6/0 lens 568/0 e 0 to 0 dl 1556469156 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:32:31 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Apr 28 09:32:33 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9866aa61bc00 x1631727693921680/t0(0) o48->bee493fe-0255-e0d8-44b9-deb38a2dee88@10.9.0.61@o2ib4:8/0 lens 336/0 e 0 to 0 dl 1556469158 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:32:33 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Apr 28 09:32:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b7aae4ae-1aa0-9e5d-5ecf-90e4dbcd33de (at 10.9.101.27@o2ib4) reconnecting Apr 28 09:32:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b7aae4ae-1aa0-9e5d-5ecf-90e4dbcd33de (at 10.9.101.27@o2ib4) Apr 28 09:32:35 fir-md1-s2 kernel: Lustre: Skipped 44 previous similar messages Apr 28 09:32:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 204a4878-5b36-a4b5-059d-1539f9790a43 (at 10.8.22.16@o2ib6) reconnecting Apr 28 09:32:35 fir-md1-s2 kernel: Lustre: Skipped 23 previous similar messages Apr 28 09:32:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client e3a95d5e-2945-1bb1-dd2c-d936b00a965b (at 10.8.10.4@o2ib6) reconnecting Apr 28 09:32:36 fir-md1-s2 kernel: Lustre: Skipped 770 previous similar messages Apr 28 09:32:37 fir-md1-s2 kernel: Lustre: 22788:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a03c3f500 x1631534756374128/t0(0) o101->6f801a74-3cb3-fd27-e437-ef070629adb4@10.9.106.65@o2ib4:12/0 lens 576/0 e 0 to 0 dl 1556469162 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:32:37 fir-md1-s2 kernel: Lustre: 22788:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Apr 28 09:32:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c0afed87-894c-bd68-b6a7-ca4f7af5df99 (at 10.9.103.18@o2ib4) reconnecting Apr 28 09:32:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Apr 28 09:32:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d0ad499e-df43-3f0b-81b1-4bf6eb44f98c (at 10.8.26.29@o2ib6) reconnecting Apr 28 09:32:44 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 28 
09:32:45 fir-md1-s2 kernel: Lustre: 22788:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986995292a00 x1631633101800944/t0(0) o101->7dd0e29d-7256-3003-de53-722d13c944f1@10.8.22.13@o2ib6:20/0 lens 576/0 e 0 to 0 dl 1556469170 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:32:45 fir-md1-s2 kernel: Lustre: 22788:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Apr 28 09:32:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 7bfce0a8-4699-768c-99a3-fd7ddcb9f2aa (at 10.9.103.4@o2ib4) reconnecting Apr 28 09:32:54 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Apr 28 09:33:01 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a359a1200 x1631676102764912/t0(0) o101->24c7f96f-dc04-c900-43d2-4e23d648bf07@10.9.103.35@o2ib4:6/0 lens 592/0 e 0 to 0 dl 1556469186 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:33:01 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 823 previous similar messages Apr 28 09:33:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 64c1669e-ae91-e030-4a6b-557cfa991c99 (at 10.9.102.22@o2ib4) reconnecting Apr 28 09:33:11 fir-md1-s2 kernel: Lustre: Skipped 807 previous similar messages Apr 28 09:33:33 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a2139fb00 x1631584799068672/t0(0) o101->c4e07e51-5d48-0b06-6fb4-3c726822dfcb@10.9.103.38@o2ib4:8/0 lens 576/0 e 0 to 0 dl 1556469218 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:33:33 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 694 previous similar messages Apr 28 09:33:34 fir-md1-s2 kernel: LustreError: 22316:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469124, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98719d639440/0x35bc67a6e0f2bf3a lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 61 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22316 timeout: 0 lvb_type: 0 Apr 28 09:33:34 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469214.22663 Apr 28 09:33:34 fir-md1-s2 kernel: LustreError: 22316:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 204 previous similar messages Apr 28 09:33:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to a501b92b-e7b6-1a0d-e95a-8363a690f102 (at 10.8.11.28@o2ib6) Apr 28 09:33:39 fir-md1-s2 kernel: Lustre: Skipped 2448 previous similar messages Apr 28 09:33:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fafb3280-fd8a-565a-20cd-97f85a227ff6 (at 10.8.11.31@o2ib6) reconnecting Apr 28 09:33:43 fir-md1-s2 kernel: Lustre: Skipped 810 previous similar messages Apr 28 09:34:05 fir-md1-s2 kernel: LustreError: 22898:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469155, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a435157c0/0x35bc67a6e0f3f35d lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 517 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22898 timeout: 0 lvb_type: 0 Apr 28 09:34:05 fir-md1-s2 kernel: LustreError: 
22898:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 77 previous similar messages Apr 28 09:34:34 fir-md1-s2 kernel: LustreError: 21891:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.17.14@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987903a7f2c0/0x35bc67a6e0f2d28f lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 517 type: IBT flags: 0x60200400000020 nid: 10.8.17.14@o2ib6 remote: 0x595dea16994e0ee expref: 13 pid: 22386 timeout: 468668 lvb_type: 0 Apr 28 09:34:34 fir-md1-s2 kernel: Lustre: 22814:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:27s); client may timeout. req@ffff9858a93ff200 x1631558415371312/t0(0) o101->9cb3a3a7-431b-a8f5-fef2-8703076397cf@10.9.107.42@o2ib4:7/0 lens 1768/0 e 0 to 0 dl 1556469247 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:34:34 fir-md1-s2 kernel: Lustre: 22814:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 158 previous similar messages Apr 28 09:34:36 fir-md1-s2 kernel: LustreError: 22668:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469186, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a347f9f80/0x35bc67a6e0f46015 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 511 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22668 timeout: 0 lvb_type: 0 Apr 28 09:34:36 fir-md1-s2 kernel: LustreError: 22668:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages Apr 28 09:34:38 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b199eec00 x1631534363994832/t0(0) o101->2faf88a4-e542-43bc-139f-15cb7cd9b030@10.9.108.36@o2ib4:13/0 lens 592/0 e 0 to 0 dl 1556469283 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:34:38 fir-md1-s2 kernel: Lustre: 22586:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1399 previous similar messages Apr 28 09:34:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 55b451b3-fa82-3731-68a1-db9159c37dee (at 10.9.101.12@o2ib4) reconnecting Apr 28 09:34:48 fir-md1-s2 kernel: Lustre: Skipped 1459 previous similar messages Apr 28 09:35:07 fir-md1-s2 kernel: LustreError: 22882:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469217, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a772d8480/0x35bc67a6e0f4b9fd lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 518 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22882 timeout: 0 lvb_type: 0 Apr 28 09:35:07 fir-md1-s2 kernel: LustreError: 22882:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 09:35:24 fir-md1-s2 kernel: LNet: Service thread pid 22281 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:35:24 fir-md1-s2 kernel: Pid: 22281, comm: mdt01_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:24 fir-md1-s2 kernel: Call Trace: Apr 28 09:35:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:24 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:35:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:35:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469325.22281 Apr 28 09:35:25 fir-md1-s2 kernel: Pid: 22817, comm: mdt01_110 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:35:25 fir-md1-s2 kernel: Pid: 22815, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 
28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:35:25 fir-md1-s2 kernel: Pid: 22812, comm: mdt01_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:35:25 fir-md1-s2 kernel: LNet: Service thread pid 22595 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:35:25 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 09:35:25 fir-md1-s2 kernel: Pid: 22595, comm: mdt01_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:35:25 fir-md1-s2 kernel: Call Trace: Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:35:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:35:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:35:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:35:25 fir-md1-s2 kernel: LNet: Service thread pid 22271 was inactive for 200.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:35:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to fafb3280-fd8a-565a-20cd-97f85a227ff6 (at 10.8.11.31@o2ib6) Apr 28 09:35:47 fir-md1-s2 kernel: Lustre: Skipped 3057 previous similar messages Apr 28 09:35:56 fir-md1-s2 kernel: LNet: Service thread pid 22847 was inactive for 200.39s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:35:56 fir-md1-s2 kernel: LNet: Skipped 293 previous similar messages Apr 28 09:35:56 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469356.22847 Apr 28 09:36:04 fir-md1-s2 kernel: LustreError: 22864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469274, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff984f5ca8d340/0x35bc67a6e0f50f07 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 63 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22864 timeout: 0 lvb_type: 0 Apr 28 09:36:04 fir-md1-s2 kernel: LustreError: 22864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 09:36:27 fir-md1-s2 kernel: LNet: Service thread pid 22809 was inactive for 200.58s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:36:27 fir-md1-s2 kernel: LNet: Skipped 19 previous similar messages Apr 28 09:36:27 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469387.22809 Apr 28 09:36:35 fir-md1-s2 kernel: LustreError: 22728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469305, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988a00b65a00/0x35bc67a6e0f54803 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 522 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22728 timeout: 0 lvb_type: 0 Apr 28 09:36:35 fir-md1-s2 kernel: LustreError: 22728:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 28 09:36:46 fir-md1-s2 kernel: Lustre: 22861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984dcbabf800 x1631678095106352/t0(0) o101->5d399386-b1fb-d405-e88f-f20c8d175a51@10.8.25.4@o2ib6:21/0 lens 576/0 e 0 to 0 dl 1556469411 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:36:46 fir-md1-s2 kernel: Lustre: 22861:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2868 previous similar messages Apr 28 09:36:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 257e47e9-78a0-a5d9-0d4b-1c08db5bc591 (at 10.9.106.13@o2ib4) reconnecting Apr 28 09:36:57 fir-md1-s2 kernel: Lustre: Skipped 3171 previous similar messages Apr 28 09:36:58 fir-md1-s2 kernel: LNet: Service thread pid 22811 was inactive for 200.30s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:36:58 fir-md1-s2 kernel: LNet: Skipped 8 previous similar messages Apr 28 09:36:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469418.22811 Apr 28 09:37:04 fir-md1-s2 kernel: LustreError: 21891:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.11.22@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff98719d63af40/0x35bc67a6e0f2d52f lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 522 type: IBT flags: 0x60200400000020 nid: 10.8.11.22@o2ib6 remote: 0x1db56f3cb4038945 expref: 13 pid: 22548 timeout: 468818 lvb_type: 0 Apr 28 09:37:04 fir-md1-s2 kernel: LustreError: 21891:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 09:37:04 fir-md1-s2 kernel: LNet: Service thread pid 22764 completed after 299.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:37:04 fir-md1-s2 kernel: Lustre: 22824:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. 
req@ffff985a41215d00 x1631546982218000/t0(0) o101->bd387039-9e5c-0c5b-0227-8087faaf7a40@10.9.103.22@o2ib4:4/0 lens 584/0 e 0 to 0 dl 1556469304 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:37:04 fir-md1-s2 kernel: LustreError: 22824:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.26.28@o2ib6: deadline 100:45s ago req@ffff984f59350900 x1631895577838048/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556469379 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:37:04 fir-md1-s2 kernel: LNet: Skipped 14 previous similar messages Apr 28 09:37:04 fir-md1-s2 kernel: LustreError: 22526:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff985a1abce400 ns: mdt-fir-MDT0003_UUID lock: ffff985951f457c0/0x35bc67a6e0f46a02 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x20/0x0 rrc: 38 type: IBT flags: 0x50200000000000 nid: 10.8.26.28@o2ib6 remote: 0xc0ca24986871d3d8 expref: 2 pid: 22526 timeout: 0 lvb_type: 0 Apr 28 09:37:06 fir-md1-s2 kernel: LustreError: 22794:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469336, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98790333e9c0/0x35bc67a6e0f569f2 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 57 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22794 timeout: 0 lvb_type: 0 Apr 28 09:37:06 fir-md1-s2 kernel: LustreError: 22794:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 09:37:54 fir-md1-s2 kernel: LNet: Service thread pid 22864 was inactive for 200.50s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:37:54 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 09:37:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469474.22864 Apr 28 09:37:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469475.22906 Apr 28 09:38:00 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469480.22663 Apr 28 09:38:22 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0003 Apr 28 09:38:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:38:22 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_disconnect to node 0@lo failed: rc = -107 Apr 28 09:38:22 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 09:38:23 fir-md1-s2 kernel: LustreError: 22702:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff98570f60b800 ns: mdt-fir-MDT0003_UUID lock: ffff984df3ee2d00/0x35bc67a6e0f305f0 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 559 type: IBT flags: 0x50200400000020 nid: 10.9.105.6@o2ib4 remote: 0xe365fbfd3f922e21 expref: 5 pid: 22702 timeout: 0 lvb_type: 0 Apr 28 09:38:23 fir-md1-s2 kernel: Lustre: 22878:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:39s); client may timeout. req@ffff9855fdfee900 x1631305543842816/t0(0) o101->253b4c5b-4ff4-3bf5-58fe-413737b1d5c2@10.8.20.14@o2ib6:14/0 lens 376/0 e 0 to 0 dl 1556469464 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:38:23 fir-md1-s2 kernel: LNet: Service thread pid 22879 completed after 378.28s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:38:23 fir-md1-s2 kernel: LNet: Skipped 21 previous similar messages Apr 28 09:38:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.11.22@o2ib6 (stopping) Apr 28 09:38:23 fir-md1-s2 kernel: LustreError: 22565:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.13.13@o2ib6: deadline 30:45s ago req@ffff98683ef1e600 x1631731150404256/t0(0) o400->@:8/0 lens 224/0 e 0 to 0 dl 1556469458 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:38:23 fir-md1-s2 kernel: LustreError: 22565:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 09:38:23 fir-md1-s2 kernel: Lustre: 22878:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7405 previous similar messages Apr 28 09:38:23 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.29.2@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:38:23 fir-md1-s2 kernel: LustreError: Skipped 16 previous similar messages Apr 28 09:38:24 fir-md1-s2 kernel: Lustre: server umount fir-MDT0003 complete Apr 28 09:38:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:38:25 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.106@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:38:26 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.210@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:38:26 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 5 previous similar messages Apr 28 09:38:27 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7 Apr 28 09:38:52 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 28 09:38:53 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib) Apr 28 09:38:53 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7 Apr 28 09:38:53 fir-md1-s2 kernel: LNet: Using FastReg for registration Apr 28 09:38:53 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.212@o2ib7 on NA (ib0:0:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:38:53 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180] Apr 28 09:38:55 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:38:55 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Apr 28 09:38:55 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:38:55 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:38:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.27.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:38:56 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.114.12@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:38:56 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Apr 28 09:38:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.106.32@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:38:57 fir-md1-s2 kernel: LustreError: Skipped 326 previous similar messages Apr 28 09:38:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.27@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:38:59 fir-md1-s2 kernel: LustreError: Skipped 222 previous similar messages Apr 28 09:38:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.105.62@o2ib4 (not set up) Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1328 clients reconnect Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.23.32@o2ib6 (not set up) Apr 28 09:39:00 fir-md1-s2 kernel: Lustre: Skipped 24 previous similar messages Apr 28 09:39:00 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0003: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114 Apr 28 09:39:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.24@o2ib4) Apr 28 09:39:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client f8df0cc4-ce7f-fc87-f0ad-7ba448980771(at 10.8.13.13@o2ib6), waiting for 1326 known clients (7 recovered, 25 in progress, and 0 evicted) already passed deadline 2:30 Apr 28 09:39:01 fir-md1-s2 kernel: Lustre: Skipped 291 previous similar messages Apr 28 09:39:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to dca21337-fdba-5128-347e-592b37646902 (at 10.9.108.60@o2ib4) Apr 28 09:39:01 fir-md1-s2 kernel: Lustre: Skipped 514 previous similar messages Apr 28 09:39:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 7bc73a34-3e7e-c2f0-81f2-d0da70be75c4 (at 10.9.108.26@o2ib4) Apr 28 09:39:02 fir-md1-s2 kernel: Lustre: Skipped 539 previous similar messages Apr 28 09:39:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 2943d7c9-ecf1-ed5a-9d88-3a1d89520529 (at 10.8.31.9@o2ib6) Apr 28 09:39:04 fir-md1-s2 kernel: Lustre: Skipped 758 previous similar messages Apr 28 09:39:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.108@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:39:06 fir-md1-s2 kernel: LustreError: Skipped 86 previous similar messages Apr 28 09:39:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.49@o2ib4) Apr 28 09:39:08 fir-md1-s2 kernel: Lustre: Skipped 361 previous similar messages Apr 28 09:39:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client 68edc09f-643e-c49e-5a28-0a9623c3776c(at 10.8.11.22@o2ib6), waiting for 1326 known clients (1246 recovered, 77 in progress, and 0 evicted) already passed deadline 2:40 Apr 28 09:39:21 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:39:21 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 09:39:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.107@o2ib7 (at 10.0.10.107@o2ib7) Apr 28 09:39:22 fir-md1-s2 kernel: Lustre: Skipped 142 previous similar messages Apr 28 09:39:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client f8df0cc4-ce7f-fc87-f0ad-7ba448980771(at 10.8.13.13@o2ib6), waiting for 1326 known clients (1246 recovered, 77 in progress, and 0 evicted) already passed deadline 3:05 Apr 28 09:39:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.102@o2ib7 (at 10.0.10.102@o2ib7) Apr 28 09:39:39 fir-md1-s2 kernel: Lustre: Skipped 62 previous similar messages Apr 28 09:39:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery already passed deadline 1:40, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:39:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 0:51, of 1328 clients 1328 recovered and 0 were evicted. 
Apr 28 09:40:16 fir-md1-s2 kernel: Lustre: 24836:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff988992ac3300 x1631542328029552/t0(0) o101->dd055160-73e1-c0f8-3c11-ca5351f1fd45@10.9.105.71@o2ib4:21/0 lens 480/568 e 0 to 0 dl 1556469621 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:40:16 fir-md1-s2 kernel: Lustre: 24836:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 199 previous similar messages Apr 28 09:40:16 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b379a2400 x1631531886830736/t0(0) o101->176b6ecb-736c-d192-973a-179bdc2a9222@10.8.26.27@o2ib6:21/0 lens 376/0 e 0 to 0 dl 1556469621 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:16 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Apr 28 09:40:17 fir-md1-s2 kernel: Lustre: 24542:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b379a4500 x1631549463932976/t0(0) o101->b03523f3-7393-4682-6529-e841828fdc86@10.9.103.39@o2ib4:22/0 lens 1768/0 e 0 to 0 dl 1556469622 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:17 fir-md1-s2 kernel: Lustre: 24542:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1016 previous similar messages Apr 28 09:40:19 fir-md1-s2 kernel: Lustre: 24542:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3a2a7b00 x1631694273467376/t0(0) o101->1e8725e1-8b53-4b80-244d-97fadcc45330@10.8.21.27@o2ib6:24/0 lens 576/0 e 0 to 0 dl 1556469624 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:19 fir-md1-s2 kernel: Lustre: 24542:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Apr 28 09:40:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6f654be8-ab7e-13e1-5a4b-5026f68cf5e2 (at 10.8.11.26@o2ib6) reconnecting Apr 28 09:40:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 6f654be8-ab7e-13e1-5a4b-5026f68cf5e2 (at 10.8.11.26@o2ib6) Apr 28 09:40:22 fir-md1-s2 kernel: Lustre: Skipped 29 previous similar messages Apr 28 09:40:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 74fb56c5-8bc6-38a9-8624-788945b7232f (at 10.9.115.2@o2ib4) reconnecting Apr 28 09:40:22 fir-md1-s2 kernel: Lustre: Skipped 746 previous similar messages Apr 28 09:40:23 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a76627200 x1631558491875456/t0(0) o101->b4e75cd9-74c7-0ec8-2651-b87e466f256d@10.9.105.70@o2ib4:28/0 lens 584/0 e 0 to 0 dl 1556469628 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:23 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 97 previous similar messages Apr 28 09:40:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d594a152-d993-c755-50bf-0f3b806ddc60 (at 10.9.107.22@o2ib4) reconnecting Apr 28 09:40:24 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Apr 28 09:40:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 315cf750-5ce7-61a0-093d-91bfc52b74be (at 10.8.17.10@o2ib6) reconnecting Apr 28 09:40:27 fir-md1-s2 kernel: Lustre: Skipped 18 previous similar messages Apr 28 09:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 55b451b3-fa82-3731-68a1-db9159c37dee (at 10.9.101.12@o2ib4) reconnecting Apr 28 09:40:31 fir-md1-s2 kernel: Lustre: Skipped 34 previous similar messages 
Apr 28 09:40:31 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3c32e600 x1631535067210064/t0(0) o101->13458280-a046-3a7f-2bec-0301aba013a1@10.8.28.12@o2ib6:6/0 lens 576/0 e 0 to 0 dl 1556469636 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:31 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Apr 28 09:40:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 59081b32-f544-db84-4156-163ed89d33b6 (at 10.8.26.21@o2ib6) reconnecting Apr 28 09:40:39 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages Apr 28 09:40:47 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-27), not sending early reply req@ffff986b375fda00 x1631664077861056/t0(0) o101->0329ffdb-a6ce-9bac-ea06-091e9df77bb3@10.9.101.11@o2ib4:22/0 lens 576/0 e 0 to 0 dl 1556469652 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:40:47 fir-md1-s2 kernel: Lustre: 24223:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 290 previous similar messages Apr 28 09:40:53 fir-md1-s2 kernel: Lustre: 24462:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:32s); client may timeout. req@ffff985f292a3900 x1631624419065952/t0(0) o101->553a403a-82aa-538a-0604-abb10f5fa6f2@10.9.101.33@o2ib4:21/0 lens 576/0 e 0 to 0 dl 1556469621 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:40:53 fir-md1-s2 kernel: LustreError: 23914:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.107.1@o2ib4: deadline 30:1s ago req@ffff984db8ee1b00 x1631534638633424/t0(0) o101->0a88c7dd-73ed-bd8d-be97-b623e6cfcc05@10.9.107.1@o2ib4:22/0 lens 376/0 e 0 to 0 dl 1556469652 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:40:53 fir-md1-s2 kernel: Lustre: 24462:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1425 previous similar messages Apr 28 09:41:19 fir-md1-s2 kernel: Lustre: 24455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3e4c1200 x1631731227114208/t0(0) o101->dc58e19f-fa23-c922-cca4-d163110eef55@10.9.104.46@o2ib4:24/0 lens 1768/0 e 0 to 0 dl 1556469684 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:41:19 fir-md1-s2 kernel: Lustre: 24455:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1200 previous similar messages Apr 28 09:41:21 fir-md1-s2 kernel: LustreError: 24536:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469591, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a343706c0/0xe8a2bfab068867a2 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 543 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24536 timeout: 0 lvb_type: 0 Apr 28 09:41:21 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469681.24433 Apr 28 09:41:21 fir-md1-s2 kernel: LustreError: 24536:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 251 previous similar messages Apr 28 09:41:21 fir-md1-s2 kernel: LustreError: 24152:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469591, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986b08a2b600/0xe8a2bfab06895d82 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 
0x40/0x0 rrc: 543 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24152 timeout: 0 lvb_type: 0 Apr 28 09:41:21 fir-md1-s2 kernel: LustreError: 24152:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 23 previous similar messages Apr 28 09:41:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6f654be8-ab7e-13e1-5a4b-5026f68cf5e2 (at 10.8.11.26@o2ib6) reconnecting Apr 28 09:41:24 fir-md1-s2 kernel: Lustre: Skipped 756 previous similar messages Apr 28 09:41:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.104.50@o2ib4) Apr 28 09:41:26 fir-md1-s2 kernel: Lustre: Skipped 2369 previous similar messages Apr 28 09:41:52 fir-md1-s2 kernel: LustreError: 24617:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469622, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a3eeb18c0/0xe8a2bfab06897274 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 556 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24617 timeout: 0 lvb_type: 0 Apr 28 09:41:52 fir-md1-s2 kernel: LustreError: 24617:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 16 previous similar messages Apr 28 09:41:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client ebcd528e-8f7e-3bd7-e882-5d7bc3de37e1 (at 10.9.106.4@o2ib4) reconnecting Apr 28 09:41:56 fir-md1-s2 kernel: Lustre: Skipped 1620 previous similar messages Apr 28 09:42:20 fir-md1-s2 kernel: LustreError: 23898:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.22.20@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986abf3a0fc0/0xe8a2bfab068839b9 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 556 type: IBT flags: 0x60000400000020 nid: 10.8.22.20@o2ib6 remote: 0xe96edf088221fb83 expref: 14 pid: 24392 timeout: 469134 lvb_type: 0 Apr 28 09:42:20 fir-md1-s2 kernel: Lustre: 24433:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:57s); client may timeout. 
req@ffff9858f76fb900 x1631543348345136/t0(0) o101->f97cfae5-aafc-8930-1f52-c15218589c16@10.9.108.37@o2ib4:23/0 lens 584/0 e 0 to 0 dl 1556469683 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:42:20 fir-md1-s2 kernel: LustreError: 24455:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.8.31@o2ib6: deadline 30:1s ago req@ffff986b35e40600 x1631628138103504/t0(0) o101->e18301fc-f860-0db4-bf24-6c606e0cc839@10.8.8.31@o2ib6:19/0 lens 576/0 e 0 to 0 dl 1556469739 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:42:20 fir-md1-s2 kernel: LustreError: 24455:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 122 previous similar messages Apr 28 09:42:20 fir-md1-s2 kernel: Lustre: 24433:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1624 previous similar messages Apr 28 09:42:23 fir-md1-s2 kernel: LustreError: 24546:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469653, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98595e2945c0/0xe8a2bfab06899bda lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 559 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24546 timeout: 0 lvb_type: 0 Apr 28 09:42:23 fir-md1-s2 kernel: LustreError: 24546:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Apr 28 09:42:45 fir-md1-s2 kernel: Lustre: 24424:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9869f0e35700 x1631543135624128/t0(0) o101->30832ebc-e700-a2ed-766e-a894d22af97d@10.9.113.8@o2ib4:20/0 lens 480/0 e 0 to 0 dl 1556469770 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:42:45 fir-md1-s2 kernel: Lustre: 24424:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1523 previous similar messages Apr 28 09:42:54 fir-md1-s2 kernel: LustreError: 24466:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469684, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a7e21e9c0/0xe8a2bfab0689e3d2 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 584 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24466 timeout: 0 lvb_type: 0 Apr 28 09:42:54 fir-md1-s2 kernel: LustreError: 24466:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 14 previous similar messages Apr 28 09:43:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 27ca7937-8a40-795a-1879-048dcb621b68 (at 10.9.108.46@o2ib4) reconnecting Apr 28 09:43:00 fir-md1-s2 kernel: Lustre: Skipped 1037 previous similar messages Apr 28 09:43:11 fir-md1-s2 kernel: LNet: Service thread pid 24630 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:43:11 fir-md1-s2 kernel: Pid: 24630, comm: mdt00_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:11 fir-md1-s2 kernel: Call Trace: Apr 28 09:43:11 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:11 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:11 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:11 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:11 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:43:11 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469791.24630 Apr 28 09:43:12 fir-md1-s2 kernel: Pid: 24611, comm: mdt00_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:12 fir-md1-s2 kernel: Call Trace: Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:43:12 fir-md1-s2 kernel: Pid: 24588, comm: mdt02_084 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:12 fir-md1-s2 kernel: Call Trace: Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] 
mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:43:12 fir-md1-s2 kernel: Pid: 24557, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:12 fir-md1-s2 kernel: Call Trace: Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:43:12 fir-md1-s2 kernel: LNet: Service thread pid 24518 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:43:12 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 09:43:12 fir-md1-s2 kernel: Pid: 24518, comm: mdt01_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:43:12 fir-md1-s2 kernel: Call Trace: Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:43:12 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:43:12 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:43:12 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:43:12 fir-md1-s2 kernel: LNet: Service thread pid 24457 was inactive for 200.53s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:43:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469792.24375 Apr 28 09:43:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 443850a1-e00f-945f-2b6c-3f1b9a404420 (at 10.8.10.25@o2ib6) Apr 28 09:43:36 fir-md1-s2 kernel: Lustre: Skipped 2612 previous similar messages Apr 28 09:43:42 fir-md1-s2 kernel: LNet: Service thread pid 24625 was inactive for 200.21s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:43:42 fir-md1-s2 kernel: LNet: Skipped 195 previous similar messages Apr 28 09:43:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469822.24625 Apr 28 09:43:50 fir-md1-s2 kernel: LustreError: 24867:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469740, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988b10f28240/0xe8a2bfab0689f942 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 590 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24867 timeout: 0 lvb_type: 0 Apr 28 09:43:50 fir-md1-s2 kernel: LustreError: 24867:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Apr 28 09:44:13 fir-md1-s2 kernel: LNet: Service thread pid 24604 was inactive for 200.41s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:44:13 fir-md1-s2 kernel: LNet: Skipped 13 previous similar messages Apr 28 09:44:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469853.24604 Apr 28 09:44:21 fir-md1-s2 kernel: LustreError: 24154:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469771, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98782a376300/0xe8a2bfab068bd05d lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x13/0x8 rrc: 65 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24154 timeout: 0 lvb_type: 0 Apr 28 09:44:21 fir-md1-s2 kernel: LustreError: 24154:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 89 previous similar messages Apr 28 09:44:45 fir-md1-s2 kernel: LNet: Service thread pid 24466 was inactive for 200.70s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:44:45 fir-md1-s2 kernel: LNet: Skipped 13 previous similar messages Apr 28 09:44:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469885.24466 Apr 28 09:44:50 fir-md1-s2 kernel: LustreError: 23898:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.13.10@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9878dfadaac0/0xe8a2bfab068b7326 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 68 type: IBT flags: 0x60200400000020 nid: 10.8.13.10@o2ib6 remote: 0xe75777389ec485e expref: 22 pid: 24180 timeout: 469284 lvb_type: 0 Apr 28 09:44:50 fir-md1-s2 kernel: Lustre: 24581:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (123:27s); client may timeout. req@ffff986b35acc800 x1631679532654080/t0(0) o101->9d3308b1-2b7d-a5d6-62ae-29b5c7f70c8a@10.9.101.24@o2ib4:20/0 lens 592/0 e 0 to 0 dl 1556469863 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:44:50 fir-md1-s2 kernel: LNet: Service thread pid 24173 completed after 299.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 09:44:50 fir-md1-s2 kernel: Lustre: 24581:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2593 previous similar messages Apr 28 09:44:54 fir-md1-s2 kernel: LustreError: 24601:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469804, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9877c8bc3f00/0xe8a2bfab068bff2d lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 593 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24601 timeout: 0 lvb_type: 0 Apr 28 09:44:54 fir-md1-s2 kernel: Lustre: 24173:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a1cbacb00 x1631535554927600/t0(0) o101->4dc6ad45-c67c-15d0-5638-611b0defe5f9@10.8.16.2@o2ib6:29/0 lens 576/0 e 0 to 0 dl 1556469899 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:44:54 fir-md1-s2 kernel: Lustre: 24173:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3563 previous similar messages Apr 28 09:44:54 fir-md1-s2 kernel: LustreError: 24601:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Apr 28 09:45:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a012f954-274c-2ad9-91a0-3bd8760ed55b (at 10.9.114.8@o2ib4) reconnecting Apr 28 09:45:08 fir-md1-s2 kernel: Lustre: Skipped 2653 previous similar messages Apr 28 09:45:40 fir-md1-s2 kernel: LNet: Service thread pid 24622 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:45:40 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 09:45:40 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469940.24622 Apr 28 09:45:41 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469941.24510 Apr 28 09:45:46 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469946.24180 Apr 28 09:46:00 fir-md1-s2 kernel: LustreError: 24200:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556469870, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a25efe300/0xe8a2bfab068c3fc3 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x13/0x8 rrc: 72 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 24200 timeout: 0 lvb_type: 0 Apr 28 09:46:00 fir-md1-s2 kernel: LustreError: 24200:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Apr 28 09:46:12 fir-md1-s2 kernel: LNet: Service thread pid 24154 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:46:12 fir-md1-s2 kernel: LNet: Skipped 91 previous similar messages Apr 28 09:46:12 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469972.24154 Apr 28 09:46:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556469977.24235 Apr 28 09:46:42 fir-md1-s2 kernel: LNet: Service thread pid 24623 was inactive for 200.18s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:46:42 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 09:46:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470002.24623 Apr 28 09:46:44 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470004.24601 Apr 28 09:47:14 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470034.24473 Apr 28 09:47:19 fir-md1-s2 kernel: LNet: Service thread pid 24230 was inactive for 200.31s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:47:19 fir-md1-s2 kernel: LNet: Skipped 6 previous similar messages Apr 28 09:47:19 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470039.24230 Apr 28 09:47:20 fir-md1-s2 kernel: LustreError: 23898:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.28.6@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff984b5a973f00/0xe8a2bfab0688a035 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 597 type: IBT flags: 0x60200400000020 nid: 10.8.28.6@o2ib6 remote: 0x1ba52d6600a240c9 expref: 15 pid: 24630 timeout: 469434 lvb_type: 0 Apr 28 09:47:20 fir-md1-s2 kernel: LustreError: 23898:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 09:47:20 fir-md1-s2 kernel: LNet: Service thread pid 24476 completed after 449.06s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:47:20 fir-md1-s2 kernel: Lustre: 24608:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:109s); client may timeout. req@ffff98591a30a700 x1631584799138256/t0(0) o101->c4e07e51-5d48-0b06-6fb4-3c726822dfcb@10.9.103.38@o2ib4:1/0 lens 584/0 e 0 to 0 dl 1556469931 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:47:20 fir-md1-s2 kernel: Lustre: 24525:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff985a30240c00 x1631549463963952/t0(0) o36->b03523f3-7393-4682-6529-e841828fdc86@10.9.103.39@o2ib4:20/0 lens 488/0 e 0 to 0 dl 1556469920 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:47:20 fir-md1-s2 kernel: LustreError: 24630:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.13.10@o2ib6: deadline 100:39s ago req@ffff98547f260600 x1631315352263504/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556470001 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:47:20 fir-md1-s2 kernel: LustreError: 24478:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff985a4a259800 ns: mdt-fir-MDT0003_UUID lock: ffff985413608480/0xe8a2bfab0688a988 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 580 type: IBT flags: 0x50200400000020 nid: 10.8.10.1@o2ib6 remote: 0xe21e2921ca3f3ccc expref: 3 pid: 24478 timeout: 0 lvb_type: 0 Apr 28 09:47:20 fir-md1-s2 kernel: LNet: Skipped 73 previous similar messages Apr 28 09:47:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470065.24451 Apr 28 09:47:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.115.9@o2ib4) Apr 28 09:47:52 fir-md1-s2 kernel: Lustre: Skipped 5556 previous similar messages Apr 28 09:48:10 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470090.24444 Apr 28 09:48:21 fir-md1-s2 kernel: LNet: Service thread pid 24461 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:48:21 fir-md1-s2 kernel: Pid: 24461, comm: mdt02_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:48:21 fir-md1-s2 kernel: Call Trace: Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:48:21 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:48:21 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:48:21 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470101.24461 Apr 28 09:48:21 fir-md1-s2 kernel: Pid: 24265, comm: mdt02_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:48:21 fir-md1-s2 kernel: Call Trace: Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:48:21 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:48:21 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:48:21 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:48:50 fir-md1-s2 kernel: LustreError: 24525:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470040, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9853dd6672c0/0xe8a2bfab068cd126 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 643 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24525 timeout: 0 lvb_type: 0 Apr 28 09:48:50 fir-md1-s2 kernel: LustreError: 24525:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 31 
previous similar messages Apr 28 09:49:11 fir-md1-s2 kernel: Lustre: 24471:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98714a680600 x1631544541669408/t0(0) o101->19d564f5-81a1-e2c5-8dd9-3787a091c07a@10.9.112.12@o2ib4:16/0 lens 576/0 e 0 to 0 dl 1556470156 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:49:11 fir-md1-s2 kernel: Lustre: 24471:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6027 previous similar messages Apr 28 09:49:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f1b26272-cb99-9dbe-fdc3-6a70f1d77cbb (at 10.9.112.4@o2ib4) reconnecting Apr 28 09:49:24 fir-md1-s2 kernel: Lustre: Skipped 5740 previous similar messages Apr 28 09:49:25 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0001 Apr 28 09:49:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 24608:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff984ffc6b1800 ns: mdt-fir-MDT0003_UUID lock: ffff985900e97980/0xe8a2bfab068dcbd6 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 48 type: IBT flags: 0x50200400000020 nid: 10.8.13.20@o2ib6 remote: 0xdf15508116cd4a82 expref: 10 pid: 24608 timeout: 0 lvb_type: 0 Apr 28 09:49:26 fir-md1-s2 kernel: LNet: Service thread pid 24447 completed after 574.51s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: 24447:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:17s); client may timeout. req@ffff985a09fa3900 x1631731227154064/t0(0) o101->dc58e19f-fa23-c922-cca4-d163110eef55@10.9.104.46@o2ib4:6/0 lens 1768/0 e 0 to 0 dl 1556470148 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: 24447:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3025 previous similar messages Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.101.51@o2ib4 (stopping) Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 24195:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.28.6@o2ib6: deadline 30:85s ago req@ffff986ab42d0300 x1631542998648816/t0(0) o400->@:0/0 lens 224/0 e 0 to 0 dl 1556470080 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 24195:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_disconnect to node 0@lo failed: rc = -19 Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 25387:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0001-osp-MDT0003: can't disconnect: rc = -19 Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 25387:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0003-mdtlov: error cleaning up LOD index 1: cmd 0xcf031: rc = -19 Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 24608:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 18 previous similar messages Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.104.23@o2ib4 (stopping) Apr 28 09:49:26 fir-md1-s2 kernel: Lustre: Skipped 601 previous similar messages Apr 28 09:49:26 fir-md1-s2 kernel: LustreError: 
23776:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff988a81619b00 x1632076597281584/t0(0) o41->fir-MDT0002-osp-MDT0001@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 28 09:49:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.19@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:49:27 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 09:49:27 fir-md1-s2 kernel: Lustre: server umount fir-MDT0003 complete Apr 28 09:49:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:49:29 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.201@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:49:31 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7 Apr 28 09:49:38 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 28 09:49:38 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib) Apr 28 09:49:39 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7 Apr 28 09:49:39 fir-md1-s2 kernel: LNet: Using FastReg for registration Apr 28 09:49:39 fir-md1-s2 kernel: LNetError: 84:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.211@o2ib7 on NA (ib0:0:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 09:49:39 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180] Apr 28 09:50:41 fir-md1-s2 kernel: LNetError: 25846:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Apr 28 09:50:41 fir-md1-s2 kernel: LNetError: 25846:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 28 09:50:41 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 09:50:41 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Apr 28 09:50:41 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:50:41 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 09:50:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.22.22@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:50:44 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.26@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:50:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.0.10.107@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 09:50:45 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Apr 28 09:50:47 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0001: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114 Apr 28 09:50:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 09:50:47 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on Apr 28 09:50:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect Apr 28 09:50:47 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_connect to node 0@lo failed: rc = -114 Apr 28 09:50:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 09:50:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 09:50:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7) Apr 28 09:50:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to da3f843b-57eb-189b-c00c-30194a364ab5 (at 10.8.18.9@o2ib6) Apr 28 09:50:49 fir-md1-s2 kernel: Lustre: Skipped 80 previous similar messages Apr 28 09:50:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to fe32fb07-33f9-125c-c663-eb05b7f0f990 (at 10.9.107.2@o2ib4) Apr 28 09:50:50 fir-md1-s2 kernel: Lustre: Skipped 174 previous similar messages Apr 28 09:50:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to f40ab9a5-d4a7-6a11-740b-91c353da32d6 (at 10.8.10.8@o2ib6) Apr 28 09:50:52 fir-md1-s2 kernel: Lustre: Skipped 259 previous similar messages Apr 28 09:50:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3c7c7264-1e32-88c4-c095-86b076110784 (at 10.8.1.30@o2ib6) Apr 28 09:50:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 3c7c7264-1e32-88c4-c095-86b076110784 (at 10.8.1.30@o2ib6) Apr 28 09:50:56 fir-md1-s2 kernel: Lustre: Skipped 1836 previous similar messages Apr 28 09:50:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 28 09:51:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client acb643ef-75ad-6f92-b388-57634462f54f(at 10.8.28.6@o2ib6), waiting for 1327 known clients (1251 recovered, 73 in progress, and 0 evicted) already passed deadline 2:42 Apr 28 09:51:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 09:51:12 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Apr 28 09:51:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.104@o2ib7 (at 10.0.10.104@o2ib7) Apr 28 09:51:17 fir-md1-s2 kernel: Lustre: Skipped 327 previous similar messages Apr 28 09:51:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client acb643ef-75ad-6f92-b388-57634462f54f(at 10.8.28.6@o2ib6), waiting for 1327 known clients (1253 recovered, 73 in progress, and 0 evicted) already passed deadline 3:07 Apr 28 09:51:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery already passed deadline 1:40, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 28 09:51:38 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:51:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 0:50, of 1328 clients 1328 recovered and 0 were evicted. 
Apr 28 09:51:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to acb643ef-75ad-6f92-b388-57634462f54f (at 10.8.28.6@o2ib6) Apr 28 09:51:50 fir-md1-s2 kernel: Lustre: Skipped 56 previous similar messages Apr 28 09:52:03 fir-md1-s2 kernel: Lustre: 26898:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9887963e1e00 x1631559335297744/t0(0) o101->98305627-c518-5132-2386-e8ff7f2f8fb5@10.9.102.21@o2ib4:8/0 lens 576/3264 e 0 to 0 dl 1556470328 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 09:52:03 fir-md1-s2 kernel: Lustre: 26898:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1197 previous similar messages Apr 28 09:52:04 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b0c1df500 x1631677102941632/t0(0) o101->77b03dd0-1704-2ada-7765-48818a378def@10.8.23.15@o2ib6:9/0 lens 480/0 e 0 to 0 dl 1556470329 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:04 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Apr 28 09:52:05 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98648af2c500 x1631534307173888/t0(0) o101->3b7f9e8e-0cc2-e0b1-ed46-2872567345ed@10.9.103.29@o2ib4:10/0 lens 608/0 e 0 to 0 dl 1556470330 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:05 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Apr 28 09:52:07 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98648af2e000 x1631968202286336/t0(0) o101->b0f06bff-666c-4f17-dbec-3b9049db7d53@10.8.26.4@o2ib6:12/0 lens 584/0 e 0 to 0 dl 1556470332 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:07 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 28 09:52:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f7f29fbd-f06d-1e4f-a662-2d2ae362522d (at 10.8.8.7@o2ib6) reconnecting Apr 28 09:52:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cc57ad24-07f9-6270-9e45-e86bdff220e7 (at 10.8.2.27@o2ib6) reconnecting Apr 28 09:52:09 fir-md1-s2 kernel: Lustre: Skipped 430 previous similar messages Apr 28 09:52:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bfaf32fd-a75c-1493-838b-c2682e1a6ae6 (at 10.9.101.15@o2ib4) reconnecting Apr 28 09:52:10 fir-md1-s2 kernel: Lustre: Skipped 276 previous similar messages Apr 28 09:52:11 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b2ebd9800 x1631534723024512/t0(0) o101->e8118800-654b-da1e-89d9-2c7dcbc3992b@10.9.106.20@o2ib4:16/0 lens 576/0 e 0 to 0 dl 1556470336 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:11 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 95 previous similar messages Apr 28 09:52:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 392f8ae2-e01b-21c6-87b9-54b71ec9a21b (at 10.8.22.6@o2ib6) reconnecting Apr 28 09:52:12 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 28 09:52:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fafb3280-fd8a-565a-20cd-97f85a227ff6 (at 10.8.11.31@o2ib6) reconnecting Apr 28 09:52:16 fir-md1-s2 kernel: Lustre: Skipped 57 previous similar messages Apr 28 
09:52:19 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986aa2afe000 x1631905143627376/t0(0) o101->fa4c9091-2d56-2e2a-5dff-17e3f5cfb585@10.8.30.34@o2ib6:24/0 lens 576/0 e 0 to 0 dl 1556470344 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:19 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Apr 28 09:52:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6f3ae313-77c2-91cd-0011-b55b06f56220 (at 10.9.107.44@o2ib4) reconnecting Apr 28 09:52:24 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Apr 28 09:52:35 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b2e649800 x1631346343288640/t0(0) o101->bd49a221-b9b3-c4d6-c057-a5a69a0093b0@10.8.10.9@o2ib6:10/0 lens 576/0 e 0 to 0 dl 1556470360 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:52:35 fir-md1-s2 kernel: Lustre: 26632:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 729 previous similar messages Apr 28 09:52:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b9df03f5-d7de-55e7-26be-b7cb233fd358 (at 10.9.115.9@o2ib4) reconnecting Apr 28 09:52:40 fir-md1-s2 kernel: Lustre: Skipped 741 previous similar messages Apr 28 09:52:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) Apr 28 09:52:55 fir-md1-s2 kernel: Lustre: Skipped 1636 previous similar messages Apr 28 09:53:07 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986ac03ebf00 x1631805102290272/t0(0) o101->3b8583db-9b25-3a84-4b44-4c626faa0d2b@10.8.30.13@o2ib6:12/0 lens 576/0 e 0 to 0 dl 1556470392 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:53:07 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 724 previous similar messages Apr 28 09:53:08 fir-md1-s2 kernel: LustreError: 26527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470298, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985aeb5557c0/0x7ec9d9e6725a489b lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 549 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26527 timeout: 0 lvb_type: 0 Apr 28 09:53:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470388.26578 Apr 28 09:53:08 fir-md1-s2 kernel: LustreError: 26527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 215 previous similar messages Apr 28 09:53:09 fir-md1-s2 kernel: LustreError: 26730:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470299, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985f27605100/0x7ec9d9e6725b5072 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 549 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26730 timeout: 0 lvb_type: 0 Apr 28 09:53:09 fir-md1-s2 kernel: LustreError: 26730:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 55 previous similar messages Apr 28 09:53:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client ed81787c-cc99-8f20-0232-22f2afe6ad48 (at 10.9.109.1@o2ib4) reconnecting Apr 28 09:53:13 fir-md1-s2 kernel: 
Lustre: Skipped 829 previous similar messages Apr 28 09:53:39 fir-md1-s2 kernel: LustreError: 26524:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470329, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987b093c21c0/0x7ec9d9e6725b760b lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 568 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26524 timeout: 0 lvb_type: 0 Apr 28 09:53:39 fir-md1-s2 kernel: LustreError: 26524:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 20 previous similar messages Apr 28 09:54:08 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.18.4@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987a9fed8b40/0x7ec9d9e6725a485c lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 579 type: IBT flags: 0x60200400000020 nid: 10.8.18.4@o2ib6 remote: 0xd21f35d88982e2a9 expref: 17 pid: 26739 timeout: 469842 lvb_type: 0 Apr 28 09:54:10 fir-md1-s2 kernel: LustreError: 26682:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470360, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98767db8c800/0x7ec9d9e6725bcf83 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 578 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26682 timeout: 0 lvb_type: 0 Apr 28 09:54:10 fir-md1-s2 kernel: LustreError: 26682:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Apr 28 09:54:11 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9862eebb1500 x1631550345447104/t0(0) o101->d8d6f8e7-a2cd-08f2-c263-fa8b0dbeef3c@10.8.8.2@o2ib6:16/0 lens 576/0 e 0 to 0 dl 1556470456 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 09:54:11 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1362 previous similar messages Apr 28 09:54:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting Apr 28 09:54:17 fir-md1-s2 kernel: Lustre: Skipped 1565 previous similar messages Apr 28 09:54:41 fir-md1-s2 kernel: LustreError: 26599:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470391, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9859fff160c0/0x7ec9d9e6725c2cba lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 581 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26599 timeout: 0 lvb_type: 0 Apr 28 09:54:41 fir-md1-s2 kernel: LustreError: 26599:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Apr 28 09:54:58 fir-md1-s2 kernel: LNet: Service thread pid 26592 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:54:58 fir-md1-s2 kernel: Pid: 26592, comm: mdt01_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:54:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:54:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:54:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470499.26592 Apr 28 09:54:59 fir-md1-s2 kernel: Pid: 26441, comm: mdt01_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:59 fir-md1-s2 kernel: Call Trace: Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:54:59 fir-md1-s2 kernel: Pid: 26877, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:59 fir-md1-s2 kernel: Call Trace: Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] 
mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:54:59 fir-md1-s2 kernel: Pid: 26650, comm: mdt01_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:59 fir-md1-s2 kernel: Call Trace: Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:54:59 fir-md1-s2 kernel: LNet: Service thread pid 26456 was inactive for 200.20s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:54:59 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 09:54:59 fir-md1-s2 kernel: Pid: 26456, comm: mdt01_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:54:59 fir-md1-s2 kernel: Call Trace: Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:54:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:54:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:54:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:54:59 fir-md1-s2 kernel: LNet: Service thread pid 26730 was inactive for 200.44s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:55:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 469d3c01-0ba5-8df1-fade-b379f197d2fe (at 10.8.27.33@o2ib6) Apr 28 09:55:03 fir-md1-s2 kernel: Lustre: Skipped 3179 previous similar messages Apr 28 09:55:12 fir-md1-s2 kernel: LustreError: 26767:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470422, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a5eebbf00/0x7ec9d9e6725c8e66 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x21/0x0 rrc: 583 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26767 timeout: 0 lvb_type: 0 Apr 28 09:55:12 fir-md1-s2 kernel: LustreError: 26767:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Apr 28 09:55:30 fir-md1-s2 kernel: LNet: Service thread pid 26064 was inactive for 200.63s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:55:30 fir-md1-s2 kernel: LNet: Skipped 287 previous similar messages Apr 28 09:55:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470530.26064 Apr 28 09:55:38 fir-md1-s2 kernel: LustreError: 26695:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470448, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff984e1826e300/0x7ec9d9e6725cc126 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 583 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26695 timeout: 0 lvb_type: 0 Apr 28 09:55:38 fir-md1-s2 kernel: LustreError: 26695:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 09:56:00 fir-md1-s2 kernel: LNet: Service thread pid 26447 was inactive for 200.25s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:56:00 fir-md1-s2 kernel: LNet: Skipped 25 previous similar messages Apr 28 09:56:00 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470560.26447 Apr 28 09:56:19 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986ac2fa4500 x1631609899692016/t0(0) o101->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:24/0 lens 576/0 e 0 to 0 dl 1556470584 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 09:56:19 fir-md1-s2 kernel: Lustre: 26848:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3282 previous similar messages Apr 28 09:56:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client c795f4ee-001d-f339-6d66-a97c85191d32 (at 10.9.102.14@o2ib4) reconnecting Apr 28 09:56:25 fir-md1-s2 kernel: Lustre: Skipped 3232 previous similar messages Apr 28 09:56:31 fir-md1-s2 kernel: LNet: Service thread pid 26593 was inactive for 200.06s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:56:31 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 09:56:31 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470591.26593 Apr 28 09:56:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470592.26657 Apr 28 09:56:38 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff9858f2e5ec00/0x7ec9d9e6725a5a7d lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 74 type: IBT flags: 0x60200400000020 nid: 10.8.12.23@o2ib6 remote: 0xf838c50e2ba86491 expref: 17 pid: 26453 timeout: 469992 lvb_type: 0 Apr 28 09:56:38 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 09:56:38 fir-md1-s2 kernel: LNet: Service thread pid 26529 completed after 299.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:56:38 fir-md1-s2 kernel: LustreError: 26527:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff986ad4bfb800 ns: mdt-fir-MDT0003_UUID lock: ffff985aeb5557c0/0x7ec9d9e6725a489b lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 581 type: IBT flags: 0x50200400000020 nid: 10.8.18.4@o2ib6 remote: 0xd21f35d88982e28d expref: 8 pid: 26527 timeout: 0 lvb_type: 0 Apr 28 09:56:38 fir-md1-s2 kernel: Lustre: 26527:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. 
req@ffff985a117db300 x1631542589608032/t0(0) o101->48f75538-8d4e-d59e-ac26-e59d04e34633@10.8.18.4@o2ib6:8/0 lens 480/536 e 0 to 0 dl 1556470452 ref 1 fl Complete:/0/0 rc -107/-107 Apr 28 09:56:38 fir-md1-s2 kernel: LustreError: 26417:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.31.9@o2ib6: deadline 30:1s ago req@ffff987707670f00 x1631846527026528/t0(0) o101->2943d7c9-ecf1-ed5a-9d88-3a1d89520529@10.8.31.9@o2ib6:7/0 lens 576/0 e 0 to 0 dl 1556470597 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:56:38 fir-md1-s2 kernel: LNet: Skipped 27 previous similar messages Apr 28 09:56:40 fir-md1-s2 kernel: LustreError: 26561:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470510, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a3fea18c0/0x7ec9d9e6725d1b3f lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 582 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26561 timeout: 0 lvb_type: 0 Apr 28 09:56:40 fir-md1-s2 kernel: LustreError: 26561:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 09:57:02 fir-md1-s2 kernel: LNet: Service thread pid 27038 was inactive for 200.29s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:57:02 fir-md1-s2 kernel: LNet: Skipped 10 previous similar messages Apr 28 09:57:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470622.27038 Apr 28 09:57:28 fir-md1-s2 kernel: LNet: Service thread pid 26453 was inactive for 200.53s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 09:57:28 fir-md1-s2 kernel: LNet: Skipped 5 previous similar messages Apr 28 09:57:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470648.26453 Apr 28 09:57:34 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470654.26610 Apr 28 09:58:08 fir-md1-s2 kernel: LustreError: 26864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470598, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98590ff02ac0/0x7ec9d9e6725d5d10 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 615 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 26864 timeout: 0 lvb_type: 0 Apr 28 09:58:08 fir-md1-s2 kernel: LustreError: 26864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 28 09:58:26 fir-md1-s2 kernel: LNetError: 25839:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Apr 28 09:58:26 fir-md1-s2 kernel: LNetError: 25839:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (107): c: 7, oc: 0, rc: 8 Apr 28 09:58:30 fir-md1-s2 kernel: LNet: Service thread pid 26561 was inactive for 200.49s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 09:58:30 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 09:58:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470710.26561 Apr 28 09:59:08 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.1.13@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987a8175ca40/0x7ec9d9e6725a4966 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 615 type: IBT flags: 0x60200400000020 nid: 10.8.1.13@o2ib6 remote: 0xfaa1a62aa28365cc expref: 14 pid: 26850 timeout: 470142 lvb_type: 0 Apr 28 09:59:08 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 09:59:08 fir-md1-s2 kernel: LNet: Service thread pid 26578 completed after 449.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 09:59:08 fir-md1-s2 kernel: Lustre: 26578:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:110s); client may timeout. req@ffff987af5f18000 x1631672944931296/t0(0) o101->1bd832b1-9250-3684-b26a-6a1cc941ff1c@10.9.101.20@o2ib4:18/0 lens 576/0 e 0 to 0 dl 1556470638 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 09:59:08 fir-md1-s2 kernel: Lustre: 26578:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6741 previous similar messages Apr 28 09:59:18 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.1.9@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987a7ea94800/0x7ec9d9e6725df017 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.1.9@o2ib6 remote: 0x375d7ebedb6bc280 expref: 12 pid: 26865 timeout: 470152 lvb_type: 0 Apr 28 09:59:18 fir-md1-s2 kernel: Lustre: 26417:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:6s); client may timeout. req@ffff98770a669b00 x1631534746952320/t0(0) o101->147c0c80-0156-d078-a77e-b8af4511cc40@10.8.27.6@o2ib6:12/0 lens 576/0 e 0 to 0 dl 1556470752 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:59:18 fir-md1-s2 kernel: LustreError: 26670:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.18@o2ib4: deadline 30:1s ago req@ffff98528ab80c00 x1631540656837136/t0(0) o101->6efc0e4b-1ad3-bb80-daf0-68493389a065@10.9.106.18@o2ib4:17/0 lens 480/0 e 0 to 0 dl 1556470757 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 09:59:18 fir-md1-s2 kernel: LustreError: 26670:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 09:59:18 fir-md1-s2 kernel: Lustre: 26417:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1775 previous similar messages Apr 28 09:59:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 7091b135-6c6b-9bf3-c9af-2b66c7261933 (at 10.9.102.31@o2ib4) Apr 28 09:59:19 fir-md1-s2 kernel: Lustre: Skipped 6350 previous similar messages Apr 28 09:59:58 fir-md1-s2 kernel: LNet: Service thread pid 26679 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 09:59:58 fir-md1-s2 kernel: Pid: 26679, comm: mdt00_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:59:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:59:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:59:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:59:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470798.26679 Apr 28 09:59:58 fir-md1-s2 kernel: Pid: 26848, comm: mdt01_101 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:59:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:59:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:59:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:59:58 fir-md1-s2 kernel: Pid: 26632, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:59:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 
28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:59:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:59:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:59:58 fir-md1-s2 kernel: Pid: 26864, comm: mdt00_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:59:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:59:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:59:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:59:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:59:58 fir-md1-s2 kernel: Pid: 26564, comm: mdt02_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 09:59:58 fir-md1-s2 kernel: Call Trace: Apr 28 09:59:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 09:59:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 09:59:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 09:59:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 09:59:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 09:59:59 fir-md1-s2 kernel: LNet: Service thread pid 26435 was inactive for 200.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:00:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470808.26763 Apr 28 10:00:37 fir-md1-s2 kernel: Lustre: 26641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98511ef60000 x1631680356690528/t0(0) o101->43ccaee7-6859-bcfd-d92c-275d612f13ca@10.8.1.15@o2ib6:12/0 lens 576/0 e 0 to 0 dl 1556470842 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 10:00:37 fir-md1-s2 kernel: Lustre: 26641:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6589 previous similar messages Apr 28 10:00:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client c977be3c-f98f-fbec-3aac-245ba5109971 (at 10.8.30.35@o2ib6) reconnecting Apr 28 10:00:41 fir-md1-s2 kernel: Lustre: Skipped 5946 previous similar messages Apr 28 10:00:48 fir-md1-s2 kernel: LustreError: 26670:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556470758, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a9bd90d80/0x7ec9d9e6725e6ebf lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 617 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26670 timeout: 0 lvb_type: 0 Apr 28 10:00:48 fir-md1-s2 kernel: LustreError: 26670:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 24 previous similar messages Apr 28 10:01:38 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.36@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986be99d8000/0x7ec9d9e6725a4a46 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 617 type: IBT flags: 0x60200400000020 nid: 10.8.12.36@o2ib6 remote: 0xe24c847e61182e34 expref: 20 pid: 26578 timeout: 470292 lvb_type: 0 Apr 28 10:01:38 fir-md1-s2 kernel: LNet: Service thread pid 26646 completed after 599.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:01:38 fir-md1-s2 kernel: Lustre: 26734:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:115s); client may timeout. req@ffff9879d5246300 x1631590123398864/t0(0) o101->9dfc2bda-cf66-13a5-c506-30cd55e4267b@10.9.108.17@o2ib4:13/0 lens 600/0 e 0 to 0 dl 1556470783 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:01:38 fir-md1-s2 kernel: LustreError: 26641:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.1.23@o2ib6: deadline 100:36s ago req@ffff985a5eead700 x1631749642598160/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556470862 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:01:38 fir-md1-s2 kernel: LustreError: 26641:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 10:01:38 fir-md1-s2 kernel: LNet: Skipped 6 previous similar messages Apr 28 10:02:38 fir-md1-s2 kernel: LNet: Service thread pid 26670 was inactive for 200.28s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:02:38 fir-md1-s2 kernel: LNet: Skipped 20 previous similar messages Apr 28 10:02:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470958.26670 Apr 28 10:02:39 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556470959.26697 Apr 28 10:03:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9c58438d-335a-1a4a-8b6e-0ac0b859df8d (at 10.8.12.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9879c16b9c00, cur 1556470985 expire 1556470835 last 1556470758 Apr 28 10:04:08 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.11.27@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986a3f789440/0x7ec9d9e6725a65ba lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 629 type: IBT flags: 0x60200400000020 nid: 10.8.11.27@o2ib6 remote: 0x6808485f0a46f4ac expref: 15 pid: 26832 timeout: 470442 lvb_type: 0 Apr 28 10:04:08 fir-md1-s2 kernel: LNet: Service thread pid 26694 completed after 749.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:04:08 fir-md1-s2 kernel: Lustre: 26529:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:141s); client may timeout. req@ffff986a5a66b900 x1631321350978432/t0(0) o101->116f8ca1-b769-fb6d-d36c-9cf7ced79209@10.8.22.35@o2ib6:17/0 lens 584/0 e 0 to 0 dl 1556470907 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 10:04:08 fir-md1-s2 kernel: Lustre: 26529:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5891 previous similar messages Apr 28 10:04:08 fir-md1-s2 kernel: LustreError: 26534:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff986ad4bfb800 ns: mdt-fir-MDT0003_UUID lock: ffff986a3f78a400/0x7ec9d9e6725a67c7 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 622 type: IBT flags: 0x50200400000020 nid: 10.8.18.4@o2ib6 remote: 0xd21f35d88982e286 expref: 5 pid: 26534 timeout: 0 lvb_type: 0 Apr 28 10:04:08 fir-md1-s2 kernel: LustreError: 26534:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.23@o2ib6: deadline 100:64s ago req@ffff98698cb5f800 x1631554888385280/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556470984 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:04:08 fir-md1-s2 kernel: LustreError: 26534:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Apr 28 10:04:08 fir-md1-s2 kernel: LNet: Skipped 19 previous similar messages Apr 28 10:04:59 fir-md1-s2 kernel: LNet: Service thread pid 26832 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:04:59 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 10:04:59 fir-md1-s2 kernel: Pid: 26832, comm: mdt01_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:04:59 fir-md1-s2 kernel: Call Trace: Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:04:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:04:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:04:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471099.26832 Apr 28 10:04:59 fir-md1-s2 kernel: Pid: 26595, comm: mdt00_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:04:59 fir-md1-s2 kernel: Call Trace: Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:04:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:04:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:04:59 fir-md1-s2 kernel: Pid: 26783, comm: mdt02_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:04:59 fir-md1-s2 kernel: Call Trace: Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:04:59 fir-md1-s2 
kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:04:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:04:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:04:59 fir-md1-s2 kernel: Pid: 26640, comm: mdt02_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:04:59 fir-md1-s2 kernel: Call Trace: Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:04:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:04:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:04:59 fir-md1-s2 kernel: Pid: 26734, comm: mdt02_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:04:59 fir-md1-s2 kernel: Call Trace: Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:04:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:04:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:04:59 
fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:04:59 fir-md1-s2 kernel: LNet: Service thread pid 26708 was inactive for 201.15s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:04:59 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Apr 28 10:05:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471118.26867 Apr 28 10:05:38 fir-md1-s2 kernel: LustreError: 26529:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471048, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986a91e32640/0x7ec9d9e6725fbcb9 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x20/0x0 rrc: 634 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 26529 timeout: 0 lvb_type: 0 Apr 28 10:05:38 fir-md1-s2 kernel: LustreError: 26529:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages Apr 28 10:06:38 fir-md1-s2 kernel: LustreError: 26049:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.20.20@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986a89cacec0/0x7ec9d9e6725a72fd lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 634 type: IBT flags: 0x60200400000020 nid: 10.8.20.20@o2ib6 remote: 0xc5e25a5da5f0138e expref: 26 pid: 26732 timeout: 470592 lvb_type: 0 Apr 28 10:06:38 fir-md1-s2 kernel: LNet: Service thread pid 26731 completed after 899.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:06:38 fir-md1-s2 kernel: Lustre: 26863:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff985b168c1e00 x1631354605498912/t0(0) o101->414f404d-8f8d-e649-adfd-ee21c11784f7@10.8.21.32@o2ib6:8/0 lens 584/0 e 0 to 0 dl 1556471078 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:06:38 fir-md1-s2 kernel: Lustre: 26863:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3360 previous similar messages Apr 28 10:06:38 fir-md1-s2 kernel: LustreError: 26659:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.0.10.3@o2ib7: deadline 100:31s ago req@ffff985b0e64ef00 x1632077944457296/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556471167 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:06:38 fir-md1-s2 kernel: LustreError: 26659:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Apr 28 10:06:38 fir-md1-s2 kernel: LNet: Skipped 32 previous similar messages Apr 28 10:07:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 980c53c1-d60f-2717-9259-d8f7cc6e1f79 (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff986a77c6f800, cur 1556471228 expire 1556471078 last 1556471001
Apr 28 10:07:29 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471248.26529
Apr 28 10:07:31 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471251.26835
Apr 28 10:07:42 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471262.26877
Apr 28 10:07:43 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471263.26694
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0001
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 28 10:07:45 fir-md1-s2 kernel: LustreError: 26853:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff986aa570d000 ns: mdt-fir-MDT0003_UUID lock: ffff988b361e6e40/0x7ec9d9e6725aa109 lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 646 type: IBT flags: 0x50200400000020 nid: 10.9.104.68@o2ib4 remote: 0xf951267456a9b1a4 expref: 3 pid: 26853 timeout: 0 lvb_type: 0
Apr 28 10:07:45 fir-md1-s2 kernel: LustreError: 26853:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.112.4@o2ib4 (stopping)
Apr 28 10:07:45 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0003: operation mds_disconnect to node 0@lo failed: rc = -107
Apr 28 10:07:45 fir-md1-s2 kernel: LNet: Service thread pid 26828 completed after 967.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: 26828:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:37s); client may timeout. req@ffff98699cdf6f00 x1631648527720672/t0(0) o101->0be6d380-cae9-932b-545b-7ee72d9a934d@10.9.103.19@o2ib4:8/0 lens 576/0 e 0 to 0 dl 1556471228 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: 26828:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2755 previous similar messages
Apr 28 10:07:45 fir-md1-s2 kernel: Lustre: Skipped 657 previous similar messages
Apr 28 10:07:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.24.33@o2ib6 (stopping)
Apr 28 10:07:46 fir-md1-s2 kernel: Lustre: Skipped 27 previous similar messages
Apr 28 10:07:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.102.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:07:46 fir-md1-s2 kernel: Lustre: server umount fir-MDT0001 complete
Apr 28 10:07:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.2.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:07:46 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages
Apr 28 10:07:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.105.49@o2ib4 (stopping)
Apr 28 10:07:47 fir-md1-s2 kernel: Lustre: Skipped 16 previous similar messages
Apr 28 10:07:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.106.59@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:07:47 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages
Apr 28 10:07:48 fir-md1-s2 kernel: Lustre: server umount fir-MDT0003 complete
Apr 28 10:07:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.201@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7
Apr 28 10:07:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.211@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7
Apr 28 10:07:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 10 previous similar messages
Apr 28 10:07:51 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7
Apr 28 10:07:58 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
Apr 28 10:07:58 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib)
Apr 28 10:07:58 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7
Apr 28 10:07:58 fir-md1-s2 kernel: LNet: Using FastReg for registration
Apr 28 10:07:58 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
Apr 28 10:09:00 fir-md1-s2 kernel: LNetError: 27803:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0)
Apr 28 10:09:00 fir-md1-s2 kernel: LNetError: 27803:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message
Apr 28 10:09:01 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5
Apr 28 10:09:01 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5
Apr 28 10:09:01 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Apr 28 10:09:01 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Apr 28 10:09:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.104.11@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:09:01 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message
Apr 28 10:09:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.107.30@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:09:02 fir-md1-s2 kernel: LustreError: Skipped 18 previous similar messages
Apr 28 10:09:03 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.20.16@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:09:03 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages
Apr 28 10:09:05 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.13.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:09:05 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages
Apr 28 10:09:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.8.18.1@o2ib6 (not set up)
Apr 28 10:09:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Apr 28 10:09:06 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on
Apr 28 10:09:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect
Apr 28 10:09:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1328 clients reconnect
Apr 28 10:09:07 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0003: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114
Apr 28 10:09:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 393bd61e-5b20-710b-902e-143334afc70c (at 10.9.115.3@o2ib4)
Apr 28 10:09:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1327 clients reconnect
Apr 28 10:09:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 32e552d9-553a-99e0-f22a-c15ac116b169 (at 10.9.108.5@o2ib4)
Apr 28 10:09:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to e2de8c8f-4e69-6fc5-e549-d898d3513fdc (at 10.8.11.21@o2ib6)
Apr 28 10:09:10 fir-md1-s2 kernel: Lustre: Skipped 18 previous similar messages
Apr 28 10:09:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 9101e47c-5087-9ebf-bb20-6ff2bf817bf0 (at 10.9.101.32@o2ib4)
Apr 28 10:09:12 fir-md1-s2 kernel: Lustre: Skipped 388 previous similar messages
Apr 28 10:09:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 1a944508-a353-6d52-a7d4-1133aba4850b (at 10.9.101.40@o2ib4)
Apr 28 10:09:16 fir-md1-s2 kernel: Lustre: Skipped 2018 previous similar messages
Apr 28 10:09:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 10.0.10.102@o2ib7 (at 10.0.10.102@o2ib7)
Apr 28 10:09:27 fir-md1-s2 kernel: Lustre: Skipped 242 previous similar messages
Apr 28 10:09:32 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 28 10:09:32 fir-md1-s2 kernel: LustreError: Skipped 7 previous similar messages
Apr 28 10:09:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
Apr 28 10:09:43 fir-md1-s2 kernel: Lustre: Skipped 48 previous similar messages
Apr 28 10:09:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 33aa3088-5e27-e4ab-6112-0fe513b018fa(at 10.8.13.20@o2ib6), waiting for 1328 known clients (1103 recovered, 223 in progress, and 0 evicted) already passed deadline 3:15
Apr 28 10:09:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 28 10:09:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery already passed deadline 1:42, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
Apr 28 10:09:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery over after 0:49, of 1327 clients 1327 recovered and 0 were evicted.
Apr 28 10:10:23 fir-md1-s2 kernel: Lustre: 28917:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985ac655e300 x1631311404390784/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:28/0 lens 480/568 e 0 to 0 dl 1556471428 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 10:10:23 fir-md1-s2 kernel: Lustre: 28817:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff988b35703900 x1631630531441232/t0(0) o101->1f0d761d-c9df-f4c8-dc40-a56f3a650b4e@10.9.108.72@o2ib4:28/0 lens 576/3264 e 0 to 0 dl 1556471428 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 10:10:23 fir-md1-s2 kernel: Lustre: 28817:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 653 previous similar messages Apr 28 10:10:26 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b01212100 x1631559495590400/t0(0) o101->59eb580d-79f6-c9fa-7886-60aca8aaf8c9@10.9.101.22@o2ib4:1/0 lens 584/0 e 0 to 0 dl 1556471431 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:10:26 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Apr 28 10:10:28 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b01214200 x1631727741567088/t0(0) o101->bee493fe-0255-e0d8-44b9-deb38a2dee88@10.9.0.61@o2ib4:3/0 lens 584/0 e 0 to 0 dl 1556471433 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:10:28 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3926cda2-471a-9775-ffa6-15d857ceb079 (at 10.8.13.2@o2ib6) reconnecting Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 10947b9b-0f5a-c39c-5999-172515a96889 (at 10.8.8.10@o2ib6) Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: Skipped 218 previous similar messages Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 5db7ce18-3e24-dca8-3c1c-cbb3c3f8c6de (at 10.8.1.14@o2ib6) reconnecting Apr 28 10:10:29 fir-md1-s2 kernel: Lustre: Skipped 158 previous similar messages Apr 28 10:10:32 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b01214b00 x1631542889028512/t0(0) o41->be42b497-ab1b-8d58-3101-014aad577cfc@10.8.27.35@o2ib6:7/0 lens 440/0 e 0 to 0 dl 1556471437 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:10:32 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 28 10:10:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9f83e13f-6dc4-9163-be6d-ae55a9f62b03 (at 10.8.27.2@o2ib6) reconnecting Apr 28 10:10:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Apr 28 10:10:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client be42b497-ab1b-8d58-3101-014aad577cfc (at 10.8.27.35@o2ib6) reconnecting Apr 28 10:10:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:10:50 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b2eaba700 x1631542665908064/t0(0) o41->69e867f7-2c34-9281-0411-6ff880d43ef5@10.8.28.11@o2ib6:25/0 lens 440/0 e 0 to 0 dl 
1556471455 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:11:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 195b2aff-dd7c-f763-3c56-263374ade64c (at 10.8.7.12@o2ib6) reconnecting Apr 28 10:11:00 fir-md1-s2 kernel: Lustre: Skipped 66 previous similar messages Apr 28 10:11:10 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3889cb00 x1631559495593504/t0(0) o101->59eb580d-79f6-c9fa-7886-60aca8aaf8c9@10.9.101.22@o2ib4:15/0 lens 568/0 e 0 to 0 dl 1556471475 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:11:10 fir-md1-s2 kernel: Lustre: 28861:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 187 previous similar messages Apr 28 10:11:28 fir-md1-s2 kernel: LustreError: 28043:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471398, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a4938ec00/0x5431c80f6613be7e lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 587 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 28043 timeout: 0 lvb_type: 0 Apr 28 10:11:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471488.28595 Apr 28 10:11:28 fir-md1-s2 kernel: LustreError: 28043:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 244 previous similar messages Apr 28 10:11:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting Apr 28 10:11:31 fir-md1-s2 kernel: Lustre: Skipped 307 previous similar messages Apr 28 10:11:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 97b55247-502a-4b0e-6bc7-38b7d7a6fbce (at 10.9.101.31@o2ib4) Apr 28 10:11:34 fir-md1-s2 kernel: Lustre: Skipped 1166 previous similar messages Apr 28 10:11:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: recovery is timed out, evict stale exports Apr 28 10:11:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: disconnecting 1 stale clients Apr 28 10:11:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery already passed deadline 4:00, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. 
Apr 28 10:11:36 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 28 10:11:45 fir-md1-s2 kernel: Lustre: 28764:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a22d1e300 x1631619675585632/t0(0) o41->3784f7ac-597b-9148-0293-184261b56c26@10.9.104.55@o2ib4:20/0 lens 440/0 e 0 to 0 dl 1556471510 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:11:45 fir-md1-s2 kernel: Lustre: 28764:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 163 previous similar messages Apr 28 10:11:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3784f7ac-597b-9148-0293-184261b56c26 (at 10.9.104.55@o2ib4) reconnecting Apr 28 10:11:51 fir-md1-s2 kernel: Lustre: Skipped 386 previous similar messages Apr 28 10:11:59 fir-md1-s2 kernel: LustreError: 28804:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471429, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a1ab19b00/0x5431c80f66143dc0 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 590 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28804 timeout: 0 lvb_type: 0 Apr 28 10:11:59 fir-md1-s2 kernel: LustreError: 28804:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 49 previous similar messages Apr 28 10:12:27 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.7.18@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986b33b2f2c0/0x5431c80f6613b7af lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x60000400000020 nid: 10.8.7.18@o2ib6 remote: 0x98ee8a55dcd7f8f expref: 10 pid: 28363 timeout: 470941 lvb_type: 0 Apr 28 10:12:27 fir-md1-s2 kernel: Lustre: 28709:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:57s); client may timeout. req@ffff9850b4b30900 x1631534820807696/t0(0) o101->205cae49-b70d-4635-7302-5f62b1c05bbe@10.9.102.18@o2ib4:0/0 lens 576/0 e 0 to 0 dl 1556471490 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:12:27 fir-md1-s2 kernel: LustreError: 28767:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.13.20@o2ib6: deadline 100:28s ago req@ffff986b2eab8900 x1632078388012400/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556471519 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:12:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 33aa3088-5e27-e4ab-6112-0fe513b018fa(at 10.8.13.20@o2ib6), waiting for 1328 known clients (1325 recovered, 2 in progress, and 1 evicted) already passed deadline 4:51 Apr 28 10:12:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 3:21, of 1328 clients 1327 recovered and 1 was evicted. 
Apr 28 10:12:27 fir-md1-s2 kernel: Lustre: 28709:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 693 previous similar messages Apr 28 10:12:30 fir-md1-s2 kernel: LustreError: 28811:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471460, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987ad1c6e300/0x5431c80f661455de lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 28811 timeout: 0 lvb_type: 0 Apr 28 10:12:30 fir-md1-s2 kernel: LustreError: 28811:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 28 10:12:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting Apr 28 10:12:33 fir-md1-s2 kernel: Lustre: Skipped 386 previous similar messages Apr 28 10:12:52 fir-md1-s2 kernel: Lustre: 28975:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984caf675100 x1631535197373408/t0(0) o101->ebdeb423-653d-6c85-828e-54f29dd973b0@10.9.101.66@o2ib4:27/0 lens 376/0 e 0 to 0 dl 1556471577 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:12:52 fir-md1-s2 kernel: Lustre: 28975:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 227 previous similar messages Apr 28 10:13:01 fir-md1-s2 kernel: LustreError: 28750:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471491, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9870ceaf5100/0x5431c80f6614640f lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 600 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28750 timeout: 0 lvb_type: 0 Apr 28 10:13:01 fir-md1-s2 kernel: LustreError: 28750:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 10:13:18 fir-md1-s2 kernel: LNet: Service thread pid 28874 was inactive for 200.03s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:13:18 fir-md1-s2 kernel: Pid: 28874, comm: mdt01_086 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471598.28874 Apr 28 10:13:18 fir-md1-s2 kernel: Pid: 28436, comm: mdt01_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s2 kernel: Pid: 28383, comm: mdt01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:13:18 
fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s2 kernel: Pid: 28596, comm: mdt01_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s2 kernel: LNet: Service thread pid 28964 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:13:18 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 10:13:18 fir-md1-s2 kernel: Pid: 28964, comm: mdt00_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:13:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:13:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:13:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:13:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:13:18 fir-md1-s2 kernel: LNet: Service thread pid 28784 was inactive for 200.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:13:32 fir-md1-s2 kernel: LustreError: 28663:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471522, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a6d006300/0x5431c80f66147ba1 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 28663 timeout: 0 lvb_type: 0 Apr 28 10:13:32 fir-md1-s2 kernel: LustreError: 28663:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 10:13:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b5280270-3b22-224e-0daa-bad5776be543 (at 10.9.103.24@o2ib4) reconnecting Apr 28 10:13:39 fir-md1-s2 kernel: Lustre: Skipped 2475 previous similar messages Apr 28 10:13:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to b4d83e54-8cb6-ea71-956c-e7a98e667a27 (at 10.8.27.26@o2ib6) Apr 28 10:13:45 fir-md1-s2 kernel: Lustre: Skipped 2877 previous similar messages Apr 28 10:13:49 fir-md1-s2 kernel: LNet: Service thread pid 28963 was inactive for 200.76s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 10:13:49 fir-md1-s2 kernel: LNet: Skipped 301 previous similar messages Apr 28 10:13:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471629.28963 Apr 28 10:13:57 fir-md1-s2 kernel: LustreError: 28380:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471547, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98561fb9ca40/0x5431c80f661488c1 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x20/0x0 rrc: 600 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28380 timeout: 0 lvb_type: 0 Apr 28 10:13:57 fir-md1-s2 kernel: LustreError: 28380:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Apr 28 10:14:20 fir-md1-s2 kernel: LNet: Service thread pid 28828 was inactive for 200.48s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:14:20 fir-md1-s2 kernel: LNet: Skipped 12 previous similar messages Apr 28 10:14:20 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471660.28828 Apr 28 10:14:51 fir-md1-s2 kernel: LNet: Service thread pid 28851 was inactive for 200.16s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:14:51 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Apr 28 10:14:51 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471691.28851 Apr 28 10:14:57 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.27.34@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987b3afd98c0/0x5431c80f6613c164 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 600 type: IBT flags: 0x60200400000020 nid: 10.8.27.34@o2ib6 remote: 0xc001f692fec9db6c expref: 21 pid: 28620 timeout: 471091 lvb_type: 0 Apr 28 10:14:57 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 10:14:57 fir-md1-s2 kernel: LNet: Service thread pid 28376 completed after 299.26s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:14:57 fir-md1-s2 kernel: Lustre: 28843:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. 
req@ffff986b3ee55700 x1631898800351792/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:27/0 lens 1768/0 e 0 to 0 dl 1556471577 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:14:57 fir-md1-s2 kernel: LustreError: 28843:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.6@o2ib6: deadline 30:120s ago req@ffff986a8bc34200 x1631386288340896/t0(0) o101->@:27/0 lens 576/0 e 0 to 0 dl 1556471577 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:14:57 fir-md1-s2 kernel: LustreError: 28843:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 10:14:57 fir-md1-s2 kernel: LNet: Skipped 20 previous similar messages Apr 28 10:15:03 fir-md1-s2 kernel: Lustre: 28383:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9867f9e1c800 x1631624065417104/t0(0) o101->0d8fe43d-85f9-8061-e5fc-2e0ec8fbd940@10.8.7.11@o2ib6:7/0 lens 576/3264 e 0 to 0 dl 1556471707 ref 2 fl Interpret:/0/0 rc 0/0 Apr 28 10:15:03 fir-md1-s2 kernel: Lustre: 28383:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6477 previous similar messages Apr 28 10:15:22 fir-md1-s2 kernel: LNet: Service thread pid 28850 was inactive for 200.41s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:15:22 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Apr 28 10:15:22 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471722.28850 Apr 28 10:15:47 fir-md1-s2 kernel: LNet: Service thread pid 28424 was inactive for 200.24s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:15:47 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Apr 28 10:15:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471747.28424 Apr 28 10:15:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471748.28902 Apr 28 10:15:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d3e22dd2-d25d-28e8-5f86-5d27043eaa8d (at 10.8.7.18@o2ib6) reconnecting Apr 28 10:15:52 fir-md1-s2 kernel: Lustre: Skipped 3378 previous similar messages Apr 28 10:15:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471753.28910 Apr 28 10:16:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 33aa3088-5e27-e4ab-6112-0fe513b018fa (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff986ac85fe400, cur 1556471774 expire 1556471624 last 1556471547 Apr 28 10:16:27 fir-md1-s2 kernel: LustreError: 28633:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471697, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9873bbe3e300/0x5431c80f66154925 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 628 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28633 timeout: 0 lvb_type: 0 Apr 28 10:16:27 fir-md1-s2 kernel: LustreError: 28633:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 17 previous similar messages Apr 28 10:17:09 fir-md1-s2 kernel: LustreError: 28630:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471739, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985a3486c800/0x5431c80f66161487 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 44 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28630 timeout: 0 lvb_type: 0 Apr 28 10:17:09 fir-md1-s2 kernel: LustreError: 28630:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 28 10:17:27 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.21@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987ae7e2e780/0x5431c80f6614ae0d lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 44 type: IBT flags: 0x60200400000020 nid: 10.8.12.21@o2ib6 remote: 0x3692d81195a69f1c expref: 18 pid: 28767 timeout: 471241 lvb_type: 0 Apr 28 10:17:27 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 10:17:27 fir-md1-s2 kernel: Lustre: 28824:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:89s); client may timeout. req@ffff987a803b6f00 x1631587951358528/t0(0) o101->6a8eb6f9-856d-6387-d01b-f8aa660b75d4@10.9.106.7@o2ib4:27/0 lens 480/0 e 0 to 0 dl 1556471758 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:17:27 fir-md1-s2 kernel: LNet: Service thread pid 28644 completed after 449.26s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:17:27 fir-md1-s2 kernel: LustreError: 28645:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.34@o2ib6: deadline 100:50s ago req@ffff987af56c2400 x1631545960380464/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556471797 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:17:27 fir-md1-s2 kernel: LustreError: 28645:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 10:17:27 fir-md1-s2 kernel: Lustre: 28824:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8446 previous similar messages Apr 28 10:18:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to eb03d68c-4477-fd95-4120-c15d0364314e (at 10.8.22.20@o2ib6) Apr 28 10:18:03 fir-md1-s2 kernel: Lustre: Skipped 6761 previous similar messages Apr 28 10:18:17 fir-md1-s2 kernel: LNet: Service thread pid 28595 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:18:17 fir-md1-s2 kernel: Pid: 28595, comm: mdt01_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:17 fir-md1-s2 kernel: Call Trace: Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:18:17 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471897.28595 Apr 28 10:18:17 fir-md1-s2 kernel: Pid: 28678, comm: mdt00_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:17 fir-md1-s2 kernel: Call Trace: Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:17 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:17 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:17 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:18:17 fir-md1-s2 kernel: Pid: 28633, comm: mdt02_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:17 fir-md1-s2 kernel: Call Trace: Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 
28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:18:18 fir-md1-s2 kernel: Pid: 28641, comm: mdt00_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:18:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:18:18 fir-md1-s2 kernel: Pid: 28899, comm: mdt01_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:18:18 fir-md1-s2 kernel: Call Trace: Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:18:18 fir-md1-s2 kernel: [] 
kthread+0xd1/0xe0 Apr 28 10:18:18 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:18:18 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:18:18 fir-md1-s2 kernel: LNet: Service thread pid 28845 was inactive for 200.92s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:18:18 fir-md1-s2 kernel: LNet: Skipped 11 previous similar messages Apr 28 10:18:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471908.28834 Apr 28 10:18:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471928.28987 Apr 28 10:18:57 fir-md1-s2 kernel: LustreError: 28933:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471847, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9878e5ba1d40/0x5431c80f66165769 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 626 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28933 timeout: 0 lvb_type: 0 Apr 28 10:18:57 fir-md1-s2 kernel: LustreError: 28933:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 28 10:18:59 fir-md1-s2 kernel: LNet: Service thread pid 28617 was inactive for 200.02s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:18:59 fir-md1-s2 kernel: LNet: Skipped 16 previous similar messages Apr 28 10:18:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556471939.28617 Apr 28 10:19:19 fir-md1-s2 kernel: Lustre: 28607:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985af3b9f500 x1631546084508432/t0(0) o41->9d74b6ae-a7c7-7ba5-ce12-a71a42e88718@10.8.27.21@o2ib6:24/0 lens 440/0 e 0 to 0 dl 1556471964 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 10:19:19 fir-md1-s2 kernel: Lustre: 28607:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6824 previous similar messages Apr 28 10:19:57 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.12@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff985a3486c800/0x5431c80f66161487 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 51 type: IBT flags: 0x60200400000020 nid: 10.8.12.12@o2ib6 remote: 0xbf262a216ffb62eb expref: 22 pid: 28630 timeout: 471391 lvb_type: 0 Apr 28 10:19:57 fir-md1-s2 kernel: LustreError: 28001:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 10:19:57 fir-md1-s2 kernel: Lustre: 28645:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:104s); client may timeout. req@ffff986b39123900 x1631546035575936/t0(0) o101->d9dead47-18af-0bad-b841-6c3aac9d942a@10.8.28.7@o2ib6:13/0 lens 576/0 e 0 to 0 dl 1556471893 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 10:19:57 fir-md1-s2 kernel: LNet: Service thread pid 28688 completed after 599.25s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 28 10:19:57 fir-md1-s2 kernel: LNet: Skipped 24 previous similar messages Apr 28 10:19:57 fir-md1-s2 kernel: LustreError: 28615:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.27.7@o2ib6: deadline 100:45s ago req@ffff9861d6e62100 x1631642299818992/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556471952 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:19:57 fir-md1-s2 kernel: LustreError: 28615:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Apr 28 10:19:57 fir-md1-s2 kernel: Lustre: 28645:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3057 previous similar messages Apr 28 10:20:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2464bc60-606d-b5b0-408f-f61066cf3c2a (at 10.9.108.71@o2ib4) reconnecting Apr 28 10:20:12 fir-md1-s2 kernel: Lustre: Skipped 6081 previous similar messages Apr 28 10:20:47 fir-md1-s2 kernel: LNet: Service thread pid 28648 was inactive for 200.20s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:20:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472047.28648 Apr 28 10:20:48 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472048.28811 Apr 28 10:20:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472049.28975 Apr 28 10:21:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472063.28015 Apr 28 10:21:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472078.28735 Apr 28 10:21:27 fir-md1-s2 kernel: LustreError: 28622:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471997, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff984caf66e300/0x5431c80f66178bcb lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 670 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 28622 timeout: 0 lvb_type: 0 Apr 28 10:21:27 fir-md1-s2 kernel: LustreError: 28622:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 45 previous similar messages Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: Failing over fir-MDT0001 Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 28654:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff987b3ec02400 ns: mdt-fir-MDT0003_UUID lock: ffff987a757c0000/0x5431c80f6613d1fd lrc: 3/0,0 mode: PR/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x1b/0x0 rrc: 668 type: IBT flags: 0x50200400000020 nid: 10.8.2.28@o2ib6 remote: 0xf8fb39dd0338d640 expref: 4 pid: 28654 timeout: 0 lvb_type: 0 Apr 28 10:21:46 fir-md1-s2 kernel: LNet: Service thread pid 28654 completed after 707.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0000-osp-MDT0001: operation mds_disconnect to node 10.0.10.51@o2ib7 failed: rc = -19 Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 29936:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0000-osp-MDT0001: can't disconnect: rc = -19 Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: 28705:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:58s); client may timeout. 
req@ffff985b1a574200 x1631687413388064/t0(0) o101->5222ca4a-ce85-45b5-8641-48685c5cb7f4@10.8.30.31@o2ib6:18/0 lens 376/0 e 0 to 0 dl 1556472048 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.13.20@o2ib6 (stopping) Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 29936:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0001-mdtlov: error cleaning up LOD index 0: cmd 0xcf031: rc = -19 Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 28020:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.7.10@o2ib6: deadline 30:64s ago req@ffff986b3969da00 x1631532668161920/t0(0) o400->@:12/0 lens 224/0 e 0 to 0 dl 1556472042 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:21:46 fir-md1-s2 kernel: LustreError: 28020:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 10:21:46 fir-md1-s2 kernel: LNet: Skipped 299 previous similar messages Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.103.28@o2ib4 (stopping) Apr 28 10:21:46 fir-md1-s2 kernel: Lustre: Skipped 785 previous similar messages Apr 28 10:21:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.8.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:21:47 fir-md1-s2 kernel: Lustre: server umount fir-MDT0001 complete Apr 28 10:21:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:21:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.209@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 10:21:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.212@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7 Apr 28 10:21:49 fir-md1-s2 kernel: LNetError: 19365:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 18 previous similar messages Apr 28 10:21:50 fir-md1-s2 kernel: LNet: Removed LNI 10.0.10.52@o2ib7 Apr 28 10:22:02 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 28 10:22:02 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib) Apr 28 10:22:03 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl7 Apr 28 10:22:03 fir-md1-s2 kernel: LNet: Using FastReg for registration Apr 28 10:22:03 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180] Apr 28 10:22:07 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 28 10:22:07 fir-md1-s2 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Apr 28 10:22:07 fir-md1-s2 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:22:07 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 28 10:22:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.27.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 10:22:08 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 10:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.103.41@o2ib4 (not set up) Apr 28 10:22:08 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0003: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114 Apr 28 10:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 10:22:08 fir-md1-s2 kernel: Lustre: fir-MDD0003: changelog on Apr 28 10:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: in recovery but waiting for the first client to connect Apr 28 10:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 10:22:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.105.49@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:22:08 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Apr 28 10:22:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.105.20@o2ib4) Apr 28 10:22:09 fir-md1-s2 kernel: Lustre: Skipped 34 previous similar messages Apr 28 10:22:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.103.27@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:22:09 fir-md1-s2 kernel: LustreError: Skipped 44 previous similar messages Apr 28 10:22:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.104.27@o2ib4) Apr 28 10:22:10 fir-md1-s2 kernel: Lustre: Skipped 19 previous similar messages Apr 28 10:22:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to f28b9d7c-ae20-e506-5bbf-0fe9ac4b3bdd (at 10.9.108.53@o2ib4) Apr 28 10:22:11 fir-md1-s2 kernel: Lustre: Skipped 189 previous similar messages Apr 28 10:22:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.106.25@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 28 10:22:11 fir-md1-s2 kernel: LustreError: Skipped 250 previous similar messages Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to fb4eab6b-7253-4fde-536d-07e03dd4756a (at 10.8.21.35@o2ib6) Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.112.17@o2ib4 (not set up) Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: Skipped 499 previous similar messages Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client 16146ee7-6409-e879-4132-04fda077b6bd(at 10.8.13.20@o2ib6), waiting for 1327 known clients (734 recovered, 42 in progress, and 0 evicted) already passed deadline 2:35 Apr 28 10:22:13 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0001: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114 Apr 28 10:22:13 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect Apr 28 10:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1327 clients reconnect Apr 28 10:22:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d9168572-e8eb-9c93-f44e-07dfcfc882d0 (at 10.8.23.24@o2ib6) Apr 28 10:22:17 fir-md1-s2 kernel: Lustre: Skipped 1174 previous similar messages Apr 28 10:22:21 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.0.10.106@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:22:21 fir-md1-s2 kernel: LustreError: Skipped 105 previous similar messages Apr 28 10:22:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 16146ee7-6409-e879-4132-04fda077b6bd(at 10.8.13.20@o2ib6), waiting for 1327 known clients (965 recovered, 193 in progress, and 0 evicted) already passed deadline 2:39 Apr 28 10:22:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.101@o2ib7 (at 10.0.10.101@o2ib7) Apr 28 10:22:26 fir-md1-s2 kernel: Lustre: Skipped 725 previous similar messages Apr 28 10:22:31 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 28 10:22:31 fir-md1-s2 kernel: LustreError: Skipped 12 previous similar messages Apr 28 10:22:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7) Apr 28 10:22:46 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Apr 28 10:22:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 16146ee7-6409-e879-4132-04fda077b6bd(at 10.8.13.20@o2ib6), waiting for 1327 known clients (1091 recovered, 233 in progress, and 0 evicted) already passed deadline 3:11 Apr 28 10:22:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:23:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery already passed deadline 1:35, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. 
Apr 28 10:23:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 28 10:23:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery over after 0:56, of 1327 clients 1327 recovered and 0 were evicted. Apr 28 10:23:29 fir-md1-s2 kernel: Lustre: 31383:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff988b363fda00 x1631567053474304/t0(0) o101->6875868d-576c-ea3f-adc1-4798717b5abd@10.9.103.1@o2ib4:4/0 lens 568/0 e 0 to 0 dl 1556472214 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:23:29 fir-md1-s2 kernel: Lustre: 31383:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1781 previous similar messages Apr 28 10:23:31 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a53bf0f00 x1631558875005136/t0(0) o101->aebbd095-b50a-7733-a754-e1423fab741f@10.8.18.10@o2ib6:5/0 lens 576/0 e 0 to 0 dl 1556472215 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:23:31 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 26 previous similar messages Apr 28 10:23:32 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986ab471b600 x1631534836021392/t0(0) o101->712591ba-92f6-6e19-2523-c1aaf8221bbf@10.9.106.62@o2ib4:6/0 lens 576/0 e 0 to 0 dl 1556472216 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:23:32 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Apr 28 10:23:34 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98795dba8f00 x1631542999066848/t0(0) o101->acb643ef-75ad-6f92-b388-57634462f54f@10.8.28.6@o2ib6:8/0 lens 576/0 e 0 to 0 dl 1556472218 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:23:34 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Apr 28 10:23:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting Apr 28 10:23:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.0.62@o2ib4) Apr 28 10:23:35 fir-md1-s2 kernel: Lustre: Skipped 91 previous similar messages Apr 28 10:23:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ebb81670-151c-6f45-4a7e-a2c58a6ba974 (at 10.8.1.1@o2ib6) reconnecting Apr 28 10:23:35 fir-md1-s2 kernel: Lustre: Skipped 949 previous similar messages Apr 28 10:23:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9fc57edd-fd0b-00bd-7b44-a9b87664c63a (at 10.8.10.2@o2ib6) reconnecting Apr 28 10:23:36 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Apr 28 10:23:38 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986ade772400 x1631542774054176/t0(0) o101->cc4645df-283e-39bd-d7f5-be3d60722b5c@10.9.108.3@o2ib4:12/0 lens 576/0 e 0 to 0 dl 1556472222 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:23:38 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 98 previous similar messages Apr 28 10:23:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 088aec52-9508-3401-0290-3c12a91037c4 (at 10.9.106.9@o2ib4) reconnecting Apr 28 10:23:39 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Apr 28 10:23:44 fir-md1-s2 kernel: Lustre: 
fir-MDT0003: Client a521b5a8-ea52-b6d4-a4ea-eb91fa13faf3 (at 10.8.11.34@o2ib6) reconnecting Apr 28 10:23:44 fir-md1-s2 kernel: Lustre: Skipped 63 previous similar messages Apr 28 10:23:46 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986996efda00 x1631322804179808/t0(0) o101->b43efcd9-70d5-e061-8a32-efce0b4f865e@10.8.21.34@o2ib6:20/0 lens 576/0 e 0 to 0 dl 1556472230 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:23:46 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Apr 28 10:23:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client cb1e051f-12ef-c393-c1de-bc60ba01debc (at 10.8.13.11@o2ib6) reconnecting Apr 28 10:23:52 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Apr 28 10:24:02 fir-md1-s2 kernel: Lustre: 31260:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff987ab00de300 x1631687413425008/t0(0) o101->5222ca4a-ce85-45b5-8641-48685c5cb7f4@10.8.30.31@o2ib6:7/0 lens 576/0 e 0 to 0 dl 1556472247 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:24:02 fir-md1-s2 kernel: Lustre: 31260:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1280 previous similar messages Apr 28 10:24:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bb7e5f28-aaf5-df1e-4fde-6e2047c9a824 (at 10.8.10.36@o2ib6) reconnecting Apr 28 10:24:08 fir-md1-s2 kernel: Lustre: Skipped 959 previous similar messages Apr 28 10:24:34 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b2f31b000 x1631585670064864/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:9/0 lens 576/0 e 0 to 0 dl 1556472279 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 10:24:34 fir-md1-s2 kernel: Lustre: 31330:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1192 previous similar messages Apr 28 10:24:34 fir-md1-s2 kernel: LustreError: 30851:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472184, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff987a54eb0240/0x757b416b99eb9eee lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 599 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 30851 timeout: 0 lvb_type: 0 Apr 28 10:24:34 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472274.30828 Apr 28 10:24:34 fir-md1-s2 kernel: LustreError: 30851:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 291 previous similar messages Apr 28 10:24:34 fir-md1-s2 kernel: LustreError: 31349:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472184, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff98791ea98240/0x757b416b99ec407a lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 599 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 31349 timeout: 0 lvb_type: 0 Apr 28 10:24:34 fir-md1-s2 kernel: LustreError: 31349:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages Apr 28 10:24:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 77cf16cf-22cf-31cd-be2c-d11a46be6b40 (at 10.8.2.18@o2ib6) Apr 28 10:24:39 fir-md1-s2 kernel: Lustre: Skipped 3020 previous similar 
messages Apr 28 10:24:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 16749711-2a27-479b-83fc-14b2199ba6af (at 10.9.104.18@o2ib4) reconnecting Apr 28 10:24:40 fir-md1-s2 kernel: Lustre: Skipped 1020 previous similar messages Apr 28 10:25:05 fir-md1-s2 kernel: LustreError: 31413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472215, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff984f497b5e80/0x757b416b99ec5bc4 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x13/0x8 rrc: 55 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31413 timeout: 0 lvb_type: 0 Apr 28 10:25:05 fir-md1-s2 kernel: LustreError: 31413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 28 10:25:33 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.12.21@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987afb64e0c0/0x757b416b99eb9e93 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 55 type: IBT flags: 0x60000400000020 nid: 10.8.12.21@o2ib6 remote: 0x3692d81195a69f1c expref: 15 pid: 30813 timeout: 471727 lvb_type: 0 Apr 28 10:25:33 fir-md1-s2 kernel: Lustre: 31191:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:119s); client may timeout. req@ffff98684afa4e00 x1631596192045728/t0(0) o101->d3e22dd2-d25d-28e8-5f86-5d27043eaa8d@10.8.7.18@o2ib6:4/0 lens 576/0 e 0 to 0 dl 1556472214 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 10:25:33 fir-md1-s2 kernel: LustreError: 31203:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.13.20@o2ib6: deadline 100:34s ago req@ffff986a1c291200 x1632079152438224/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556472299 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:25:33 fir-md1-s2 kernel: LustreError: 31203:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 28 10:25:33 fir-md1-s2 kernel: Lustre: 31191:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5144 previous similar messages Apr 28 10:25:36 fir-md1-s2 kernel: LustreError: 31411:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472246, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9854f2feee40/0x757b416b99ec8c5b lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x20/0x0 rrc: 619 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31411 timeout: 0 lvb_type: 0 Apr 28 10:25:36 fir-md1-s2 kernel: LustreError: 31411:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 28 10:25:38 fir-md1-s2 kernel: Lustre: 31191:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986b3788a400 x1631707077626800/t0(0) o101->69dab34d-17fd-48dd-e004-ac65b026604e@10.8.24.10@o2ib6:13/0 lens 576/0 e 0 to 0 dl 1556472343 ref 2 fl New:/0/ffffffff rc 0/-1 Apr 28 10:25:38 fir-md1-s2 kernel: Lustre: 31191:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2585 previous similar messages Apr 28 10:25:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 13458280-a046-3a7f-2bec-0301aba013a1 (at 10.8.28.12@o2ib6) reconnecting Apr 28 10:25:44 fir-md1-s2 kernel: Lustre: Skipped 1891 previous similar messages Apr 28 
10:26:07 fir-md1-s2 kernel: LustreError: 31382:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472277, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9859b0e806c0/0x757b416b99ecc0f0 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 621 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31382 timeout: 0 lvb_type: 0 Apr 28 10:26:24 fir-md1-s2 kernel: LNet: Service thread pid 30852 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 10:26:24 fir-md1-s2 kernel: Pid: 30852, comm: mdt00_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s2 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472384.30852 Apr 28 10:26:24 fir-md1-s2 kernel: Pid: 30836, comm: mdt01_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s2 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] 
kthread+0xd1/0xe0 Apr 28 10:26:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:26:24 fir-md1-s2 kernel: Pid: 31071, comm: mdt01_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:24 fir-md1-s2 kernel: Call Trace: Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:26:25 fir-md1-s2 kernel: Pid: 31073, comm: mdt01_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:25 fir-md1-s2 kernel: Call Trace: Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:26:25 fir-md1-s2 kernel: LNet: Service thread pid 31075 was inactive for 201.01s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 28 10:26:25 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Apr 28 10:26:25 fir-md1-s2 kernel: Pid: 31075, comm: mdt01_018 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:26:25 fir-md1-s2 kernel: Call Trace: Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:26:25 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:26:25 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:26:25 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:26:25 fir-md1-s2 kernel: LNet: Service thread pid 31076 was inactive for 201.15s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:26:25 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472385.31349 Apr 28 10:26:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to e2a571ed-a09d-5b66-3666-df63bf8e2019 (at 10.8.10.33@o2ib6) Apr 28 10:26:47 fir-md1-s2 kernel: Lustre: Skipped 3936 previous similar messages Apr 28 10:26:55 fir-md1-s2 kernel: LNet: Service thread pid 31395 was inactive for 200.14s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:26:55 fir-md1-s2 kernel: LNet: Skipped 301 previous similar messages Apr 28 10:26:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472415.31395 Apr 28 10:27:03 fir-md1-s2 kernel: LustreError: 30843:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472333, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff986b3b49ba80/0x757b416b99ecdcfe lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 39 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 30843 timeout: 0 lvb_type: 0 Apr 28 10:27:03 fir-md1-s2 kernel: LustreError: 30843:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages Apr 28 10:27:26 fir-md1-s2 kernel: LNet: Service thread pid 31354 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 10:27:26 fir-md1-s2 kernel: LNet: Skipped 14 previous similar messages Apr 28 10:27:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472446.31354 Apr 28 10:27:46 fir-md1-s2 kernel: Lustre: 30846:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff986a1c287200 x1631748503163936/t0(0) o101->7e28a8a4-20f9-6ff9-2375-1f7c023f76c3@10.8.21.3@o2ib6:21/0 lens 576/0 e 0 to 0 dl 1556472471 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 10:27:46 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9871e7a0d100 x1631559671242624/t0(0) o101->408be9ad-1c10-b6aa-e3da-e3970b5ae7cb@10.8.8.4@o2ib6:21/0 lens 576/0 e 0 to 0 dl 1556472471 ref 2 fl New:/2/ffffffff rc 0/-1 Apr 28 10:27:46 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5747 previous similar messages Apr 28 10:27:46 fir-md1-s2 kernel: Lustre: 30846:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 28 10:27:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a521b5a8-ea52-b6d4-a4ea-eb91fa13faf3 (at 10.8.11.34@o2ib6) reconnecting Apr 28 10:27:52 fir-md1-s2 kernel: Lustre: Skipped 4116 previous similar messages Apr 28 10:27:57 fir-md1-s2 kernel: LNet: Service thread pid 31382 was inactive for 200.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 28 10:27:57 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Apr 28 10:27:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472477.31382 Apr 28 10:28:03 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.17.22@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff984b529f1f80/0x757b416b99ebea4a lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d1:0x0].0x0 bits 0x40/0x0 rrc: 39 type: IBT flags: 0x60200400000020 nid: 10.8.17.22@o2ib6 remote: 0xc646eb3e4014c567 expref: 12 pid: 31409 timeout: 471877 lvb_type: 0 Apr 28 10:28:03 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 10:28:03 fir-md1-s2 kernel: LNet: Service thread pid 31350 completed after 299.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:28:03 fir-md1-s2 kernel: Lustre: 30671:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:145s); client may timeout. req@ffff9873312f0f00 x1631814846745040/t0(0) o101->5b865b70-b09a-6289-5a82-a7dcb8a2f3ae@10.9.101.48@o2ib4:8/0 lens 608/0 e 0 to 0 dl 1556472338 ref 1 fl Interpret:/2/ffffffff rc 0/-1 Apr 28 10:28:03 fir-md1-s2 kernel: LustreError: 30843:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.26.9@o2ib6: deadline 100:44s ago req@ffff986b30e26600 x1631868645485968/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556472439 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:28:03 fir-md1-s2 kernel: LustreError: 30843:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Apr 28 10:28:03 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Apr 28 10:28:54 fir-md1-s2 kernel: LNet: Service thread pid 31377 was inactive for 200.72s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 10:28:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472534.31377 Apr 28 10:28:59 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472539.31378 Apr 28 10:29:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 16146ee7-6409-e879-4132-04fda077b6bd (at 10.8.13.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff986aa91be000, cur 1556472560 expire 1556472410 last 1556472333 Apr 28 10:29:33 fir-md1-s2 kernel: LustreError: 31413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472483, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff985b010e1440/0x757b416b99ed40e8 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x13/0x8 rrc: 632 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31413 timeout: 0 lvb_type: 0 Apr 28 10:29:33 fir-md1-s2 kernel: LustreError: 31413:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 28 10:30:33 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.12.22@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff987a6ee272c0/0x757b416b99eba9c2 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 632 type: IBT flags: 0x60200400000020 nid: 10.8.12.22@o2ib6 remote: 0xc3978a14320d23de expref: 11 pid: 30671 timeout: 472027 lvb_type: 0 Apr 28 10:30:33 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 28 10:30:33 fir-md1-s2 kernel: LNet: Service thread pid 31103 completed after 449.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 28 10:30:33 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:109s); client may timeout. req@ffff987a11b55400 x1631559495693456/t0(0) o101->59eb580d-79f6-c9fa-7886-60aca8aaf8c9@10.9.101.22@o2ib4:13/0 lens 656/0 e 0 to 0 dl 1556472524 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:30:33 fir-md1-s2 kernel: Lustre: 31224:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6319 previous similar messages Apr 28 10:30:33 fir-md1-s2 kernel: LustreError: 31224:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.17.22@o2ib6: deadline 30:115s ago req@ffff987a9cce7500 x1631534820374448/t0(0) o400->@:8/0 lens 224/0 e 0 to 0 dl 1556472518 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 28 10:30:33 fir-md1-s2 kernel: LustreError: 31224:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 9 previous similar messages Apr 28 10:30:33 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Apr 28 10:31:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0 (at 10.9.101.62@o2ib4) Apr 28 10:31:03 fir-md1-s2 kernel: Lustre: Skipped 7837 previous similar messages Apr 28 10:31:23 fir-md1-s2 kernel: LNet: Service thread pid 31406 was inactive for 200.21s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 28 10:31:23 fir-md1-s2 kernel: LNet: Skipped 22 previous similar messages Apr 28 10:31:23 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472683.31406 Apr 28 10:31:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472684.31397 Apr 28 10:31:34 fir-md1-s2 kernel: LNet: Service thread pid 31379 was inactive for 200.71s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 28 10:31:34 fir-md1-s2 kernel: Pid: 31379, comm: mdt03_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:34 fir-md1-s2 kernel: Call Trace: Apr 28 10:31:34 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:34 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:34 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 28 10:31:34 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:34 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:34 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:35 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:35 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:31:35 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472695.31379 Apr 28 10:31:35 fir-md1-s2 kernel: Pid: 31365, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 28 10:31:35 fir-md1-s2 kernel: Call Trace: Apr 28 10:31:35 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 28 10:31:35 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 28 10:31:35 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Apr 28 10:31:35 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 28 10:31:35 fir-md1-s2 kernel: [] 0xffffffffffffffff Apr 28 10:32:03 fir-md1-s2 kernel: LustreError: 31191:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472633, 90s ago); not entering recovery in server code, just going back to 
sleep ns: mdt-fir-MDT0003_UUID lock: ffff986ac85ff740/0x757b416b99eda1b4 lrc: 3/0,1 mode: --/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 632 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 31191 timeout: 0 lvb_type: 0
Apr 28 10:32:03 fir-md1-s2 kernel: LustreError: 31191:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages
Apr 28 10:32:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 40047af1-727c-af36-6cf4-0ce2eaf8f0e0 (at 10.8.7.28@o2ib6) reconnecting
Apr 28 10:32:09 fir-md1-s2 kernel: Lustre: Skipped 7788 previous similar messages
Apr 28 10:33:03 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.13.17@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff986a206b9b00/0x757b416b99ebab97 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 632 type: IBT flags: 0x60200400000020 nid: 10.8.13.17@o2ib6 remote: 0xbec60fa11c95a72f expref: 38 pid: 30842 timeout: 472177 lvb_type: 0
Apr 28 10:33:03 fir-md1-s2 kernel: LNet: Service thread pid 30850 completed after 599.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 10:33:03 fir-md1-s2 kernel: Lustre: 30848:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:135s); client may timeout. req@ffff986b38fc1800 x1631647168646496/t0(0) o101->ddea348b-e5a4-5330-325a-755d459e8dda@10.9.107.57@o2ib4:18/0 lens 584/0 e 0 to 0 dl 1556472648 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 28 10:33:03 fir-md1-s2 kernel: Lustre: 30848:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4594 previous similar messages
Apr 28 10:33:03 fir-md1-s2 kernel: LustreError: 31081:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.22@o2ib6: deadline 30:110s ago req@ffff986a41524500 x1631534919775664/t0(0) o400->@:13/0 lens 224/0 e 0 to 0 dl 1556472673 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 28 10:33:03 fir-md1-s2 kernel: LustreError: 31081:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 12 previous similar messages
Apr 28 10:33:03 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages
Apr 28 10:33:54 fir-md1-s2 kernel: LNet: Service thread pid 31191 was inactive for 200.70s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 28 10:33:54 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 28 10:33:54 fir-md1-s2 kernel: Pid: 31191, comm: mdt01_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 28 10:33:54 fir-md1-s2 kernel: Call Trace:
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 28 10:33:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 28 10:33:54 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 28 10:33:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472834.31191
Apr 28 10:33:54 fir-md1-s2 kernel: Pid: 31224, comm: mdt02_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 28 10:33:54 fir-md1-s2 kernel: Call Trace:
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 28 10:33:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 28 10:33:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 28 10:33:54 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 28 10:34:09 fir-md1-s2 kernel: LNet: Service thread pid 31366 was inactive for 200.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 28 10:34:09 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message
Apr 28 10:34:09 fir-md1-s2 kernel: Pid: 31366, comm: mdt03_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 28 10:34:09 fir-md1-s2 kernel: Call Trace:
Apr 28 10:34:09 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 28 10:34:09 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 28 10:34:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
Apr 28 10:34:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 28 10:34:09 fir-md1-s2 kernel: [] 0xffffffffffffffff
Apr 28 10:34:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556472849.31366
Apr 28 10:34:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 32b4877a-e8a3-e77c-9d56-903e3045a875 (at 10.8.17.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff987855ef8400, cur 1556472860 expire 1556472710 last 1556472633
Apr 28 10:34:33 fir-md1-s2 kernel: LustreError: 31103:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556472783, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff988b3d4f4380/0x757b416b99ee1121 lrc: 3/1,0 mode: --/PR res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x20/0x0 rrc: 636 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 31103 timeout: 0 lvb_type: 0
Apr 28 10:34:33 fir-md1-s2 kernel: LustreError: 31103:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages
Apr 28 10:35:33 fir-md1-s2 kernel: LustreError: 30592:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.17.10@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff984fdd2b7080/0x757b416b99ebac70 lrc: 3/0,0 mode: PW/PW res: [0x28001b768:0x1b5d0:0x0].0x0 bits 0x40/0x0 rrc: 639 type: IBT flags: 0x60200400000020 nid: 10.8.17.10@o2ib6 remote: 0xa3af2968c17e4564 expref: 14 pid: 30853 timeout: 472327 lvb_type: 0
Apr 28 10:35:33 fir-md1-s2 kernel: LNet: Service thread pid 31075 completed after 749.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 28 10:35:33 fir-md1-s2 kernel: Lustre: 30836:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. req@ffff986a89a77200 x1631590743733136/t0(0) o101->3ddfc0e1-d9a8-93ac-6e7d-3e2edb9b897f@10.8.0.65@o2ib6:3/0 lens 576/0 e 0 to 0 dl 1556472813 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 28 10:35:33 fir-md1-s2 kernel: Lustre: 30836:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8060 previous similar messages
Apr 28 10:35:33 fir-md1-s2 kernel: LustreError: 30836:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.12.22@o2ib6: deadline 100:34s ago req@ffff9865437de600 x1631534919792192/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1556472899 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 28 10:35:33 fir-md1-s2 kernel: LustreError: 30836:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 12 previous similar messages
Apr 28 10:35:33 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages
Apr 28 10:36:18 fir-md1-s2 kernel: Lustre: 31075:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98697cfa9800 x1631672945160256/t0(0) o101->1bd832b1-9250-3684-b26a-6a1cc941ff1c@10.9.101.20@o2ib4:23/0 lens 568/0 e 0 to 0 dl 1556472983 ref 2 fl New:/0/ffffffff rc 0/-1
Apr 28 10:36:18 fir-md1-s2 kernel: Lustre: 31075:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20618 previous similar messages