Feb 12 21:30:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 21:30:49 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages
Feb 12 21:32:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 21:32:03 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 12 21:40:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 21:40:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages
Feb 12 21:42:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 21:42:04 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages
Feb 12 21:46:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 09a4715a-92a7-230a-a055-3b7920866714 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc40cc38c00, cur 1550036780 expire 1550036630 last 1550036553
Feb 12 21:46:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 12 21:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 21:50:51 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages
Feb 12 21:51:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550037100.21800
Feb 12 21:52:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 21:52:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 12 21:52:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550037179.21324
Feb 12 21:53:20 fir-md1-s1 kernel: LustreError: 21800:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550036900, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc3fb3d1f80/0x6bcd99f659a77bd1 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21800 timeout: 0 lvb_type: 0
Feb 12 21:54:39 fir-md1-s1 kernel: LustreError: 21324:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550036979, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb4b04d8d80/0x6bcd99f659e433dd lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21324 timeout: 0 lvb_type: 0
Feb 12 21:58:15 fir-md1-s1 kernel: Lustre: 21445:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 21:58:31 fir-md1-s1 kernel: LustreError: 21356:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550037211, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc40cc71b00/0x6bcd99f65a974d0b lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21356 timeout: 0 lvb_type: 0
Feb 12 21:59:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550037571.20640
Feb 12 21:59:34 fir-md1-s1 kernel: Lustre: 21250:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 22:00:52 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages
Feb 12 22:01:10 fir-md1-s1 kernel: LustreError: 20640:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550037370, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca228c8b180/0x6bcd99f65b0c2959 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20640 timeout: 0 lvb_type: 0
Feb 12 22:02:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:02:06 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages
Feb 12 22:03:26 fir-md1-s1 kernel: Lustre: 21858:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:04:34 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550037874.21356
Feb 12 22:05:11 fir-md1-s1 kernel: LustreError: 21149:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550037611, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc4d4ffd100/0x6bcd99f65bbb4b1e lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 19 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21149 timeout: 0 lvb_type: 0
Feb 12 22:06:11 fir-md1-s1 kernel: Lustre: 21339:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply
Feb 12 22:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 22:10:53 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages
Feb 12 22:12:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:12:07 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages
Feb 12 22:12:41 fir-md1-s1 kernel: Lustre: 21789:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Feb 12 22:18:00 fir-md1-s1 kernel: Lustre: 20612:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
Feb 12 22:20:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550038816.21149
Feb 12 22:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 22:20:54 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages
Feb 12 22:22:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:22:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages
Feb 12 22:28:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550039292.21644
Feb 12 22:29:51 fir-md1-s1 kernel: LustreError: 21644:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550039091, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc435fe3600/0x6bcd99f66030dc18 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 20 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21644 timeout: 0 lvb_type: 0
Feb 12 22:30:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 22:30:55 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages
Feb 12 22:32:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:32:10 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages
Feb 12 22:34:46 fir-md1-s1 kernel: Lustre: 21828:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:34:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550039687.21809
Feb 12 22:36:26 fir-md1-s1 kernel: LustreError: 21809:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550039486, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9eb8700900/0x6bcd99f66185a0d4 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 21 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21809 timeout: 0 lvb_type: 0
Feb 12 22:40:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 22:40:56 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages
Feb 12 22:41:21 fir-md1-s1 kernel: Lustre: 21318:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:42:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:42:11 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages
Feb 12 22:42:24 fir-md1-s1 kernel: LustreError: 21815:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550039844, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9d9a0606c0/0x6bcd99f662b94135 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 22 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21815 timeout: 0 lvb_type: 0
Feb 12 22:47:19 fir-md1-s1 kernel: Lustre: 21330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:50:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550040608.21815
Feb 12 22:50:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550040618.21820
Feb 12 22:50:32 fir-md1-s1 kernel: LustreError: 20525:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550040332, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca32beb69c0/0x6bcd99f664326667 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20525 timeout: 0 lvb_type: 0
Feb 12 22:50:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 22:50:57 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages
Feb 12 22:51:57 fir-md1-s1 kernel: LustreError: 21820:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550040417, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cd4966c21c0/0x6bcd99f66476f9b1 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21820 timeout: 0 lvb_type: 0
Feb 12 22:52:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 22:52:12 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages
Feb 12 22:52:35 fir-md1-s1 kernel: Lustre: 21831:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
Feb 12 22:53:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550040808.21769
Feb 12 22:55:08 fir-md1-s1 kernel: LustreError: 21769:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550040608, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc433b0f740/0x6bcd99f66505bca3 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21769 timeout: 0 lvb_type: 0
Feb 12 22:55:27 fir-md1-s1 kernel: Lustre: 21839:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 22:59:27 fir-md1-s1 kernel: Lustre: 21854:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Feb 12 23:00:04 fir-md1-s1 kernel: Lustre: 21447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 23:00:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 23:00:58 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages
Feb 12 23:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:02:13 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages
Feb 12 23:04:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eaf59f85-a4cb-323d-5061-5e1bf30e1e75 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb39eee0c00, cur 1550041449 expire 1550041299 last 1550041222
Feb 12 23:04:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 12 23:05:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550041535.20525
Feb 12 23:10:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 23:10:59 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages
Feb 12 23:12:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:12:14 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages
Feb 12 23:17:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550042269.21858
Feb 12 23:19:28 fir-md1-s1 kernel: LustreError: 21858:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550042068, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc3e5486540/0x6bcd99f6693a2fda lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21858 timeout: 0 lvb_type: 0
Feb 12 23:21:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 23:21:00 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages
Feb 12 23:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:22:15 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages
Feb 12 23:23:43 fir-md1-s1 kernel: LustreError: 21789:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550042323, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc4156ede80/0x6bcd99f669ea86da lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21789 timeout: 0 lvb_type: 0
Feb 12 23:24:23 fir-md1-s1 kernel: Lustre: 21806:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 23:28:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550042887.21789
Feb 12 23:28:38 fir-md1-s1 kernel: Lustre: 21457:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 23:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 23:31:02 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages
Feb 12 23:32:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:32:16 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages
Feb 12 23:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fbf97e21-d7bf-78c2-a67f-f6e04220bc0c (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc46bc04000, cur 1550043519 expire 1550043369 last 1550043292
Feb 12 23:38:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 12 23:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 12 23:41:03 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages
Feb 12 23:41:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550043714.20535
Feb 12 23:42:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:42:17 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages
Feb 12 23:43:34 fir-md1-s1 kernel: LustreError: 20535:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550043514, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cd455d8ad00/0x6bcd99f66d0aba61 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20535 timeout: 0 lvb_type: 0
Feb 12 23:45:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 63263eaa-106a-9396-d424-027f80b6d6db (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb49b0cd000, cur 1550043933 expire 1550043783 last 1550043706
Feb 12 23:45:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 12 23:48:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9fcee8f7-97bf-b7d8-e8fc-46d398ed949c (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc41ba8d800, cur 1550044093 expire 1550043943 last 1550043866
Feb 12 23:48:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 12 23:48:29 fir-md1-s1 kernel: Lustre: 21793:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 12 23:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 12 23:51:04 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages
Feb 12 23:52:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 12 23:52:18 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages
Feb 12 23:54:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550044440.21455
Feb 12 23:55:40 fir-md1-s1 kernel: LustreError: 21455:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550044240, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb42ef54ec0/0x6bcd99f66eeee43c lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 29 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21455 timeout: 0 lvb_type: 0
Feb 12 23:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ad11031a-a96a-0b89-84a5-716b36beffd7 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb47b1c1800, cur 1550044560 expire 1550044410 last 1550044333
Feb 12 23:56:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 00:00:35 fir-md1-s1 kernel: Lustre: 21440:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 00:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 00:01:05 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages
Feb 13 00:02:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:02:19 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages
Feb 13 00:02:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550044969.21828
Feb 13 00:04:28 fir-md1-s1 kernel: LustreError: 21828:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550044768, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc409fc2880/0x6bcd99f6713d2705 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 30 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21828 timeout: 0 lvb_type: 0
Feb 13 00:07:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c4214df-2647-4f45-5cc6-7a729161f7a3 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3c11cfc00, cur 1550045253 expire 1550045103 last 1550045026
Feb 13 00:07:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 00:09:23 fir-md1-s1 kernel: Lustre: 21466:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 00:11:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 00:11:06 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages
Feb 13 00:12:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:12:20 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages
Feb 13 00:12:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550045542.21871
Feb 13 00:14:01 fir-md1-s1 kernel: LustreError: 21871:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550045341, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca45778cec0/0x6bcd99f672ea6b87 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21871 timeout: 0 lvb_type: 0
Feb 13 00:18:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 807d7fbf-b301-cb35-ff6f-385b0544f088 (at 10.8.11.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca483334800, cur 1550045886 expire 1550045736 last 1550045659
Feb 13 00:18:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 00:21:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 00:21:07 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages
Feb 13 00:21:31 fir-md1-s1 kernel: Lustre: 21811:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Feb 13 00:22:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:22:21 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages
Feb 13 00:24:28 fir-md1-s1 kernel: Lustre: 21773:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
Feb 13 00:25:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550046352.20774
Feb 13 00:27:32 fir-md1-s1 kernel: LustreError: 20774:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550046152, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb3b4814a40/0x6bcd99f675649405 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20774 timeout: 0 lvb_type: 0
Feb 13 00:28:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0016f7ca-8f34-8f96-ecfb-03f782a1cda7 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3d8be9c00, cur 1550046514 expire 1550046364 last 1550046287
Feb 13 00:28:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 13 00:28:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550046524.21448
Feb 13 00:30:24 fir-md1-s1 kernel: LustreError: 21448:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550046324, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb3b890a880/0x6bcd99f675cca238 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21448 timeout: 0 lvb_type: 0
Feb 13 00:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 00:31:08 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages
Feb 13 00:32:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:32:23 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages
Feb 13 00:35:03 fir-md1-s1 kernel: Lustre: 20528:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply
Feb 13 00:35:19 fir-md1-s1 kernel: Lustre: 21822:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 00:35:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550046928.21581
Feb 13 00:37:08 fir-md1-s1 kernel: LustreError: 21581:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550046728, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc409f79b00/0x6bcd99f676bdec84 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 34 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21581 timeout: 0 lvb_type: 0
Feb 13 00:41:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 00:41:09 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages
Feb 13 00:42:03 fir-md1-s1 kernel: Lustre: 21357:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 00:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:42:24 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages
Feb 13 00:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 00:51:10 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages
Feb 13 00:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 00:52:25 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages
Feb 13 00:57:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7565fda6-f350-9e4f-aba5-95f926517952 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb48a82e400, cur 1550048242 expire 1550048092 last 1550048015
Feb 13 00:57:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 01:01:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 01:01:11 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages
Feb 13 01:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:02:26 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Feb 13 01:02:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550048574.21793
Feb 13 01:04:34 fir-md1-s1 kernel: LustreError: 21793:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550048374, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca010847080/0x6bcd99f67b3c85f4 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21793 timeout: 0 lvb_type: 0
Feb 13 01:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9428e727-0d4e-3ed2-b36a-d4c8d4a846dd (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc41a920800, cur 1550048845 expire 1550048695 last 1550048618
Feb 13 01:07:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 01:09:29 fir-md1-s1 kernel: Lustre: 21812:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 01:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 01:11:12 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages
Feb 13 01:12:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:12:27 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Feb 13 01:18:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d525ad7f-9b95-395d-0b67-64423ee1be5a (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb2d47e5800, cur 1550049521 expire 1550049371 last 1550049294
Feb 13 01:18:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 01:20:38 fir-md1-s1 kernel: LustreError: 21812:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550049338, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cd465d8ca40/0x6bcd99f67d747b64 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 36 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21812 timeout: 0 lvb_type: 0
Feb 13 01:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 01:21:13 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages
Feb 13 01:22:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:22:28 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Feb 13 01:25:33 fir-md1-s1 kernel: Lustre: 21816:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Feb 13 01:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 01:31:14 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages
Feb 13 01:32:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:32:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 01:35:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550050542.21812
Feb 13 01:41:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 01:41:15 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 01:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:42:30 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 01:51:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 01:51:16 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 01:52:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 01:52:31 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages
Feb 13 02:01:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 02:01:17 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 02:02:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 02:02:32 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 02:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 02:11:19 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages
Feb 13 02:12:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 02:12:33 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 13 02:15:47 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550052940/real 1550052940] req@ffff9c9e254bf500 x1625309462488544/t0(0) o106->fir-MDT0002@10.8.12.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550052947 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Feb 13 02:15:47 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Feb 13 02:16:08 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550052961/real 1550052961] req@ffff9c9e254bf500 x1625309462488544/t0(0) o106->fir-MDT0002@10.8.12.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550052968 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Feb 13 02:16:08 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Feb 13 02:16:43 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550052996/real 1550052996] req@ffff9c9e254bf500 x1625309462488544/t0(0) o106->fir-MDT0002@10.8.12.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550053003 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Feb 13 02:16:43 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Feb 13 02:17:53 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550053066/real 1550053066] req@ffff9c9e254bf500 x1625309462488544/t0(0) o106->fir-MDT0002@10.8.12.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550053073 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Feb 13 02:17:53 fir-md1-s1 kernel: Lustre: 21346:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
Feb 13 02:18:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 22ee14eb-5a96-ad04-6e5f-188b7aec897d (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca47e5d2c00, cur 1550053139 expire 1550052989 last 1550052912
Feb 13 02:18:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 13 02:19:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 22ee14eb-5a96-ad04-6e5f-188b7aec897d (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca482d80800, cur 1550053140 expire 1550052990 last 1550052913
Feb 13 02:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting
Feb 13 02:21:20 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages
Feb 13 02:22:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4)
Feb 13 02:22:34 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages
Feb 13 02:25:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550053552.21450
Feb 13 02:27:32 fir-md1-s1 kernel: LustreError: 21450:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053352, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc3f26cba80/0x6bcd99f6865ce82e lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 39 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21450 timeout: 0 lvb_type: 0
Feb 13 02:29:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550053764.20527
Feb 13 02:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting
Feb 13 02:31:21 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages
Feb 13 02:31:45
fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550053905.21740 Feb 13 02:31:59 fir-md1-s1 kernel: LustreError: 21575:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053619, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc43a038d80/0x6bcd99f686ea69e4 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 44 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21575 timeout: 0 lvb_type: 0 Feb 13 02:31:59 fir-md1-s1 kernel: LustreError: 21575:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 13 02:32:27 fir-md1-s1 kernel: Lustre: 20815:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 02:32:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 02:32:35 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Feb 13 02:33:24 fir-md1-s1 kernel: LustreError: 21740:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053704, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb424e93f00/0x6bcd99f68718b92f lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 45 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21740 timeout: 0 lvb_type: 0 Feb 13 02:33:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550054036.20528 Feb 13 02:34:31 fir-md1-s1 kernel: LustreError: 21346:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053771, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9d91851b00/0x6bcd99f68738097f lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 45 type: IBT flags: 
0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21346 timeout: 0 lvb_type: 0 Feb 13 02:34:31 fir-md1-s1 kernel: LustreError: 21346:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 13 02:35:36 fir-md1-s1 kernel: LustreError: 20528:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053836, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb4485b0480/0x6bcd99f68752d444 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 45 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20528 timeout: 0 lvb_type: 0 Feb 13 02:36:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550054182.21575 Feb 13 02:36:41 fir-md1-s1 kernel: LustreError: 21866:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053901, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc459a86e40/0x6bcd99f68770993e lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 45 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21866 timeout: 0 lvb_type: 0 Feb 13 02:36:54 fir-md1-s1 kernel: Lustre: 21791:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 02:36:54 fir-md1-s1 kernel: Lustre: 21791:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Feb 13 02:37:47 fir-md1-s1 kernel: LustreError: 21797:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550053967, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9ecaa40480/0x6bcd99f6878c7baf lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 45 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: 
-99 pid: 21797 timeout: 0 lvb_type: 0 Feb 13 02:38:19 fir-md1-s1 kernel: Lustre: 21830:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 02:39:26 fir-md1-s1 kernel: Lustre: 21856:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 02:41:08 fir-md1-s1 kernel: Lustre: 21891:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-32), not sending early reply Feb 13 02:41:08 fir-md1-s1 kernel: Lustre: 21891:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Feb 13 02:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting Feb 13 02:41:22 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages Feb 13 02:42:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 02:42:36 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Feb 13 02:42:42 fir-md1-s1 kernel: Lustre: 21852:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 02:43:16 fir-md1-s1 kernel: Lustre: 21457:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-95), not sending early reply Feb 13 02:43:54 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550054634.21758 Feb 13 02:45:59 fir-md1-s1 kernel: LustreError: 21832:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550054459, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb42430c5c0/0x6bcd99f6884d939b lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 49 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21832 timeout: 0 lvb_type: 0 Feb 13 02:48:01 fir-md1-s1 kernel: LustreError: 
21271:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550054581, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc470195340/0x6bcd99f6887aa90c lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 49 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21271 timeout: 0 lvb_type: 0 Feb 13 02:50:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055015.21866 Feb 13 02:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting Feb 13 02:51:23 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Feb 13 02:52:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 02:52:38 fir-md1-s1 kernel: Lustre: Skipped 40 previous similar messages Feb 13 02:52:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055171.21797 Feb 13 02:53:29 fir-md1-s1 kernel: Lustre: 21736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Feb 13 02:55:31 fir-md1-s1 kernel: Lustre: 20533:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Feb 13 02:55:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055351.21792 Feb 13 02:57:30 fir-md1-s1 kernel: LustreError: 21792:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550055150, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9f9849d100/0x6bcd99f6894954e8 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 50 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21792 timeout: 0 lvb_type: 0 Feb 13 02:57:30 fir-md1-s1 kernel: LustreError: 
21792:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 13 03:01:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055662.21832 Feb 13 03:01:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting Feb 13 03:01:24 fir-md1-s1 kernel: Lustre: Skipped 45 previous similar messages Feb 13 03:02:25 fir-md1-s1 kernel: Lustre: 20818:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 03:02:25 fir-md1-s1 kernel: Lustre: 20818:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Feb 13 03:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:02:39 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages Feb 13 03:03:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055785.21739 Feb 13 03:06:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550055973.21725 Feb 13 03:07:53 fir-md1-s1 kernel: LustreError: 21725:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550055773, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cd484a206c0/0x6bcd99f68a0af3be lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 52 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21725 timeout: 0 lvb_type: 0 Feb 13 03:10:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550056229.21767 Feb 13 03:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting Feb 13 03:11:25 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Feb 13 03:12:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:12:40 fir-md1-s1 kernel: Lustre: 
Skipped 45 previous similar messages Feb 13 03:15:23 fir-md1-s1 kernel: Lustre: 21711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply Feb 13 03:21:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting Feb 13 03:21:26 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Feb 13 03:22:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:22:41 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Feb 13 03:24:42 fir-md1-s1 kernel: LustreError: 21248:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550056782, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb3b556a400/0x6bcd99f68b31b95b lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 56 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21248 timeout: 0 lvb_type: 0 Feb 13 03:24:42 fir-md1-s1 kernel: LustreError: 21248:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 13 03:29:37 fir-md1-s1 kernel: Lustre: 20645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 03:29:37 fir-md1-s1 kernel: Lustre: 20645:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Feb 13 03:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting Feb 13 03:31:27 fir-md1-s1 kernel: Lustre: Skipped 48 previous similar messages Feb 13 03:32:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:32:42 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Feb 13 03:37:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550057829.21329 Feb 13 
03:38:11 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550057891.20532 Feb 13 03:38:49 fir-md1-s1 kernel: LustreError: 21329:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550057629, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9d7891af40/0x6bcd99f68c02c382 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21329 timeout: 0 lvb_type: 0 Feb 13 03:38:49 fir-md1-s1 kernel: LustreError: 21329:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Feb 13 03:39:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550057985.21248 Feb 13 03:40:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058015.21721 Feb 13 03:40:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058016.20536 Feb 13 03:40:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058050.21836 Feb 13 03:41:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting Feb 13 03:41:28 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages Feb 13 03:41:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058112.21895 Feb 13 03:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:42:43 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Feb 13 03:43:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058215.21318 Feb 13 03:43:44 fir-md1-s1 kernel: Lustre: 21736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 03:43:44 fir-md1-s1 kernel: Lustre: 21736:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Feb 13 03:51:29 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting Feb 13 03:51:29 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Feb 13 03:52:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 03:52:44 fir-md1-s1 kernel: Lustre: Skipped 56 previous similar messages Feb 13 03:53:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550058833.21884 Feb 13 04:01:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0d5aa1b6-c4ff-4332-61aa-6339adb4fe2a (at 10.9.107.24@o2ib4) reconnecting Feb 13 04:01:30 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Feb 13 04:02:17 fir-md1-s1 kernel: LustreError: 20616:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550059037, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca3d09c0240/0x6bcd99f68d373f2f lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 63 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20616 timeout: 0 lvb_type: 0 Feb 13 04:02:17 fir-md1-s1 kernel: LustreError: 20616:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Feb 13 04:02:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 04:02:45 fir-md1-s1 kernel: Lustre: Skipped 61 previous similar messages Feb 13 04:07:12 fir-md1-s1 kernel: Lustre: 21341:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 04:07:12 fir-md1-s1 kernel: Lustre: 21341:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Feb 13 04:11:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 83e72c3d-872c-7a5f-f5c1-edf566d41d60 (at 10.9.107.1@o2ib4) reconnecting Feb 13 04:11:31 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Feb 13 04:12:46 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.5@o2ib4) Feb 13 04:12:46 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Feb 13 04:14:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550060083.21803 Feb 13 04:16:23 fir-md1-s1 kernel: LustreError: 21803:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550059883, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc4147a33c0/0x6bcd99f68dd55ae1 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 66 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21803 timeout: 0 lvb_type: 0 Feb 13 04:17:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550060238.20616 Feb 13 04:18:21 fir-md1-s1 kernel: LustreError: 21339:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550060001, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca100b0c5c0/0x6bcd99f68dec7299 lrc: 3/1,0 mode: --/PR res: [0x2c0001541:0x82f7:0x0].0x0 bits 0x13/0x0 rrc: 66 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21339 timeout: 0 lvb_type: 0 Feb 13 04:20:56 fir-md1-s1 kernel: Lustre: 21067:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (133:38369s); client may timeout. req@ffff9cd43ff83600 x1624748520738048/t38705069379(0) o36->3d122a91-53f0-f449-1f10-d08490897e63@10.9.106.65@o2ib4:292/0 lens 504/424 e 5 to 0 dl 1550022087 ref 1 fl Complete:/0/0 rc 0/0 Feb 13 04:20:56 fir-md1-s1 kernel: Lustre: 21067:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 13 previous similar messages Feb 13 06:44:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3724d9a0-17fd-0e0a-31ff-19cac289c2d8 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca47e46b400, cur 1550069084 expire 1550068934 last 1550068857 Feb 13 06:44:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 13 06:47:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.21.19@o2ib6) Feb 13 06:47:36 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Feb 13 07:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 54ce30c9-3bed-a43b-b985-a5c2f449d263 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb494e0dc00, cur 1550070665 expire 1550070515 last 1550070438 Feb 13 07:11:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 13 07:14:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.21.19@o2ib6) Feb 13 07:14:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 13 08:31:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 170953e0-8a7f-c44f-f842-e35ed505b2cf (at 10.9.108.17@o2ib4) Feb 13 08:31:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 13 08:44:56 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550076289/real 1550076289] req@ffff9c9eb6a78600 x1625309505950016/t0(0) o106->fir-MDT0002@10.8.14.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550076296 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 13 08:44:56 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Feb 13 08:45:17 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550076310/real 1550076310] req@ffff9c9eb6a78600 x1625309505950016/t0(0) o106->fir-MDT0002@10.8.14.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550076317 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 13 08:45:17 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 13 08:45:52 
fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550076345/real 1550076345] req@ffff9c9eb6a78600 x1625309505950016/t0(0) o106->fir-MDT0002@10.8.14.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550076352 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 13 08:45:52 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Feb 13 08:47:02 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550076415/real 1550076415] req@ffff9c9eb6a78600 x1625309505950016/t0(0) o106->fir-MDT0002@10.8.14.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1550076422 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 13 08:47:02 fir-md1-s1 kernel: Lustre: 21339:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Feb 13 08:47:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 488cfd93-1121-504d-019d-485c13be114d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca48295e000, cur 1550076465 expire 1550076315 last 1550076238 Feb 13 08:47:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 13 08:47:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 488cfd93-1121-504d-019d-485c13be114d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9519010400, cur 1550076470 expire 1550076320 last 1550076243 Feb 13 08:47:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7cba68d6-88e2-7226-0b4f-cb83c3107f8f (at 10.9.102.34@o2ib4) reconnecting Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 46 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.115.8@o2ib4) Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.107.60@o2ib4, removing former export from same NID Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: 20337:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550077297/real 1550077298] req@ffff9cc45918d400 x1625309508319424/t0(0) o13->fir-OST002f-osc-MDT0002@10.0.10.108@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1550077304 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: 20337:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: fir-OST002f-osc-MDT0002: Connection to fir-OST002f (at 10.0.10.108@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: 21849:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff9cc45918d100 x1624711786995040/t0(0) o400->085fe76c-3bbe-b7f0-aec1-b26db125ea9d@10.8.10.27@o2ib6:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: 
21849:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 5 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound). Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: 21843:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=1, svcEst=31, delay=5931 Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.12.27@o2ib6, removing former export from same NID Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.17.5@o2ib6) Feb 13 09:01:45 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Feb 13 09:01:46 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.8.21@o2ib6, removing former export from same NID Feb 13 09:01:46 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 13 09:01:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.22@o2ib6) Feb 13 09:01:47 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages Feb 13 09:01:48 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.67@o2ib6, removing former export from same NID Feb 13 09:01:48 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Feb 13 09:01:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.13.3@o2ib6) Feb 13 09:01:51 fir-md1-s1 kernel: Lustre: Skipped 396 previous similar messages Feb 13 09:01:52 fir-md1-s1 kernel: LustreError: 21619:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9cd451bce450 x1624748079209280/t0(0) o4->033127f9-d684-a263-9e10-535f772c4f1a@10.9.106.25@o2ib4:426/0 lens 488/448 e 1 to 0 dl 1550077336 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:01:52 fir-md1-s1 kernel: LustreError: 21619:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 19 previous similar messages Feb 13 09:01:52 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Bulk IO write error with 033127f9-d684-a263-9e10-535f772c4f1a (at 10.9.106.25@o2ib4), client will retry: rc = -110 Feb 13 09:01:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.10@o2ib6, removing former export from same NID Feb 13 09:01:52 fir-md1-s1 kernel: Lustre: Skipped 227 previous similar messages Feb 13 09:01:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with fc60883d-f9c1-82aa-8312-f53a10d6b6ff (at 10.8.9.1@o2ib6), client will retry: rc = -110 Feb 13 09:01:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 13 09:01:54 fir-md1-s1 kernel: Lustre: fir-OST0004-osc-MDT0000: Connection to fir-OST0004 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Feb 13 09:01:54 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Feb 13 09:01:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 033127f9-d684-a263-9e10-535f772c4f1a (at 10.9.106.25@o2ib4), client will retry: rc = -110 Feb 13 09:01:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 13 09:02:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.16.5@o2ib6) Feb 13 09:02:01 fir-md1-s1 kernel: Lustre: Skipped 374 previous similar messages Feb 13 09:02:08 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.3@o2ib6, removing former export from same NID Feb 13 09:02:08 fir-md1-s1 kernel: Lustre: Skipped 65 previous similar messages Feb 13 09:02:14 fir-md1-s1 kernel: Lustre: 21758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1550077296/real 1550077307] req@ffff9cc431cc1b00 x1625309508306912/t0(0) o104->fir-MDT0002@10.9.105.71@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1550077303 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Feb 13 09:02:14 fir-md1-s1 kernel: Lustre: 21758:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 46 previous similar messages Feb 13 
09:02:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.103.35@o2ib4) Feb 13 09:02:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 13 09:02:21 fir-md1-s1 kernel: LustreError: 21616:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9ca3522eb050 x1624705837563408/t0(0) o4->ff5b2a25-2019-8855-6f34-e3245844a49b@10.8.6.25@o2ib6:453/0 lens 488/448 e 2 to 0 dl 1550077363 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:02:21 fir-md1-s1 kernel: LustreError: 21616:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 4 previous similar messages Feb 13 09:02:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with ff5b2a25-2019-8855-6f34-e3245844a49b (at 10.8.6.25@o2ib6), client will retry: rc = -110 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: 21836:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff9cb454f0e600 x1624712381231168/t0(0) o101->43c491ee-b68b-5359-c63e-5195c978bbc4@10.8.18.28@o2ib6:0/0 lens 1768/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.24.18@o2ib6, removing former export from same NID Feb 13 09:02:43 fir-md1-s1 kernel: LustreError: 21678:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9cd451bc9850 x1624758604721616/t0(0) o4->0343f8c1-f803-943e-238c-e83a0eb1a3ba@10.9.106.34@o2ib4:472/0 lens 488/448 e 3 to 0 dl 1550077382 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 0343f8c1-f803-943e-238c-e83a0eb1a3ba (at 10.9.106.34@o2ib4), client will retry: rc = -110 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=4 reqQ=0 recA=4, svcEst=57, delay=8468 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9ca33fd8e450 x1624764084917936/t0(0) o4->f51b1f69-6751-8ab9-8118-6a0d3a2fd4f3@10.8.28.9@o2ib6:451/0 lens 488/448 e 2 to 0 dl 1550077361 ref 2 fl Interpret:/0/0 rc 0/0 Feb 13 09:02:43 fir-md1-s1 kernel: Lustre: 21616:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Feb 13 09:02:44 fir-md1-s1 kernel: Lustre: fir-OST0017-osc-MDT0002: Connection to fir-OST0017 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Feb 13 09:02:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 13 09:02:47 fir-md1-s1 kernel: Lustre: 20340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550077358/real 1550077363] req@ffff9cc3dfc1ce00 x1625309525848992/t0(0) o13->fir-OST0005-osc-MDT0000@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1550077365 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 Feb 13 09:02:47 fir-md1-s1 kernel: Lustre: 20340:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 295 previous similar messages Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 21467:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9ca35044e050 x1624748343693344/t0(0) o4->8feac3a9-5d0e-b456-91aa-b72196a5e39e@10.8.29.4@o2ib6:449/0 lens 488/448 e 2 to 0 dl 1550077359 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:02:47 fir-md1-s1 kernel: Lustre: 21662:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (57:6s); client may timeout. 
req@ffff9ca34dfdf850 x1624706394962944/t0(0) o4->d3acdff5-12e0-1cdf-9a27-8dcc902cb72a@10.8.6.28@o2ib6:451/0 lens 488/448 e 2 to 0 dl 1550077361 ref 1 fl Complete:/0/ffffffff rc -110/-1 Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 21467:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 3 previous similar messages Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 20241:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cd49c237c00 Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 20241:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cd49c231e00 Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 20241:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cd49c236c00 Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 20240:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cd49c235600 Feb 13 09:02:47 fir-md1-s1 kernel: LustreError: 20243:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cb457f36000 Feb 13 09:02:48 fir-md1-s1 kernel: LustreError: 21560:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9cc3faf50c50 x1624748343693392/t0(0) o4->8feac3a9-5d0e-b456-91aa-b72196a5e39e@10.8.29.4@o2ib6:469/0 lens 488/448 e 3 to 0 dl 1550077379 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:02:48 fir-md1-s1 kernel: Lustre: 21489:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff9cc3f8439850 x1624745420761600/t0(0) o4->400e2c70-3670-eb05-66c0-e754ea5cd280@10.8.29.7@o2ib6:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Feb 13 09:02:48 fir-md1-s1 kernel: Lustre: 21489:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages Feb 13 09:02:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.106.34@o2ib4) Feb 13 09:02:51 fir-md1-s1 kernel: Lustre: Skipped 453 previous similar messages Feb 13 09:02:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
69e04fcf-27a0-cb59-92a0-ef1d06a212ef (at 10.9.104.6@o2ib4) reconnecting Feb 13 09:02:58 fir-md1-s1 kernel: Lustre: Skipped 937 previous similar messages Feb 13 09:03:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 135b49e4-9bbb-43fc-66e9-1f7ec8c75a96 (at 10.9.113.3@o2ib4), client will retry: rc = -110 Feb 13 09:03:07 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 13 09:03:16 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.0.65@o2ib6, removing former export from same NID Feb 13 09:03:16 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Feb 13 09:03:21 fir-md1-s1 kernel: LustreError: 20571:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9cb3dfcbc850 x1624745420761632/t0(0) o4->400e2c70-3670-eb05-66c0-e754ea5cd280@10.8.29.7@o2ib6:519/0 lens 488/448 e 0 to 0 dl 1550077429 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:03:21 fir-md1-s1 kernel: LustreError: 21932:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9cb4c4034c50 x1624745420761696/t0(0) o4->400e2c70-3670-eb05-66c0-e754ea5cd280@10.8.29.7@o2ib6:528/0 lens 488/448 e 0 to 0 dl 1550077438 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:03:21 fir-md1-s1 kernel: LustreError: 21932:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 3 previous similar messages Feb 13 09:03:21 fir-md1-s1 kernel: LustreError: 20571:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages Feb 13 09:03:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with 3d5ac7cc-5cb1-6aef-d2d6-2af538ce79c6 (at 10.8.25.22@o2ib6), client will retry: rc -110 Feb 13 09:03:28 fir-md1-s1 kernel: Lustre: fir-OST0006-osc-MDT0002: Connection to fir-OST0006 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Feb 13 09:03:28 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Feb 13 09:03:31 fir-md1-s1 kernel: Lustre: 
21657:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff9cc2a02a3600 x1624745420763568/t0(0) o101->400e2c70-3670-eb05-66c0-e754ea5cd280@10.8.29.7@o2ib6:0/0 lens 1768/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Feb 13 09:03:37 fir-md1-s1 kernel: LustreError: 21986:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff9cb4beb57c50 x1624712426096416/t0(0) o4->0edeac5b-ee1e-024f-de97-9e0fc3efb1af@10.8.6.2@o2ib6:536/0 lens 504/448 e 2 to 0 dl 1550077446 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:03:48 fir-md1-s1 kernel: Lustre: 30162:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:2s); client may timeout. req@ffff9cc3f4b02400 x1624736830128000/t0(0) o103->c657c695-de10-8808-d9fb-1a856e787dad@10.8.18.34@o2ib6:516/0 lens 328/0 e 0 to 0 dl 1550077426 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Feb 13 09:03:48 fir-md1-s1 kernel: Lustre: 30162:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Feb 13 09:03:51 fir-md1-s1 kernel: LustreError: 22994:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.105.71@o2ib4: deadline 46:5s ago Feb 13 09:03:51 fir-md1-s1 kernel: LustreError: 22994:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message Feb 13 09:03:51 fir-md1-s1 kernel: Lustre: 22994:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (46:5s); client may timeout. req@ffff9cc42533bc00 x1624699750345952/t0(0) o103->82bcfbd3-d6e5-0967-d3f2-c921c94e988c@10.9.105.71@o2ib4:516/0 lens 328/0 e 0 to 0 dl 1550077426 ref 2 fl Interpret:H/0/ffffffff rc 0/-1 Feb 13 09:03:51 fir-md1-s1 kernel: Lustre: 38429:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff9cc3f4b02400 x1624736830128000/t0(0) o103->c657c695-de10-8808-d9fb-1a856e787dad@10.8.18.34@o2ib6:516/0 lens 328/0 e 0 to 0 dl 1550077426 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Feb 13 09:03:51 fir-md1-s1 kernel: Lustre: 38429:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Feb 13 09:03:52 fir-md1-s1 kernel: Lustre: 20811:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550077425/real 1550077425] req@ffff9cb3fc5ead00 x1625309525875584/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550077432 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 13 09:03:52 fir-md1-s1 kernel: Lustre: 20811:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9609 previous similar messages Feb 13 09:03:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.17.11@o2ib6) Feb 13 09:03:55 fir-md1-s1 kernel: Lustre: Skipped 1843 previous similar messages Feb 13 09:04:05 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.105.71@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9cc3e5df3180/0x6bcd99f6a6f1a8e7 lrc: 3/0,0 mode: PR/PR res: [0x2c00016d5:0x9c:0x0].0x0 bits 0x40/0x0 rrc: 1090801 type: IBT flags: 0x60000400010020 nid: 10.9.105.71@o2ib4 remote: 0xf3592e10fedd7d21 expref: 1095702 pid: 20581 timeout: 62424 lvb_type: 0 Feb 13 09:04:05 fir-md1-s1 kernel: Lustre: 20503:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (46:3s); client may timeout. 
req@ffff9cc2a6a88f00 x1624705426306432/t0(0) o103->8c99250d-76e9-284a-5406-5556d3865e14@10.8.29.5@o2ib6:532/0 lens 328/0 e 0 to 0 dl 1550077442 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Feb 13 09:04:05 fir-md1-s1 kernel: Lustre: 20503:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 21 previous similar messages Feb 13 09:04:06 fir-md1-s1 kernel: LustreError: 21260:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.105.71@o2ib4: deadline 46:4s ago Feb 13 09:04:06 fir-md1-s1 kernel: LustreError: 21260:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 355 previous similar messages Feb 13 09:04:06 fir-md1-s1 kernel: Lustre: 22970:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9cc28fb6f500 x1624699749625024/t0(0) o103->82bcfbd3-d6e5-0967-d3f2-c921c94e988c@10.9.105.71@o2ib4:532/0 lens 328/0 e 0 to 0 dl 1550077442 ref 1 fl Interpret:H/0/ffffffff rc 0/-1 Feb 13 09:04:06 fir-md1-s1 kernel: Lustre: 22970:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Feb 13 09:04:27 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.19.1@o2ib6, removing former export from same NID Feb 13 09:04:27 fir-md1-s1 kernel: LustreError: 20577:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9cd451bc9c50 x1624748505084592/t0(0) o4->8f273097-d34e-d604-e427-2da4f99ca32a@10.9.106.26@o2ib4:571/0 lens 488/448 e 0 to 0 dl 1550077481 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:04:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 8f273097-d34e-d604-e427-2da4f99ca32a (at 10.9.106.26@o2ib4), client will retry: rc = -110 Feb 13 09:04:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 13 09:04:27 fir-md1-s1 kernel: Lustre: Skipped 1321 previous similar messages Feb 13 09:04:40 fir-md1-s1 kernel: LustreError: 
20509:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.107.26@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff9ca479093f00/0x6bcd99f6a8aeb431 lrc: 3/0,0 mode: PR/PR res: [0x200003745:0x3049:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.107.26@o2ib4 remote: 0x1305149723726bf8 expref: 326 pid: 21838 timeout: 62459 lvb_type: 0 Feb 13 09:04:40 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1090748 previous similar messages Feb 13 09:04:40 fir-md1-s1 kernel: LustreError: 21794:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9ca482923400 ns: mdt-fir-MDT0000_UUID lock: ffff9cd4628b9f80/0x6bcd99f6a8aeb52d lrc: 3/0,0 mode: PW/PW res: [0x200003745:0x3049:0x0].0x0 bits 0x40/0x0 rrc: 2 type: IBT flags: 0x50200000000000 nid: 10.9.107.26@o2ib4 remote: 0x1305149723726c37 expref: 33 pid: 21794 timeout: 0 lvb_type: 0 Feb 13 09:04:42 fir-md1-s1 kernel: LustreError: 21625:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9ca34dfdec50 x1624712252668752/t0(0) o4->519ac832-a77f-51ba-e3c7-51aa4fe15024@10.8.15.3@o2ib6:597/0 lens 488/448 e 2 to 0 dl 1550077507 ref 1 fl Interpret:/2/0 rc 0/0 Feb 13 09:04:42 fir-md1-s1 kernel: LustreError: 21625:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message Feb 13 09:04:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550077496.21758 Feb 13 09:05:00 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.102.1@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff9ca361dd8000/0x6bcd99f6a8aed8ea lrc: 3/0,0 mode: PR/PR res: [0x20000177f:0x3a33:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.9.102.1@o2ib4 remote: 0x64fad7802cc76e15 expref: 295 pid: 21341 timeout: 62479 lvb_type: 0 Feb 13 09:05:17 fir-md1-s1 kernel: LustreError: 
21984:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(114688) req@ffff9cb423a4f850 x1624712426096416/t0(0) o4->0edeac5b-ee1e-024f-de97-9e0fc3efb1af@10.8.6.2@o2ib6:626/0 lens 504/448 e 0 to 0 dl 1550077536 ref 1 fl Interpret:/2/0 rc 0/0 Feb 13 09:05:17 fir-md1-s1 kernel: LustreError: 21924:0:(ldlm_lib.c:3273:target_bulk_io()) @@@ truncated bulk READ 0(4096) req@ffff9cb3b4320000 x1624672695777584/t0(0) o37->50290e53-c65d-a70c-6960-ed601e5d1ddb@10.8.1.35@o2ib6:615/0 lens 448/440 e 3 to 0 dl 1550077525 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:05:17 fir-md1-s1 kernel: LustreError: 21984:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message Feb 13 09:05:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 62fbe12e-ecd5-1b64-bac3-afb945463f32 (at 10.9.101.34@o2ib4) reconnecting Feb 13 09:05:28 fir-md1-s1 kernel: Lustre: Skipped 2384 previous similar messages Feb 13 09:05:45 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.106.18@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9cb407b75100/0x6bcd99f6a89fd389 lrc: 3/0,0 mode: PR/PR res: [0x2c000339a:0x10ea0:0x0].0x0 bits 0x5b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.106.18@o2ib4 remote: 0x8ad5c2e089701280 expref: 1314 pid: 20526 timeout: 62524 lvb_type: 0 Feb 13 09:05:45 fir-md1-s1 kernel: LustreError: 21324:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9ca47e106c00 ns: mdt-fir-MDT0002_UUID lock: ffff9cb3b90af740/0x6bcd99f6a8af0f54 lrc: 1/0,0 mode: --/PW res: [0x2c000339a:0x10ea0:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x54a01000000000 nid: 10.9.106.18@o2ib4 remote: 0x8ad5c2e0897017e3 expref: 1060 pid: 21324 timeout: 0 lvb_type: 0 Feb 13 09:06:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.24.31@o2ib6) Feb 13 09:06:04 fir-md1-s1 kernel: Lustre: Skipped 2573 previous similar messages Feb 13 09:07:30 fir-md1-s1 kernel: 
LustreError: 21758:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550077350, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cd45c3fb180/0x6bcd99f6a8aa4dcb lrc: 3/0,1 mode: --/PW res: [0x2c00016d5:0x9c:0x0].0x0 bits 0x40/0x0 rrc: 1089612 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 21758 timeout: 0 lvb_type: 0 Feb 13 09:08:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1550077731.21733 Feb 13 09:12:10 fir-md1-s1 kernel: LustreError: 21374:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x200001768:0x2f9d:0x0] Feb 13 09:13:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3c603ef7-b311-5012-38a9-d1ff9ba9b526 (at 10.9.104.13@o2ib4) reconnecting Feb 13 09:13:32 fir-md1-s1 kernel: Lustre: Skipped 229 previous similar messages Feb 13 09:13:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.104.13@o2ib4) Feb 13 09:13:32 fir-md1-s1 kernel: Lustre: Skipped 33 previous similar messages Feb 13 09:13:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.113.2@o2ib4, removing former export from same NID Feb 13 09:13:37 fir-md1-s1 kernel: Lustre: Skipped 410 previous similar messages Feb 13 09:13:55 fir-md1-s1 kernel: LustreError: 20571:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x200003797:0x19a5:0x0] Feb 13 09:14:27 fir-md1-s1 kernel: LustreError: 21497:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9cc3f843ec50 x1624748079770800/t0(0) o4->033127f9-d684-a263-9e10-535f772c4f1a@10.9.106.25@o2ib4:453/0 lens 488/448 e 0 to 0 dl 1550078118 ref 1 fl Interpret:/0/0 rc 0/0 Feb 13 09:14:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Bulk IO write error with 033127f9-d684-a263-9e10-535f772c4f1a (at 10.9.106.25@o2ib4), client will retry: rc = -110 Feb 13 09:14:27 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar 
messages Feb 13 09:14:49 fir-md1-s1 kernel: Lustre: 21834:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply Feb 13 09:20:42 fir-md1-s1 kernel: LustreError: 21791:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9cc427ab8000 x1625309528987984/t0(0) o104->fir-MDT0002@10.9.105.71@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Feb 13 09:20:42 fir-md1-s1 kernel: LustreError: 21791:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 100516 previous similar messages Feb 13 09:23:07 fir-md1-s1 kernel: LustreError: 20509:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 145s: evicting client at 10.9.105.71@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9cc447be6300/0x6bcd99f6a8aa0093 lrc: 3/0,0 mode: PR/PR res: [0x2c000155d:0x16e7f:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.105.71@o2ib4 remote: 0xf3592e10ff3eb0b5 expref: 1084234 pid: 21849 timeout: 63566 lvb_type: 0 Feb 13 09:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client da7f09b2-cefc-256e-025a-17645b86ee8d (at 10.9.112.1@o2ib4) reconnecting Feb 13 09:23:56 fir-md1-s1 kernel: Lustre: Skipped 113 previous similar messages Feb 13 09:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.112.1@o2ib4) Feb 13 09:23:56 fir-md1-s1 kernel: Lustre: Skipped 168 previous similar messages